About configuration of released code and model

Thank you very much for this great work.

I found that the latent shape printed when I load the autoencoder (AE) with the given config is inconsistent with the sample shape printed during inference with the generate_sample() function. What could be causing this?

When initializing the AE:
Working with z of shape (1, 8, 64, 64) = 32768 dimensions.

When calling generate_sample():
Data shape for DDIM sampling is (3, 8, 256, 16), eta 1.0
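For reference, here is a quick check of the two shapes from the log messages above. It is only a sketch, and it assumes the leading dimension in each shape is a batch size (1 at AE init, 3 during sampling), which the logs do not confirm. Under that assumption, the channel count (8) and the number of elements per sample both match, so the only difference is the spatial layout (64 x 64 versus 256 x 16).

```python
from math import prod

# Shapes copied from the two log messages in this post.
ae_init_shape = (1, 8, 64, 64)   # printed when the AE is initialized
ddim_shape = (3, 8, 256, 16)     # printed by generate_sample()


def per_sample_elements(shape):
    """Element count per sample, assuming shape[0] is the batch size (an assumption)."""
    return prod(shape[1:])


ae_elems = per_sample_elements(ae_init_shape)
ddim_elems = per_sample_elements(ddim_shape)

print("AE init, elements per sample:", ae_elems)
print("DDIM, elements per sample:   ", ddim_elems)
print("Same channel count:", ae_init_shape[1] == ddim_shape[1])
print("Same spatial size: ", ae_init_shape[2:] == ddim_shape[2:])
```

So the question is specifically why the spatial dimensions differ between the two messages.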

I also found that the samples I get from generate_sample() are quite different from the results of the demo on Hugging Face. Is calling generate_sample() directly not the complete inference process? What other configuration is needed?

I look forward to your reply. Thank you for your help!

haoheliu / AudioLDM-training-finetuning
