haoheliu / AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.
https://audioldm.github.io/audioldm2/
MIT License
174 stars, 34 forks

About configuration of released code and model #6

Closed VanderHua closed 7 months ago

VanderHua commented 8 months ago

Thank you very much for this great work.

I found that the latent shape printed when I load the autoencoder (AE) with the given config is inconsistent with the sample shape printed when I run inference with the generate_sample() function. What could be causing this?

When initializing the AE: Working with z of shape (1, 8, 64, 64) = 32768 dimensions.
When calling generate_sample(): Data shape for DDIM sampling is (3, 8, 256, 16), eta 1.0

I also found that the samples I get from generate_sample() are quite different from the demo on Hugging Face. Is calling generate_sample() directly not the complete inference process? What other configuration is needed?

Looking forward to your reply, and thank you!

haoheliu commented 8 months ago

@VanderHua Oh sorry, that's actually a typo in the printed message. It should not affect the system behavior at all. I'll update it on the main branch shortly. Thanks for pointing that out.
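
As a quick sanity check on the two printed shapes, the sketch below uses only the numbers quoted in this thread and no code from the repository. It assumes the first dimension of each shape is the batch size, and the helper name per_sample_size is hypothetical. Under that assumption, both shapes describe the same number of latent elements per sample, which may help when comparing the two log lines.

```python
from math import prod

# Shape printed when the autoencoder is initialized (quoted from the thread).
ae_init_shape = (1, 8, 64, 64)
# Shape printed for DDIM sampling inside generate_sample() (quoted from the thread).
ddim_shape = (3, 8, 256, 16)


def per_sample_size(shape):
    """Count elements in one sample, assuming the first axis is the batch size."""
    return prod(shape[1:])


ae_size = per_sample_size(ae_init_shape)
ddim_size = per_sample_size(ddim_shape)

# Both per-sample sizes match the "32768 dimensions" figure in the init message.
print(ae_size, ddim_size)
```

This only compares element counts; it says nothing about which layout the model actually uses internally.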