haoheliu / AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.
https://audioldm.github.io/

The VAE decoder cannot recover the original audio from the extracted latent code #96

Open · ikm565 opened this issue 1 year ago

ikm565 commented 1 year ago

Hi! Thank you for making this amazing project public! I'd like some guidance on a problem I ran into while tuning this code. Why can't the VAE decoder recover the original speech waveform when I feed it the latent code extracted by the provided encoder directly? Could you take a look? Thank you very much!

smile-struggler commented 1 year ago

I’ve run into the same issue. May I ask whether you’ve found a solution?

smile-struggler commented 1 year ago

I’ve figured it out! You need to transpose the first and second dimensions of the mel spectrogram; after that, the reconstruction works.
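A minimal NumPy sketch of that fix follows. It assumes (not confirmed in this thread) that the VAE encoder expects a log-mel spectrogram laid out as (batch, channel, time_frames, mel_bins), while common feature extractors such as `librosa.feature.melspectrogram` return (mel_bins, time_frames). The helper names `to_vae_layout` and `from_vae_layout` and the `N_MELS = 64` default are hypothetical; use the mel-bin count from your own model config.

```python
import numpy as np

# Hypothetical default; check the mel-bin count in your model's config.
N_MELS = 64


def to_vae_layout(mel: np.ndarray, n_mels: int = N_MELS) -> np.ndarray:
    """Turn a (mel_bins, time_frames) spectrogram into a (1, 1, time_frames, mel_bins) batch.

    Assumption: the encoder wants time on the first spatial axis and mel bins on the second.
    """
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D mel spectrogram, got shape {mel.shape}")
    if mel.shape[0] != n_mels:
        # Guard against transposing twice: the input may already be (time, mel).
        raise ValueError(
            f"expected {n_mels} mel bins on axis 0, got shape {mel.shape}; "
            "the input may already be in (time, mel) order"
        )
    swapped = np.swapaxes(mel, 0, 1)  # swap the first and second dimensions
    # Add batch and channel axes; make the array contiguous after the swap.
    return np.ascontiguousarray(swapped)[np.newaxis, np.newaxis, :, :]


def from_vae_layout(batch: np.ndarray) -> np.ndarray:
    """Undo to_vae_layout for a single item: (1, 1, T, F) -> (F, T)."""
    return np.swapaxes(batch[0, 0], 0, 1)


if __name__ == "__main__":
    mel = np.random.rand(64, 1024).astype(np.float32)
    x = to_vae_layout(mel)
    print(x.shape)  # (1, 1, 1024, 64)
```

The shape check is there because a silent double transpose produces the same symptom as the original bug: the decoder still runs but returns audio that does not match the input.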