Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Describe the bug
The loss for the latent diffusion model (LDM) used for text-to-audio (TTA) is calculated between the model output and pure noise here. Is this right? I think the loss should be calculated against the ground-truth latent instead.
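To make the question concrete, here is a minimal sketch of how the two losses relate. It assumes the model is trained in the standard DDPM noise-prediction parameterization, which may not match what this repo actually does. All names (`add_noise`, `x0_from_eps`, `alpha_bar`) are hypothetical and not taken from the Amphion code. Under that assumption, a noise target and a latent target differ only by a timestep-dependent scale factor:

```python
import math
import random

def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]

def x0_from_eps(x_t, eps_hat, alpha_bar):
    """Recover the latent estimate implied by a noise prediction."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_hat)]

def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)

rng = random.Random(0)
alpha_bar = 0.6                              # assumed cumulative noise schedule value
x0 = [rng.gauss(0, 1) for _ in range(8)]     # stand-in for the ground-truth latent
eps = [rng.gauss(0, 1) for _ in range(8)]    # sampled noise (the training target)
x_t = add_noise(x0, eps, alpha_bar)

# Hypothetical imperfect model output: the true noise plus a small error.
eps_hat = [e + 0.1 * rng.gauss(0, 1) for e in eps]

noise_loss = mse(eps_hat, eps)                                # loss against pure noise
latent_loss = mse(x0_from_eps(x_t, eps_hat, alpha_bar), x0)   # loss against the latent
scale = (1.0 - alpha_bar) / alpha_bar

# The two losses agree up to the factor (1 - alpha_bar) / alpha_bar.
print(math.isclose(latent_loss, scale * noise_loss, rel_tol=1e-9))
```

If the model is instead meant to predict the latent directly, then comparing its output to noise would indeed be a bug, so the answer depends on which parameterization the training code intends.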
How To Reproduce
Follow the same steps as described in the repo for TTA.
Expected behavior
As described above: the loss should be calculated against the ground-truth latent.
Screenshots
N/A
Environment Information
The same environment as set up by the repo's TTA instructions.
Additional context
N/A