Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
In the file audioldm_inference.py, when we get the text embedding, why do we need to concatenate it with the unconditional (uncond) embedding? What does this mean, and is it really needed? Thank you!
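
To make the question concrete, here is a minimal sketch of the pattern as it usually appears in diffusion inference code, where it is known as classifier-free guidance. This is an assumption about what the concatenation is for, not a copy of audioldm_inference.py: `toy_denoiser`, `guided_prediction`, and `guidance_scale` are hypothetical names, and the "denoiser" is a stand-in arithmetic function rather than a real model.

```python
# Hypothetical stand-ins: none of these names come from audioldm_inference.py.

def toy_denoiser(latents, embeddings):
    """Pretend noise predictor: returns one prediction per (latent, embedding) pair."""
    return [lat + sum(emb) for lat, emb in zip(latents, embeddings)]


def guided_prediction(latent, text_emb, uncond_emb, guidance_scale):
    # Stack the unconditional and text embeddings along the batch axis so that
    # both branches are evaluated in a single forward pass.
    embeddings = [uncond_emb, text_emb]
    latents = [latent, latent]  # the same latent is duplicated for each branch

    noise_uncond, noise_text = toy_denoiser(latents, embeddings)

    # Classifier-free guidance: move the prediction away from the unconditional
    # branch and toward the text-conditioned one.
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


if __name__ == "__main__":
    for scale in (0.0, 1.0, 3.0):
        print(scale, guided_prediction(1.0, [0.5, 0.5], [0.0, 0.0], scale))
```

Under these assumptions, a `guidance_scale` of 1 reduces the formula to the text-conditioned prediction alone, so the unconditional branch only matters when the scale differs from 1. Whether Amphion's script allows skipping it in that case is something the maintainers would need to confirm.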