open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.45k stars 379 forks

[BUG]: Try to reproduce emilia+amphion, unable to find emilia_dataset dataloader script #244

Closed: wjddd closed this issue 1 month ago

wjddd commented 1 month ago

I'm trying to reproduce emilia+amphion as described in https://mp.weixin.qq.com/s/NDhBe-INw5oTew3ruQ6YSQ and I found this line in valle_ar_trainer.py: https://github.com/open-mmlab/Amphion/blob/72112a678d90873d8312e8cffd2491ffcdd6b40e/models/tts/valle_v2/valle_ar_trainer.py#L208
However, there is no such file (emilia_dataset.py) in models/tts/valle_v2, only libritts_dataset.py. Could you provide detailed instructions on how to reproduce emilia+amphion?

jiaqili3 commented 1 month ago

Hi @wjddd, for the 'Emilia + Amphion' model you mentioned, the result was not produced by the exact model you mentioned. The valle_v2 model we previously released is an English model and doesn't support Emilia, because it was released before the dataset. Releasing the 'Amphion+Emilia' model is a future plan. Thanks!

wjddd commented 1 month ago

@jiaqili3 Thank you so much! BTW, does that mean the 'Amphion+Emilia' model is a brand-new model, separate from VALLE/NS2/NS3/FS2/VITS/Jets?

jiaqili3 commented 1 month ago

Hi @wjddd, I didn't work on training the new 'amphion+emilia' model, but from what I know, its architecture integrates the most recent advances in TTS from papers such as Seed-TTS, E3 TTS, and SoundStorm. Thanks!