open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
7.75k stars 589 forks

[Help]: Gradio demo isn't working correctly on either Windows or Ubuntu. I'm experiencing the same issue on both operating systems. #327

Closed: zelenooki87 closed this issue 3 weeks ago

zelenooki87 commented 3 weeks ago

I set up the conda environment according to the instructions, but when I upload any English audio file and type in any English text, I get strange output files. They don't sound anything like your test examples or the few YouTube videos I've seen. I also can't get the demo on Hugging Face Spaces to fully launch so I can troubleshoot the problem. I've attached screenshots and the input/output files. It's really incoherent, isn't it?

[Screenshot: SNAG-0001]

input-2mpvoice.zip output_100timestep.zip

Tybost commented 3 weeks ago

This might be because the text you're generating is too long. Long-form generation is planned: https://github.com/open-mmlab/Amphion/issues/290

zelenooki87 commented 3 weeks ago

Strangely, everything works great with this repo: https://github.com/justinjohn0306/MaskGCT-Windows