zelenooki87 closed this issue 3 weeks ago
This may be because the text you are generating is too long. Long-form generation is planned: https://github.com/open-mmlab/Amphion/issues/290
Strangely, everything works great with this repo: https://github.com/justinjohn0306/MaskGCT-Windows
I set up the conda environment according to the instructions, but when I upload any English audio file and type in any English text, I get strange output files. They don't sound anything like your test examples or the few YouTube videos I've seen. I also can't get the demo on Hugging Face Spaces to fully launch, so I can't use it to troubleshoot the problem. I've attached screenshots and the input/output files. It's really incoherent, isn't it?
input-2mpvoice.zip output_100timestep.zip