open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
7.81k stars, 590 forks

[BUG]: Robotic filter for a female voice clone in MaskGCT #353

Open GUUser91 opened 1 week ago

GUUser91 commented 1 week ago

The output file for a voice clone of a female voice sounds robotic, as if a filter had been applied. I'm guessing it's because the model wasn't trained on this type of voice? Maybe it wouldn't have this problem if the model were also trained on a singing dataset, or on video game voice files such as those from https://www.sounds-resource.com/.

Here's the input file. https://vocaroo.com/1o9qI5tnG6e5

Here's the output file. https://vocaroo.com/1gvBHRus1e5r

My only workaround is to use the MaskGCT output file in Seed-VC. https://vocaroo.com/1g6DFmzj8OTP

Here's another example.

Input file. https://vocaroo.com/19JrpRyCFOQd

Output file. https://vocaroo.com/1eebYSVnkU3U