Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
The output file of a voice clone sounds robotic, as if a filter had been applied. I'm guessing the model wasn't trained on these types of voices. Maybe if it were also trained on a singing dataset, or on video game voice files from https://www.sounds-resource.com/, it wouldn't have this problem?
Here's the input file. https://vocaroo.com/1o9qI5tnG6e5
Here's the output file. https://vocaroo.com/1gvBHRus1e5r
My only workaround is to use the MaskGCT output file in Seed-VC. https://vocaroo.com/1g6DFmzj8OTP
Here's another example:
Input file. https://vocaroo.com/19JrpRyCFOQd
Output file. https://vocaroo.com/1eebYSVnkU3U