[Help]: Performance Inferior to Demo Showcase in terms of "FACodec: Voice Conversion Samples"

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

https://openhlt.github.io/amphion/

MIT License

4.28k stars, 363 forks

[Help]: Performance Inferior to Demo Showcase in terms of "FACodec: Voice Conversion Samples" #203

Open: zyy-fc opened this issue 2 months ago

zyy-fc commented 2 months ago

Problem Overview

I downloaded the audio files from "FACodec: Voice Conversion Samples" and ran the Python script shown in the FACodec README with the pretrained models "FACodecEncoderV2/FACodecDecoderV2", but the voice conversion quality is not as good as the demo showcase or the results from https://huggingface.co/spaces/amphion/naturalspeech3_facodec

The audio files are here: "1_female_recon.wav" is the voice conversion output I generated myself, and "1_female_recon_huggingface.wav" is the output from https://huggingface.co/spaces/amphion/naturalspeech3_facodec
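To make the comparison between the two outputs less subjective, a minimal sketch like the one below could report the sample rate, channel count, duration, and RMS level of each file, which helps rule out a plain format mismatch before looking at the model itself. It uses only the Python standard library and assumes both files are 16-bit PCM WAV (other encodings, such as 32-bit float, would need a different reader). The helper names `wav_summary` and `compare_wavs` are hypothetical and are not part of Amphion.

```python
import math
import os
import struct
import wave


def wav_summary(path):
    """Return basic properties of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        sample_rate = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)
    # Assumption: 16-bit PCM; other sample widths are rejected explicitly.
    if sample_width != 2:
        raise ValueError(f"{path}: expected 16-bit PCM, got {8 * sample_width}-bit")
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    # RMS over all interleaved samples (channels are not separated).
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": n_frames / sample_rate,
        "rms": rms,
    }


def compare_wavs(path_a, path_b):
    """Pair up the summary values of two WAV files, key by key."""
    a, b = wav_summary(path_a), wav_summary(path_b)
    return {key: (a[key], b[key]) for key in a}


if __name__ == "__main__":
    mine = "1_female_recon.wav"
    demo = "1_female_recon_huggingface.wav"
    # Only runs when both files from this issue are in the working directory.
    if os.path.exists(mine) and os.path.exists(demo):
        for key, (x, y) in compare_wavs(mine, demo).items():
            print(f"{key}: mine={x} demo={y}")
```

If the two files differ in sample rate or duration, the gap in quality may come from preprocessing or resampling rather than from the pretrained checkpoints.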

zyy-fc commented 2 months ago

Another question: what is the difference between FACodecDecoderV2 and FACodecRedecoder?