open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
4.21k stars 360 forks

[Help]: The difference between the FAcodec pretrained model "FACodecEncoderV2" vs "FACodecEncoder" #192

Open zyy-fc opened 2 months ago

zyy-fc commented 2 months ago

Can you explain the difference between the FACodec pretrained models "FACodecEncoderV2" and "FACodecEncoder"?

Why is "FACodecEncoderV2" used for zero-shot TTS?

Do the two differ in training strategy or in datasets?

HeCheng0625 commented 2 months ago

Hi, there is no difference between FACodecEncoder and FACodecEncoderV2. The difference between FACodecDecoder and FACodecDecoderV2 is that the prosody part of FACodecDecoderV2 uses a pitch-shifted waveform to achieve better disentanglement from timbre.
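To make "pitch-shifted waveform" concrete, here is a minimal standard-library sketch of the general idea. It is not Amphion's code and the thread does not say how FACodecDecoderV2 actually performs the shift. The helper names (`make_sine`, `naive_pitch_shift`, `estimate_freq`) are hypothetical, and the shift is a naive resampling that also changes the duration, which a real pipeline would likely avoid by using a duration-preserving pitch shifter.

```python
import math


def make_sine(freq_hz, sample_rate, duration_s):
    """Generate a test tone as a list of floats (illustrative input only)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def naive_pitch_shift(wave, semitones):
    """Shift pitch by resampling with linear interpolation.

    Assumption: this is the simplest possible pitch shift. It raises or
    lowers every frequency by the same ratio, but it also shortens or
    lengthens the signal by that ratio.
    """
    ratio = 2 ** (semitones / 12)  # frequency ratio for the requested shift
    out_len = int((len(wave) - 1) / ratio) + 1
    last = len(wave) - 1
    out = []
    for j in range(out_len):
        pos = j * ratio                 # fractional read position in the input
        i = min(int(pos), last)
        frac = pos - i
        nxt = wave[min(i + 1, last)]
        out.append(wave[i] * (1 - frac) + nxt * frac)
    return out


def estimate_freq(wave, sample_rate):
    """Rough frequency estimate from the number of upward zero crossings."""
    crossings = sum(1 for a, b in zip(wave, wave[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(wave)


if __name__ == "__main__":
    sr = 16000
    original = make_sine(220.0, sr, 1.0)
    shifted = naive_pitch_shift(original, 12)  # one octave up
    print(estimate_freq(original, sr), estimate_freq(shifted, sr))
```

The intuition, as far as the reply above describes it: if the prosody branch only ever sees a waveform whose pitch has been perturbed, it has less access to the speaker-specific cues in the original signal, which pushes timbre information toward the other parts of the model.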