jianchang512 / clone-voice

A voice cloning tool with a web interface, using your own voice or any other sound to record audio
https://pyvideotrans.com
Other
7.32k stars 741 forks

STS cannot assign a specified voice #119

Open blues4347 opened 3 months ago

blues4347 commented 3 months ago

In the STS tab, whenever I select a voice that I uploaded, the output always uses the same voice (cn-nan). When I select one of the voices that ship with the code (such as cn-nan.wav or cn-XiaoyiNeural), the output uses the selected voice.

BobHop commented 2 months ago

Hi, same problem here. TTS works very well, but STS does not. Although the console window says it is using the selected input wav file and the cloned voice wav file, the resulting audio doesn't make use of the cloned voice.

[EDIT 20240801] In fact, it does make use of the cloned voice! But somehow the speech-to-speech process strips out anything distinctive about the cloned voice, making it sound very neutral. You can try it yourself: use speech-to-speech with a recording of yourself imitating an old man, a little girl, Mickey Mouse, or something else. The STS generation will work and sound different each time (a bit lower, a bit higher), but it won't retain the distinctive traits of your imitation.

So I'm not sure where the problem comes from, especially considering that TTS generation does work and does retain the distinctive traits of the cloned voice.