Open blues4347 opened 4 months ago
Hi, same problem here. TTS works very well, but STS does not. Although the console window says it's using the selected input wav file and the cloned voice wav file, the resulting audio doesn't seem to use the cloned voice. [EDIT 20240801] In fact, it does use the cloned voice! But the speech-to-speech process somehow eliminates anything unusual about the cloned voice, so the result sounds very neutral. You can try it yourself: run speech-to-speech on a recording of yourself imitating an old man, a little girl, Mickey Mouse or something else. The STS generation will work and sound different each time (a bit lower, a bit higher), but it won't retain the distinctive traits of your imitation. So I'm not sure where the problem comes from, especially since TTS generation does work and does retain the distinctive traits of the cloned voice.
In the STS tab, when I select any voice I uploaded myself, the output is always the same voice (cn-nan). When I select a voice that ships with the code (such as cn-nan.wav or cn-XiaoyiNeural), the output uses the selected voice.