Hello, thank you for your interest in the work.
In the zip file you shared, something seems to be off with the audio preprocessing, which would explain the screeching noise in the background.
Further, I think there are many phonetisation errors in the pre-trained model's synthesis. Could you try synthesising it with a better front end, such as the one in Coqui (code below)? I will be switching the front end to https://pypi.org/project/phonemizer/ in the future; CMUDict is only there for reproducibility purposes.
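A quick way to start narrowing down a preprocessing problem is to inspect the format of the training WAV files. The sketch below is only an illustration, not part of this repository: the helper names `describe_wav` and `find_mismatches` are hypothetical, and the expected values (22050 Hz, mono, 16-bit) are an assumption based on the LJSpeech format, so adjust them to your own configuration. A format mismatch is just one possible cause of such artefacts.

```python
import wave


def describe_wav(path):
    """Return basic format properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


def find_mismatches(info, expected_rate=22050, expected_channels=1, expected_bits=16):
    """List properties that differ from the expected format.

    The defaults are an assumption (LJSpeech-style audio), not values
    read from the model configuration.
    """
    problems = []
    if info["sample_rate_hz"] != expected_rate:
        problems.append(f"sample rate is {info['sample_rate_hz']} Hz, expected {expected_rate} Hz")
    if info["channels"] != expected_channels:
        problems.append(f"{info['channels']} channels, expected {expected_channels}")
    if info["bit_depth"] != expected_bits:
        problems.append(f"bit depth is {info['bit_depth']}, expected {expected_bits}")
    return problems
```

Calling `find_mismatches(describe_wav("some_clip.wav"))` on a few training clips (the file name is a placeholder) returns an empty list when the format matches the expected values.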
# Install TTS
pip install tts
# Change --text to the desired text prompt
# Change --out_path to the desired output path
tts --text "Hello world!" --model_name tts_models/en/ljspeech/overflow --vocoder_name vocoder_models/en/ljspeech/hifigan_v2 --out_path output.wav
Thank you for your reply.
My data does not have background noise, so the result is very strange. I will try Coqui. Using phonemizer would be convenient. Thanks again.
Thank you for your great work! I use pypinyin instead of 'CMU'. I trained on Genshin Paimon data, but the result is not good, and I do not know why.
Here is the result: synth_zh1.wav.zip
This is the result from the pre-trained model 'OverFlow-Female.ckpt' (not very good): synth_en1.wav.zip