shivammehta25 / OverFlow

Putting flows on top of neural transducers for better TTS
MIT License

on chinese data #14

Closed ruyijidan closed 1 year ago

ruyijidan commented 1 year ago

Thank you for your great work! I use pypinyin instead of 'CMU'. I trained on Genshin Paimon data, but the result is not good, and I do not know why.

Here is the result: synth_zh1.wav.zip

This is the result from the pre-trained model 'OverFlow-Female.ckpt' (not very good): synth_en1.wav.zip

shivammehta25 commented 1 year ago

Hello, thank you for your interest in the work.

In the zip file you shared, something seems a bit off with the audio preprocessing, which would explain the screeching noise in the background.
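One quick way to rule out a basic format mismatch is to inspect the training WAV files before preprocessing. The sketch below is not part of OverFlow; `check_wav` is a hypothetical helper, and the expected values (22050 Hz, mono, 16-bit PCM) are an assumption based on the LJSpeech format, so adjust them to whatever your configuration actually specifies.

```python
import wave

# Assumed target format (LJSpeech-style audio); change to match your own config.
EXPECTED_RATE = 22050        # samples per second
EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit PCM


def check_wav(path, rate=EXPECTED_RATE, channels=EXPECTED_CHANNELS,
              width=EXPECTED_SAMPLE_WIDTH):
    """Return a list of format mismatches for one WAV file (hypothetical helper)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channel(s), expected {channels}")
        if wav.getsampwidth() != width:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {width}")
    return problems
```

Running `check_wav` over every file in the dataset and printing any non-empty result should show whether the recordings need resampling or downmixing. An empty list only means the container format matches; it says nothing about the mel-spectrogram settings used later in preprocessing.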

Further, I feel there are many phonetisation errors in the pre-trained model's synthesis. Could you try synthesising it with a better front end, perhaps the one in Coqui TTS (code below)? I will be switching the front end to https://pypi.org/project/phonemizer/ in the future; CMUDict is only meant for reproducibility purposes.

# Install TTS
pip install tts
# Change --text to the desired text prompt
# Change --out_path to the desired output path
tts --text "Hello world!" --model_name tts_models/en/ljspeech/overflow --vocoder_name vocoder_models/en/ljspeech/hifigan_v2 --out_path output.wav

ruyijidan commented 1 year ago

Thank you for your reply.

My data does not have background noise, so the result is very weird. I will try Coqui. It's convenient to use phonemizer. Thanks again.