keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License
2.96k stars, 957 forks

Trimming silences #212

Open yoosif0 opened 6 years ago

yoosif0 commented 6 years ago

I am getting good results with begeekmyfriend's fork but I am looking for further improvements.

Some audio files have silences at the end. Would that affect the accuracy negatively? Do you think I should trim those parts manually from the wav files?

I think I might not need to trim those silences because, as you can see from this alignment graph, the model seems to have learned the silent part by itself. What do you think? (image: alignment graph)


Update: I found out that this alignment graph is from a generated example, not a training example, so this question might be irrelevant now.

begeekmyfriend commented 6 years ago

You may adjust audio.find_endpoint for your requirements.
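
For readers who want to see what such an endpoint search can look like, here is a minimal sketch of amplitude-based endpoint detection. It is not the repository's code: the name find_endpoint_sketch, the 16 kHz sample rate, and the default threshold and window values are assumptions chosen for illustration.

```python
import numpy as np


def find_endpoint_sketch(wav, sample_rate=16000, threshold_db=-40.0,
                         min_silence_sec=0.8):
    """Return the sample index where the first long silent stretch begins.

    Hypothetical helper: slides a window of `min_silence_sec` over the
    waveform and stops at the first window whose peak amplitude stays
    below `threshold_db` (relative to full scale, 1.0).
    """
    window = int(sample_rate * min_silence_sec)
    hop = max(1, window // 4)
    threshold = 10.0 ** (threshold_db / 20.0)  # dB to linear amplitude
    for start in range(hop, len(wav) - window, hop):
        if np.max(np.abs(wav[start:start + window])) < threshold:
            # Keep one hop of silence after the detected position.
            return start + hop
    return len(wav)  # no silent stretch found


def make_demo_signal(sample_rate=16000):
    """One second of a 220 Hz tone followed by two seconds of silence."""
    t = np.arange(sample_rate) / sample_rate
    tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)
    silence = np.zeros(2 * sample_rate)
    return np.concatenate([tone, silence]).astype(np.float32)
```

Calling find_endpoint_sketch(make_demo_signal()) gives an index shortly after the tone ends, so wav[:end] drops most of the trailing silence. Lowering threshold_db or raising min_silence_sec makes the search more conservative.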

yoosif0 commented 6 years ago

Thank you @begeekmyfriend. I updated my post to be clearer. There is no problem with synthesizing. I am just asking if removing the silences in the preprocessing part would lead to an improvement.

begeekmyfriend commented 6 years ago

You may add the librosa.effects.trim method in _process_utterance.