yoosif0 opened this issue 6 years ago
You may adjust audio.find_endpoint for your requirements.
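For context, a minimal sketch of what an endpoint finder like this typically does: slide a window over the waveform and return the first point where the signal stays below a dB threshold for a sustained stretch. The parameter names and defaults here are assumptions to tune, not necessarily this fork's exact values:

```python
import numpy as np

def find_endpoint(wav, sample_rate=22050, threshold_db=-40, min_silence_sec=0.8):
    """Return the sample index where trailing silence begins: the start of the
    first window of min_silence_sec whose peak stays below threshold_db."""
    window_length = int(sample_rate * min_silence_sec)
    hop_length = window_length // 4
    threshold = 10 ** (threshold_db / 20)  # convert dB to linear amplitude
    for x in range(hop_length, len(wav) - window_length, hop_length):
        if np.max(np.abs(wav[x:x + window_length])) < threshold:
            return x + hop_length
    return len(wav)
```

Raising threshold_db or shortening min_silence_sec makes the cut more aggressive.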
Thank you @begeekmyfriend. I updated my post to be clearer. There is no problem with synthesizing; I am just asking whether removing the silences during preprocessing would lead to an improvement.
You may add the librosa.effects.trim method in _process_utterance.
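For example, a minimal, self-contained sketch of that call (the file name, sample rate, and top_db threshold are assumptions; lower top_db values trim more aggressively):

```python
import librosa

# Load the audio as a floating-point time series (sample rate is an example value)
wav, sr = librosa.load('example.wav', sr=22050)

# Trim leading and trailing silence. Anything quieter than top_db decibels
# below the peak counts as silence; tune the threshold against your corpus.
trimmed, trim_interval = librosa.effects.trim(wav, top_db=40)

print(f'removed {len(wav) - len(trimmed)} samples of silence')
```

Inside _process_utterance you would apply the trim right after loading the wav, before computing the spectrograms.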
I am getting good results with begeekmyfriend's fork, but I am looking for further improvements.
Some audio files have silence at the end. Would that affect accuracy negatively? Do you think I should trim those parts manually from the wav files?
I think I might not need to trim those silences because, as you can see from this alignment graph, the model figured out the silent part by itself. What do you think?
Update: I found out that this alignment graph is not from a training example but from a generated example, so this question might be irrelevant now.