Hello, thanks for your brilliant work!

I used the LJSpeech dataset to train a model and then synthesized several sentences with some reference audio. However, I found that there is a silent gap in the middle of the resulting evaluation wav file if we comment out the `audio.find_endpoint` call in `synthesizer.py`. It looks like this:

eval-73000_ref-15_am_m-3.zip

The command line is as follows:

In fact, the reference audio is derived from expressive_tacotron, and there is no such gap in that wav file.
Do you know why this happens? Do we have to use the reference mel during training to improve this situation?
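For context, here is a minimal sketch of the kind of amplitude-threshold endpoint detection I understand a helper like `audio.find_endpoint` to perform (the function name, threshold values, and exact logic below are my own assumptions, not necessarily this repo's implementation). When such a step is active, the synthesized wav is truncated at the first sufficiently long silence, which would explain why commenting it out exposes the gap:

```python
import numpy as np

def find_endpoint_sketch(wav, sample_rate=22050, threshold_db=-40.0, min_silence_sec=0.8):
    """Return a sample index where the audio can be cut off.

    Scans the waveform in hops and stops at the first window that stays
    below the amplitude threshold for `min_silence_sec` seconds.
    NOTE: illustrative sketch only, not the repo's actual code.
    """
    window_length = int(sample_rate * min_silence_sec)
    hop_length = window_length // 4
    threshold = 10.0 ** (threshold_db / 20.0)  # dB -> linear amplitude
    for start in range(hop_length, len(wav) - window_length, hop_length):
        if np.max(np.abs(wav[start:start + window_length])) < threshold:
            return start + hop_length
    return len(wav)

# Usage: trim the synthesized waveform at the detected endpoint.
# wav = wav[:find_endpoint_sketch(wav, sample_rate=22050)]
```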