syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

Why is there a blank in the synthesized wav file when generating with reference audio? #24

Closed begeekmyfriend closed 5 years ago

begeekmyfriend commented 5 years ago

Hello, thanks for your brilliant work!

I trained a model on the LJSpeech dataset and synthesized several sentences with some reference audio. However, if I comment out the audio.find_endpoint call in synthesizer.py, there is a blank (silent) stretch in the middle of the resulting evaluation wav file. An example is attached: eval-73000_ref-15_am_m-3.zip. The command line is as follows:

python eval.py --checkpoint logs-tacotron/model.ckpt-73000 --reference_audio ../expressive_tacotron/ref1/15_am_m.wav

The reference audio is taken from expressive_tacotron, and there is no blank in that wav file.

Do you know why this happens? Do we have to use a reference mel during training to improve this?
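
For context, here is a minimal sketch of how I understand an endpoint trimmer of this kind to work. It is not the repository's code: the function name matches, but the body, the parameters (sample_rate, threshold_db, min_silence_sec) and their defaults are my own assumptions for illustration. If the real method behaves similarly, it would cut the output at the first long silence, which may be why the blank only shows up once the call is commented out.

```python
def find_endpoint(samples, sample_rate, threshold_db=-40.0, min_silence_sec=0.8):
    """Hypothetical sketch: return the index at which to cut the waveform.

    Scans for the first window of at least `min_silence_sec` seconds whose
    peak amplitude is below `threshold_db`, and returns a point shortly
    after that window starts. Returns len(samples) if no such window exists.
    """
    window = int(sample_rate * min_silence_sec)
    hop = max(1, window // 4)
    threshold = 10.0 ** (threshold_db / 20.0)  # dB to linear amplitude
    for start in range(hop, len(samples) - window, hop):
        peak = max(abs(s) for s in samples[start:start + window])
        if peak < threshold:
            return start + hop
    return len(samples)


# Toy signal at an artificially low sample rate: speech, a blank, more speech.
wav = [0.5] * 100 + [0.0] * 100 + [0.5] * 100
end = find_endpoint(wav, sample_rate=100)
trimmed = wav[:end]  # everything after the first long silence is discarded
```

In this toy case the trimmed signal ends inside the blank, so the second segment never reaches the output file; with the trimming disabled, the full signal, blank included, is written out.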