I am not sure what you mean by "Do your models really output speech or not?" The output of my FastSpeech2 implementation is a mel-spectrogram, which can be converted to a waveform by a vocoder such as WaveGlow or MelGAN.
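For example, here is a minimal sketch of mel-to-wav conversion, assuming the MelGAN torch.hub entry point published by seungwonpark/melgan; your mel-spectrogram must match the vocoder's expected format (80 mel bins):

```python
import torch

# Load a pretrained MelGAN vocoder via torch.hub
# (assumes the seungwonpark/melgan hub entry point).
vocoder = torch.hub.load('seungwonpark/melgan', 'melgan')
vocoder.eval()

# Dummy input for illustration; replace with a real
# (batch, n_mels, frames) mel-spectrogram tensor.
mel = torch.randn(1, 80, 234)

with torch.no_grad():
    audio = vocoder.inference(mel)  # 1-D waveform tensor
```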
If you are asking why there is a large gap between the training and validation mel_loss and mel_postnet_loss curves, it is because in evaluate.py the model synthesizes mel-spectrograms without ground-truth F0 and energy labels.
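To make the gap concrete, here is a hedged sketch of the two call patterns; the argument names (d_target, p_target, e_target) are illustrative assumptions, not necessarily this repo's exact forward signature:

```python
# Hypothetical FastSpeech2-style forward calls; argument names here
# (d_target, p_target, e_target) are illustrative assumptions.

# Training: ground-truth duration, F0, and energy are fed
# into the variance adaptor (teacher forcing).
mel, mel_postnet, *_ = model(
    text, src_len,
    d_target=duration, p_target=f0, e_target=energy,
)

# evaluate.py-style synthesis: no targets, so the model relies on its
# own predicted duration/F0/energy; prediction errors compound into a
# larger mel loss, which is why the validation curves sit higher.
mel, mel_postnet, *_ = model(text, src_len)
```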
When I train the FastPitch model from the NVIDIA source code, I get curves that look like yours. My training set has 11,239 samples and my validation set has 1,000. I have noticed that the training and validation curves drift further and further apart, which does not seem normal. Do your models really output speech or not? I am confused. Thank you for sharing the code <3