ghost opened this issue 3 years ago
I noticed that for some speakers the quality starts to degrade and breaking points appear in the generated waveforms, while for other speakers it keeps getting better and better. Is it normal to observe this? Also, why does the paper say the model was trained for 2.5M steps?

If you are training a multi-speaker model, it is a good idea to check whether overfitting is occurring for each speaker individually. You can identify it from the trend of the spectrogram error on the training and evaluation sets. We chose 2.5M steps for comparison with other models; depending on the dataset, the appropriate number of training steps may differ.
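A minimal sketch of the per-speaker check described above, assuming you log train and eval spectrogram losses per speaker at each checkpoint (the data structure and function name here are illustrative, not part of any released training script). A speaker is flagged when its training loss keeps falling while its evaluation loss keeps rising, which is the classic overfitting signature:

```python
def detect_overfitting(history, window=3):
    """Flag speakers whose eval spectrogram loss rises while train loss falls.

    history: dict mapping speaker id -> list of (train_loss, eval_loss)
    tuples, one per checkpoint. Returns the set of speakers showing an
    overfitting trend over the last `window` checkpoint intervals.
    """
    flagged = set()
    for speaker, points in history.items():
        if len(points) < window + 1:
            continue  # not enough checkpoints to judge a trend
        recent = points[-(window + 1):]
        train = [t for t, _ in recent]
        evals = [e for _, e in recent]
        train_falling = all(b <= a for a, b in zip(train, train[1:]))
        eval_rising = all(b >= a for a, b in zip(evals, evals[1:]))
        if train_falling and eval_rising:
            flagged.add(speaker)
    return flagged

# Hypothetical per-checkpoint (train_loss, eval_loss) logs for two speakers:
history = {
    "spk_a": [(1.0, 1.1), (0.8, 0.9), (0.6, 1.0), (0.5, 1.2), (0.4, 1.4)],
    "spk_b": [(1.0, 1.1), (0.8, 0.9), (0.6, 0.8), (0.5, 0.7), (0.4, 0.65)],
}
print(detect_overfitting(history))  # → {'spk_a'}
```

In practice you would stop training (or pick the checkpoint) per speaker near the point where that speaker's eval loss bottoms out, rather than using one global step count for all speakers.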