shivammehta25 / Neural-HMM

Neural HMMs are all you need (for high-quality attention-free TTS)

Variance floored #8

Open Ctibor67 opened 2 years ago

Ctibor67 commented 2 years ago

When I train (on my language, Czech), "variance floored" is sometimes displayed, but training usually continues. Is this an error, and how do I fix it? (My batch size is only 1, since I have a GTX 1080 with 8 GB, so it can't be reduced any further.) Also, could you describe in HPARAMS what each line means (at least the most important ones)?

ghenter commented 2 years ago

How does the synthetic speech sound after training for a few thousand updates?

Variance flooring is not an error and training is expected to continue. The corresponding hyperparameter variance_floor is a lower bound/threshold on the standard deviations σ predicted by the model. The message means that the predicted standard deviation was smaller than the bound (i.e., σ < variance_floor), and thus σ was set to equal variance_floor instead. I believe this should be unrelated to batch size.
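
For intuition, here is a minimal sketch of what the flooring does. This is illustrative only, not the exact code in this repo; the function name and the printed message are stand-ins for whatever the training loop actually does:

```python
import torch

variance_floor = 0.001  # the repository default discussed below

def floor_std(sigma: torch.Tensor, floor: float = variance_floor) -> torch.Tensor:
    """Clamp predicted standard deviations from below (illustrative sketch)."""
    if (sigma < floor).any():
        print("variance floored")  # the kind of message you are seeing in the log
    return torch.clamp(sigma, min=floor)

# Example: two of the three predicted sigmas fall below the floor and get raised.
sigma = torch.tensor([0.0004, 0.25, 0.0001])
print(floor_std(sigma))  # tensor([0.0010, 0.2500, 0.0010])
```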

Very low σ-values can indicate a degenerate model that is overfitting to a single observation, and variance flooring is a protection against this. Such flooring is important in, for instance, classic decision-tree-based text-to-speech.

If you are using the default value variance_floor=0.001 and your data is normalised to global mean 0 and standard deviation 1, the warnings suggest to me that there may be pathologies in the data or the training. My inclination would be to check for issues with the data/processing and to try increasing variance_floor to at least 0.1. (I personally believe that the repository default value is probably too small, but we have not tried tuning it.) A higher floor should lead to many more warning messages about flooring, which you could comment out to get cleaner log files, but it will provide better protection against degenerate optima.
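
If you want to verify the normalisation assumption, a quick sanity check along these lines might help. The paths and file layout here are assumptions for illustration, not this repo's pipeline; adapt them to however your mel features are stored:

```python
import glob
import numpy as np

# Hypothetical check that the training features are globally normalised.
feats = [np.load(f) for f in glob.glob("data/mels/*.npy")]  # assumed layout
stacked = np.concatenate([f.reshape(-1, f.shape[-1]) for f in feats], axis=0)

print("global mean:", stacked.mean())                  # expect roughly 0
print("global std: ", stacked.std())                   # expect roughly 1
print("min per-dim std:", stacked.std(axis=0).min())   # tiny values hint at pathologies
```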

ghenter commented 2 years ago

One way to think about it is that, for high values of variance_floor (when the variance is floored all the time), training becomes equivalent to training with the mean squared error (MSE) loss function. For lower values of variance_floor, our training can give a slightly different model than training with the MSE loss would. Maximum-likelihood training, as in this repo, is in some sense theoretically more general/"smarter" than MSE, but in practice it can also go wrong sometimes, and variance flooring provides an (adjustable) level of protection against such situations.
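
To make the equivalence concrete: the negative log-likelihood of a Gaussian whose standard deviation is pinned at the floor differs from squared error only by a constant scale and a constant offset, so the two losses drive the predicted mean in the same direction. A small check of this, written in generic PyTorch rather than this repo's code:

```python
import math
import torch

torch.manual_seed(0)
mu = torch.randn(4, requires_grad=True)  # "predicted means"
x = torch.randn(4)                       # "targets"
floor = 0.1                              # assumed value of the sigma floor

# Gaussian NLL with sigma pinned at the floor: the log(floor) and
# 0.5*log(2*pi) terms are constants that do not affect the gradient.
nll = (0.5 * (x - mu) ** 2 / floor**2 + math.log(floor)
       + 0.5 * math.log(2 * math.pi)).sum()
mse = ((x - mu) ** 2).sum()

(g_nll,) = torch.autograd.grad(nll, mu)
(g_mse,) = torch.autograd.grad(mse, mu)

# Same gradient direction, just rescaled by the constant 1 / (2 * floor**2):
print(torch.allclose(g_nll, g_mse / (2 * floor**2)))  # True
```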