Closed · ruotianluo closed this issue 7 years ago
https://github.com/analvikingur/pytorch_RVAE/blob/master/model/rvae.py#L110
How do you get the coefficient 79?

It is not a scientifically justified approach, just an engineering hack. When you optimize the ELBO, you must not average log p(x|z) over the sequence length (as is common in language-modelling tasks), or the model collapses; this forces you to feed the model constant-sized sequences filled with a huge number of padding tokens. I experimented with various approaches that would let me train the VAE with variable-length sequences, e.g. averaging the whole ELBO together with the KL divergence, but found that adding the coefficient 79 worked best; see the sketch below.

In general, I was trying to scale the NLL, so I grid-searched coefficients close to the maximum sequence length, which is 83 (or something close to that; I have already forgotten).
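For concreteness, here is a minimal sketch of this kind of loss weighting in PyTorch. The function name `vae_loss`, the constants `NLL_COEF` and `PAD_IDX`, and the tensor shapes are illustrative assumptions, not the exact code at rvae.py#L110:

```python
import torch
import torch.nn.functional as F

NLL_COEF = 79  # grid-searched constant close to the max sequence length (83)
PAD_IDX = 0    # assumed padding token id

def vae_loss(logits, target, mu, logvar, kld_coef=1.0):
    """ELBO for a text VAE, with the reconstruction term scaled by NLL_COEF.

    logits: (batch, seq_len, vocab) decoder outputs
    target: (batch, seq_len) token ids, padded with PAD_IDX
    mu, logvar: (batch, latent) parameters of q(z|x)
    """
    # Per-token cross-entropy, averaged over non-padding tokens.
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target.reshape(-1),
        ignore_index=PAD_IDX,
    )

    # Analytic KL(q(z|x) || N(0, I)), averaged over the batch.
    kld = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    )

    # Scaling the averaged NLL by a constant near the max sequence length
    # makes the reconstruction term behave like a summed log-likelihood,
    # which in practice keeps the KL term from collapsing the model.
    return NLL_COEF * nll + kld_coef * kld
```

Since the per-token cross-entropy is an average, multiplying it by a constant close to the maximum sequence length is roughly equivalent to summing the log-likelihood over a full-length sequence, which is why values near 83 were the natural range to grid-search.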
Thank you very much.