hardmaru / WorldModelsExperiments

World Models Experiments

Question about the VAE's KL-loss #8

Closed · kaiolae closed this issue 6 years ago

kaiolae commented 6 years ago

Hi! I'm trying to reproduce the Doom example in Keras and was curious about the KL-loss calculation of the VAE, specifically the parameter kl_tolerance. As far as I understand, it prevents the KL loss from ever going below 32. What is the purpose of this? What effect would removing this tolerance have? Thanks, and thanks for a very well-written paper! -Kai

hardmaru commented 6 years ago

Hi @kaiolae

Thanks for the comment. Someone else has asked me this before (see discussion).

Basically, I stop optimizing the KL loss term once it drops below a certain level, rather than letting it go all the way to near zero. In other words, I optimize tf.maximum(KL, good_enough_kl_level) instead, which relaxes the information bottleneck of the VAE.
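As a rough sketch (not the exact code in this repo), the clipped KL term looks roughly like this in TensorFlow. Here mu and logvar are assumed to be the encoder outputs, and the illustrative defaults kl_tolerance = 0.5 with z_size = 64 give the floor of 32 that the question mentions:

```python
import tensorflow as tf

def kl_loss_with_tolerance(mu, logvar, z_size=64, kl_tolerance=0.5):
    # Analytic KL divergence between the diagonal Gaussian q(z|x) = N(mu, exp(logvar))
    # and the unit prior N(0, I), summed over the latent dimensions.
    kl = -0.5 * tf.reduce_sum(1.0 + logvar - tf.square(mu) - tf.exp(logvar), axis=1)
    # Clip from below: once the KL is "good enough", stop pushing it toward zero.
    kl = tf.maximum(kl, kl_tolerance * z_size)
    return tf.reduce_mean(kl)

# Example: with kl_tolerance=0.5 and z_size=64, the loss can never fall below 32 nats.
mu = tf.zeros([8, 64])
logvar = tf.zeros([8, 64])
print(kl_loss_with_tolerance(mu, logvar))  # -> 32.0, since the true KL is 0 here
```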

This method was inspired by the “free bits” concept in the appendix of this paper: https://arxiv.org/abs/1606.04934. I also did this in the sketch-rnn paper.