howardyclo / papernotes

My personal notes and surveys on DL, CV and NLP papers.

Understanding disentangling in β-VAE #33

Open howardyclo opened 6 years ago

howardyclo commented 6 years ago

Metadata

Useful Tutorials of VAE and β-VAE

Background

Motivation

Understanding disentangling in β-VAE

Intuition of Improvement (The most important part)

Reference

Further Readings

howardyclo commented 5 years ago

How to Tune Hyperparameters Gamma and C? (Response by Christopher P. Burgess)

Gamma sets the strength of the penalty for deviating from the target KL, C. Tune it so that the (batch) average KL stays close to C (say, within < 1 nat) across the range of C values you use. The exact value usually doesn't matter much; just avoid setting it so high that it destabilises the optimisation.

C itself should start low (e.g. 0 or 1) and gradually increase to a value high enough that reconstructions end up good quality. A good way to estimate Cmax is to train a β-VAE on your dataset with a β low enough that reconstructions are good quality, then look at the trained model's average KL. That KL can serve as your Cmax, since it gives a rough guide to the average amount of representational capacity needed to encode your dataset.
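The recipe above can be sketched in a few lines. This is a minimal illustration (not the authors' code) of the capacity-controlled objective it describes, loss = reconstruction + γ·|KL − C|, with C annealed linearly from 0 to Cmax; the function names and the default values for `gamma`, `c_max`, and `anneal_steps` are hypothetical placeholders you would tune per the advice above.

```python
def capacity(step, c_max, anneal_steps):
    """Linearly anneal the KL target C (in nats) from 0 up to c_max."""
    return min(c_max, c_max * step / anneal_steps)

def controlled_capacity_loss(recon_loss, kl, step,
                             gamma=100.0, c_max=25.0, anneal_steps=100_000):
    """Reconstruction loss plus a gamma-weighted penalty on the
    (batch) average KL deviating from the current target C."""
    c = capacity(step, c_max, anneal_steps)
    return recon_loss + gamma * abs(kl - c)

# Early in training C ~ 0, so any KL is penalised; later, KL is
# pulled toward c_max, freeing up representational capacity.
print(controlled_capacity_loss(recon_loss=10.0, kl=5.0, step=0))       # C = 0
print(controlled_capacity_loss(recon_loss=10.0, kl=5.0, step=100_000)) # C = 25
```

If gamma is well chosen, the training-time average KL should track C closely; if the optimisation becomes unstable, that is the signal to lower gamma.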