yilundu / improved_contrastive_divergence

[ICML'21] Improved Contrastive Divergence Training of Energy Based Models

Choice of $\Pi$ in CD #10

Closed GloryyrolG closed 2 years ago

GloryyrolG commented 2 years ago

Hi Yilun @yilundu ,

As stated in Sec. 2 of the paper, in the second term of CD, $\mathrm{KL}(\Pi_{\theta}^{t}(p_D(x)) \,\|\, p_{\theta}(x))$, $\Pi$ does not have to be an MCMC transition kernel. E.g., it can be an amortized generator parameterized by other params. The appendix shows an example of an MCMC alternative that is initialized randomly. May I ask whether this is more of an empirical practice, or is it supported by a mathematical proof? Please correct me if I am wrong.

Thanks.

yilundu commented 2 years ago

Hi,

The appendix shows an empirical result: our loss also improves stability when samples are initialized randomly. Unfortunately, we don't have theoretical results showing this benefit.
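
For readers who want to see the two initializations side by side, here is a minimal sketch, not the repository's code. It assumes a toy 1-D energy $E(x) = \frac{1}{2}(x - \mu)^2$ in place of a learned EBM and uses unadjusted Langevin dynamics as the sampler. The names `grad_energy`, `langevin_chain`, and `sample_negatives`, the step size, and the uniform noise range are all hypothetical choices made for illustration. The only difference between the two modes is where the chains start: at data points (as in standard CD) or at random noise.

```python
import random


def grad_energy(x, mu=2.0):
    """Gradient of the toy energy E(x) = 0.5 * (x - mu)**2.

    Hypothetical stand-in for the gradient of a learned energy function.
    """
    return x - mu


def langevin_chain(x0, n_steps, step_size, rng):
    """Run n_steps of unadjusted Langevin dynamics starting from x0."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on the energy plus Gaussian noise scaled by sqrt(step_size).
        x = x - 0.5 * step_size * grad_energy(x) + (step_size ** 0.5) * noise
    return x


def sample_negatives(init, n_samples, n_steps=200, step_size=0.1, seed=0):
    """Draw negative samples with chains started from data or from random noise.

    init="data":   chains start at samples of a toy data distribution (CD-style).
    init="random": chains start at uniform noise (assumed range, illustration only).
    """
    rng = random.Random(seed)
    if init == "data":
        # Toy data distribution standing in for p_D.
        starts = [rng.gauss(2.0, 1.0) for _ in range(n_samples)]
    elif init == "random":
        starts = [rng.uniform(-10.0, 10.0) for _ in range(n_samples)]
    else:
        raise ValueError("init must be 'data' or 'random'")
    return [langevin_chain(x0, n_steps, step_size, rng) for x0 in starts]


if __name__ == "__main__":
    for mode in ("data", "random"):
        samples = sample_negatives(mode, n_samples=500)
        print(mode, "mean of negatives:", sum(samples) / len(samples))
```

On this toy energy both modes settle near the same mode after enough steps. That only illustrates the mechanics of swapping the initialization; it says nothing about training stability, which the reply above describes as an empirical finding.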