As stated in Sec. 2 of the paper, in the second term of the CD objective, $\mathrm{KL}(\Pi_\theta^t(p_D(x)) \,\|\, p_\theta(x))$, $\Pi_\theta^t$ does not need to be an MCMC transition kernel; e.g., it can be an amortized generator with its own separate parameters. The appendix shows an example in which MCMC sampling is instead initialized from random noise. Is this mainly an empirical practice, or is it supported by a mathematical proof? Please correct me if I'm wrong.
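For context, the term quoted above is the second half of the standard contrastive divergence objective, which (assuming the paper's notation) reads:

$$
\mathcal{L}_{\mathrm{CD}} = \mathrm{KL}\big(p_D(x) \,\|\, p_\theta(x)\big) - \mathrm{KL}\big(\Pi_\theta^t(p_D(x)) \,\|\, p_\theta(x)\big)
$$

Initializing the sampler from random noise roughly corresponds to replacing $p_D$ inside $\Pi_\theta^t(\cdot)$ with a noise distribution $q_0$ (a symbol introduced here only for illustration), i.e. the samples follow $\Pi_\theta^t(q_0)$ rather than $\Pi_\theta^t(p_D)$.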
The appendix shows empirical results indicating that our loss also improves training stability when samples are initialized from random noise. Unfortunately, we don't have theoretical results showing this benefit.
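A minimal sketch of what $\Pi_\theta^t$ can look like in practice, assuming $t$ steps of unadjusted Langevin dynamics on a toy quadratic energy. The energy, step size, step count, and the helper names `grad_energy` and `langevin_kernel` are all hypothetical and are not the paper's code. The only difference between the two runs is the initial distribution: samples from $p_D$ (standard CD) versus uniform random noise (the random-initialization setting discussed above).

```python
import numpy as np


def grad_energy(x, mu):
    """Gradient of a hypothetical toy energy E_theta(x) = 0.5 * ||x - mu||^2.

    Stands in for the gradient of a learned energy-based model.
    """
    return x - mu


def langevin_kernel(x0, mu, n_steps=60, step_size=0.01, rng=None):
    """Apply n_steps of unadjusted Langevin dynamics (a sketch of Pi_theta^t).

    Update: x <- x - (step_size / 2) * grad E(x) + sqrt(step_size) * N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x, mu) + np.sqrt(step_size) * noise
    return x


rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])  # hypothetical "parameters" theta of the toy EBM

# Stand-in for samples from p_D (hypothetical synthetic data).
data = rng.normal(loc=[1.5, -0.5], scale=0.3, size=(256, 2))
# Random initialization q_0: uniform noise.
noise = rng.uniform(-1.0, 1.0, size=(256, 2))

samples_from_data = langevin_kernel(data, mu, rng=rng)    # ~ Pi_theta^t(p_D)
samples_from_noise = langevin_kernel(noise, mu, rng=rng)  # ~ Pi_theta^t(q_0)

print(samples_from_data.mean(axis=0), samples_from_noise.mean(axis=0))
```

Both chains drift toward the low-energy region around `mu`; with a short chain, the noise-initialized samples remain farther from it, which is one intuition for why the choice of initialization can matter for training.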
Hi Yilun (@yilundu),
Thanks.