Open bojone opened 5 years ago
In short, I am really interested in your model, but I would like the whole derivation to feel more natural.
I would prefer that you not think about GANs (WGAN-GP, WGAN) in the first place. The objective of this paper is not to find a new GAN. It is to start from the theory of maximum-likelihood energy-based models and find a better way of training them.
The connection to GANs is just an interesting coincidence.
Compared with WGAN-GP or WGAN-div, your new GAN has an additional I(X,Z) term in the generator loss. This term may prevent the generator from collapsing to a few modes.
As we know, WGAN-GP and WGAN-div can be trained successfully without I(X,Z). But according to the derivation in your paper, I(X,Z) is indispensable.
So how can we understand the success of WGAN-GP or WGAN-div under your framework?
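
To make the question concrete, here is a schematic way to write the two generator losses. The notation is my own shorthand and not taken verbatim from the paper: D is the WGAN critic, E is the energy function (playing the role of -D), G is the generator, and X = G(Z). Please read it as a rough sketch of the comparison, not as the exact objectives.

$$
\mathcal{L}_G^{\text{WGAN}} = -\,\mathbb{E}_{z \sim p(z)}\big[D(G(z))\big],
\qquad
\mathcal{L}_G^{\text{EBM}} = \mathbb{E}_{z \sim p(z)}\big[E(G(z))\big] - I(X, Z).
$$

With E = -D the first terms coincide, so the only difference in the generator loss is the extra -I(X,Z) term, which is what the question is about.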