rdevon / DIM

Deep InfoMax (DIM), or "Learning Deep Representations by Mutual Information Estimation and Maximization"
BSD 3-Clause "New" or "Revised" License

A problem in your paper #5

Closed Abraham0912541 closed 5 years ago

Abraham0912541 commented 5 years ago

Hi rdevon, I read your DIM paper, nice work. However, I have a question about the model. In the paper you replace the KL divergence with the JS divergence, but this may give a bad result: when the product of marginals p(z)p(x) is greater than the joint p(z|x)p(x), the probability of a specific representation given a specific sample is reduced, which could hurt the learned representation. That is my question. What do you think?

rdevon commented 5 years ago

We're still studying the differences between the KL and JSD when used to maximize the expected log ratio of the joint and marginal probability measures. We have some more recent analysis and experiments that show that the JSD and KL between these measures are approximately monotonic, and the highest JSDs also have the highest KLs. This will be added to the arXiv version soon.

There is also new work I'm aware of (not by us; it should be available on arXiv soon) showing that when the JSD is used this way to train an estimator for the MI (the output of the JSD network can also be used to approximate the expected log ratio), the resulting JSD-based estimator is very accurate relative to other estimators (Fenchel-based KL or NCE-based).

But you are right that the JSD- and KL-based estimators do not have gradients that point in the same direction for all realizations of the joint and marginal. Exactly why this matters isn't entirely clear to me yet, but we are now comfortable believing that they can be used to maximize / estimate the same thing.
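For concreteness, here is a small NumPy sketch (my own toy illustration, not the repo's code) of the two objectives being compared, assuming a scalar discriminator score T(x, z) on joint samples and on shuffled (marginal-product) samples: the f-GAN/JSD-style objective used in DIM, and the Donsker-Varadhan lower bound on the KL (i.e. on MI). On toy scores, better-separated joint vs. marginal scores, which mimic higher MI, raise both quantities together, consistent with the approximate monotonicity mentioned above.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

def jsd_objective(t_joint, t_marginal):
    """f-GAN/JSD-style objective: E_P[-softplus(-T)] - E_M[softplus(T)].
    Maximizing this maximizes (a shifted, scaled) JSD between the joint
    and the product of marginals."""
    return -softplus(-t_joint).mean() - softplus(t_marginal).mean()

def dv_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on the KL (hence on MI):
    E_P[T] - log E_M[exp(T)], computed in a numerically stable way."""
    log_mean_exp = np.logaddexp.reduce(t_marginal) - np.log(t_marginal.size)
    return t_joint.mean() - log_mean_exp

# Toy scores: the critic assigns higher T to joint samples than to
# shuffled (marginal) samples; "strong" separation mimics higher MI.
rng = np.random.default_rng(0)
weak_j, weak_m = rng.normal(0.5, 1.0, 4000), rng.normal(-0.5, 1.0, 4000)
strong_j, strong_m = rng.normal(2.0, 1.0, 4000), rng.normal(-2.0, 1.0, 4000)

# Both objectives rank the strongly separated scores higher.
assert jsd_objective(strong_j, strong_m) > jsd_objective(weak_j, weak_m)
assert dv_bound(strong_j, strong_m) > dv_bound(weak_j, weak_m)
```

This is only meant to show the shape of the two objectives; in DIM the scores come from the encoder/discriminator networks rather than fixed Gaussians.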

Hope that helps.

Abraham0912541 commented 5 years ago

Thank you for your reply, I think I have understood it now. Thank you again.

HobbitLong commented 5 years ago

Hi Devon,

Would you mind sharing the title of the paper that you mentioned here? It has been two months, so I guess the paper is on arXiv now, but I could not find it by searching for some keywords.