rdevon / DIM

Deep InfoMax (DIM), or "Learning Deep Representations by Mutual Information Estimation and Maximization"

MINE as an estimation of Mutual Information between input space and latent representation #40

Open sachahai opened 4 years ago

sachahai commented 4 years ago

Thank you very much for your interesting work and for releasing the code, much appreciated!

I am implementing several manifold learning methods (from 64x64 images down to 3D) that include a joint optimization of mutual information (MI) with MINE-style tricks (DIM, InfoMax VAE, ...). Since in those methods we are interested in maximizing MI (not in obtaining its precise value), I understand the use of a more stable (but less tight) lower bound on MI, such as the Jensen-Shannon divergence or InfoNCE.
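For concreteness, here is a minimal sketch of the kind of JSD-based objective I mean (the critic architecture, layer sizes, and names are my own illustrative assumptions, not taken from this repo):

```python
# Minimal sketch of a Jensen-Shannon MI lower bound (f-GAN form), the kind of
# objective maximized in DIM-style training. The Critic is a hypothetical
# small MLP on concatenated (x, z) pairs; sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    def __init__(self, x_dim, z_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def jsd_mi_lower_bound(critic, x, z):
    # Positives: (x_i, z_i) from the joint; negatives: (x_i, z_j) with z
    # shuffled across the batch (approximating the product of marginals).
    z_neg = z[torch.randperm(z.size(0))]
    pos = -F.softplus(-critic(x, z)).mean()
    neg = F.softplus(critic(x, z_neg)).mean()
    return pos - neg  # maximize this quantity
```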

However, as you also did in your DIM paper (?), I now want to use the mutual information between the input space and the latent representation as a quantitative metric to evaluate the latent code, and to be able to compare it with state-of-the-art techniques (e.g., UMAP, t-SNE).

As we want a precise estimate of MI, do you agree that:

I tried several implementations of this and got an overall coherent MI behavior, but it is very unstable (no clear asymptote at all); it would be difficult to extract a single MI estimate from the output. I have therefore failed to use MINE as a metric to compare different dimensionality reduction techniques. It would be very helpful if you could share your implementation of MINE for that purpose, or just some insight into the architecture you used, the optimizer, and the lower bound on MI you used.
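For reference, this is roughly what I mean by a MINE estimator (a minimal sketch of the Donsker-Varadhan bound with the moving-average gradient correction from the MINE paper; the critic and all hyperparameters are my own assumptions, and it reuses the `Critic` sketched above):

```python
# Minimal MINE sketch: Donsker-Varadhan bound E_P[T] - log E_Q[e^T], with an
# exponential moving average on the log-term denominator (the bias-reduction
# trick from the MINE paper). `ema` is a running scalar, e.g. initialized to 1.0.
import torch

def mine_step(critic, x, z, ema, alpha=0.99):
    z_neg = z[torch.randperm(z.size(0))]            # product-of-marginals samples
    t_pos = critic(x, z).mean()                     # E_P[T]
    exp_t_neg = torch.exp(critic(x, z_neg)).mean()  # batch estimate of E_Q[e^T]
    ema = alpha * ema + (1 - alpha) * exp_t_neg.detach()
    mi_estimate = t_pos - torch.log(exp_t_neg)      # value to report / average
    # Surrogate loss: its gradient matches the DV gradient with the EMA in the
    # denominator, which is the bias-corrected update described in MINE.
    loss = -(t_pos - exp_t_neg / ema)
    return mi_estimate, loss, ema
```

The idea would be to train this with an optimizer such as Adam and to average `mi_estimate` over many batches rather than reading off a single-batch value, but even so I do not get a clear plateau.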

Any advice is welcome. Thank you so much in advance!

rdevon commented 4 years ago

I would check out Ben Poole's paper, which has a bit more analysis on the estimation side: http://proceedings.mlr.press/v97/poole19a.html
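For what it's worth, one of the bounds analyzed there, InfoNCE, is low-variance but capped at log(batch size), which is one reason it behaves differently from the DV bound as an estimator. A rough sketch (not the DIM code, and the critic is the same hypothetical one as above):

```python
# Rough sketch of the InfoNCE lower bound on MI: low variance, but bounded
# above by log(batch size), so it saturates when the true MI is large.
import math
import torch
import torch.nn.functional as F

def infonce_lower_bound(critic, x, z):
    K = x.size(0)
    # scores[i, j] = T(x_i, z_j); the diagonal holds the joint (positive) pairs
    scores = critic(x.repeat_interleave(K, dim=0), z.repeat(K, 1)).view(K, K)
    return math.log(K) + torch.diagonal(F.log_softmax(scores, dim=1)).mean()
```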