lmcinnes / umap

Uniform Manifold Approximation and Projection

Autoencoder stops learning in Parametric UMAP, while UMAP loss decreases #1037

Open gg4u opened 1 year ago

gg4u commented 1 year ago

Can you please help me understand the challenges in reconstruction loss between training autoencoders as components of Parametric UMAP, vs. AE models trained independently and then projected?

Below is a description to help clarify the concepts:

1. In the original paper I read that the UMAP loss is computed as a cross-entropy between the probability distributions of points in the embedded space and the input space. Then there is a reconstruction loss for the points, which in autoencoders can be cross-entropy or MSE.
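For reference, the cross-entropy I am referring to, as I read it from the paper (with $p_{ij}$ the edge probabilities in the input space and $q_{ij}$ those induced by the embedding):

$$
C(P, Q) = \sum_{i \ne j} \left[ p_{ij} \log\frac{p_{ij}}{q_{ij}} + (1 - p_{ij}) \log\frac{1 - p_{ij}}{1 - q_{ij}} \right]
$$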

So far I understand that Parametric UMAP lets you choose a metric for the autoencoder, but not for computing the UMAP loss: cross-entropy will always be the metric of the UMAP loss, while MSE would only apply to the autoencoder reconstruction. Is that correct?

Can you please help me better understand the comparison between using cross-entropy vs. MSE in Parametric UMAP, and how it affects learning in your experience?

When training an autoencoder by itself (MSE loss), it trains very fast. When training it as part of Parametric UMAP, it is very slow (same model). Parametric UMAP seems to use binary cross-entropy. In the original paper there is also a comparison (see below), but the comparison is between a model trained with MSE loss and a Parametric UMAP trained with cross-entropy (using the same model).

Why not compare against a Parametric UMAP trained with MSE? Is there a reason for that choice? Do you strongly advise just using the default cross-entropy and scaling all inputs to [0, 1]? (I have been using MSE with inputs in [-1, 1].) A rough sketch of my setup follows below.
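In case it helps frame the question, this is roughly how I am setting up the comparison (a minimal sketch with a small dense model standing in for my real one; the `parametric_reconstruction_loss_fcn` argument and the defaults are my reading of the notebooks, so please correct me if I have them wrong):

```python
import tensorflow as tf
from umap.parametric_umap import ParametricUMAP

# MNIST, flattened and scaled to [0, 1] (my own runs use [-1, 1] with MSE).
(X, _), _ = tf.keras.datasets.mnist.load_data()
X = X.reshape((-1, 28 * 28)).astype("float32") / 255.0

dims = (28 * 28,)
n_components = 2

def make_encoder_decoder():
    # Small dense networks as stand-ins for my actual model.
    encoder = tf.keras.Sequential([
        tf.keras.layers.Input(shape=dims),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(n_components),
    ])
    decoder = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_components,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    ])
    return encoder, decoder

# (a) Standalone autoencoder, trained directly on MSE: converges quickly for me.
enc_a, dec_a = make_encoder_decoder()
autoencoder = tf.keras.Sequential([enc_a, dec_a])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=256)

# (b) Same architecture inside Parametric UMAP. My understanding is that the
# reconstruction term defaults to binary cross-entropy; swapping in MSE would
# look like this (assuming parametric_reconstruction_loss_fcn accepts any Keras loss).
enc_b, dec_b = make_encoder_decoder()
embedder = ParametricUMAP(
    encoder=enc_b,
    decoder=dec_b,
    dims=dims,
    parametric_reconstruction=True,
    parametric_reconstruction_loss_fcn=tf.keras.losses.MeanSquaredError(),
    verbose=True,
)
embedding = embedder.fit_transform(X)
```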

2. Unlike the plots in: https://github.com/lmcinnes/umap/blob/a7606f2d413c278cc98d932de62f829914209c4f/notebooks/Parametric_UMAP/04.0-parametric-umap-mnist-embedding-convnet-with-autoencoder-loss.ipynb#L211

in my tests, with Parametric UMAP the reconstruction loss no longer converges (it stops learning, even though it is already at a low value), while the AE trained alone keeps converging. Meanwhile the UMAP loss keeps decreasing.

How should I interpret this difference?
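For context, this is roughly how I look at the two curves after fitting (the history key names are my guess, so I check `embedder._history.keys()` first; they may differ between versions):

```python
import matplotlib.pyplot as plt

# ParametricUMAP keeps the Keras training history on the fitted embedder.
history = embedder._history

fig, ax = plt.subplots()
for key, values in history.items():
    # e.g. total loss, UMAP loss, reconstruction loss (exact keys vary)
    ax.plot(values, label=key)
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
plt.show()
```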

Thanks very much for helping me understand the key concepts and also best practices!

https://arxiv.org/pdf/2009.12981.pdf

<img width="649" alt="Screenshot 2023-08-01 at 12 38 42 PM" src="https://github.com/lmcinnes/umap/assets/284214/20184e54-12b1-42fe-85be-46816f68e751">