umap for supervised (metric) learning

lmcinnes / umap

Uniform Manifold Approximation and Projection

BSD 3-Clause "New" or "Revised" License

umap for supervised (metric) learning #415

Open icarmi opened 4 years ago

icarmi commented 4 years ago

Hi,

I read through the tutorial for metric learning: https://umap-learn.readthedocs.io/en/latest/supervised.html

I have two questions (they may be related, I'm not sure):

1. Is there an explanation somewhere of how umap uses the labels with the distance function to perform the supervised training?
2. Are there any hyper-parameters that are more useful in metric learning (as opposed to unsupervised learning) and that can help avoid over-fitting?

Thank you!

lmcinnes commented 4 years ago

Essentially, umap treats the labels as a separate metric space (with a categorical distance on it) and tries to fold the data and the labels together by performing an intersection of the two simplicial sets.
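
To make the "intersection" idea concrete, here is a minimal pure-Python sketch. It assumes the data graph is a dict of edge strengths and that a categorical distance simply penalizes edges whose endpoints carry different labels. The function name, the `far_dist` and `unknown_dist` defaults, and the exponential penalty are illustrative assumptions, not a copy of the library's code. The library works on sparse matrices and appears to do further normalization, which is omitted here.

```python
import math

def intersect_with_labels(edges, labels, far_dist=5.0, unknown_dist=1.0):
    """Down-weight data-graph edges whose endpoints disagree on label.

    edges  : dict mapping (i, j) -> membership strength in [0, 1]
    labels : list of int labels, where -1 means "unlabeled"
    """
    result = {}
    for (i, j), weight in edges.items():
        if labels[i] == -1 or labels[j] == -1:
            # Unknown label: apply only a mild penalty.
            weight *= math.exp(-unknown_dist)
        elif labels[i] != labels[j]:
            # Different classes: the edge is almost removed.
            weight *= math.exp(-far_dist)
        # Same class: the data-space strength is kept unchanged.
        result[(i, j)] = weight
    return result

# Toy graph: points 0 and 1 share a label, 2 differs, 3 is unlabeled.
edges = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7}
labels = [0, 0, 1, -1]
print(intersect_with_labels(edges, labels))
```

Edges inside a class survive intact while cross-class edges nearly vanish, which is what pulls same-label points together in the final layout.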

There are some hyper-parameters. The main one is target_weight, which balances how much weight is given to the labels versus the data. A target_weight of 1.0 puts almost all the weight on the labels, while a target_weight of 0.0 weights as much as can be managed in favour of the data.
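
One way to picture the effect of target_weight is as a knob on how hard cross-label edges are penalized. The sketch below assumes a mapping of the form `far_dist = 2.5 / (1 - target_weight)` followed by an `exp(-far_dist)` multiplier. This is modeled on how the umap-learn source appears to handle categorical targets, but the constants and the helper name `label_penalty` are assumptions here, so check them against the version you use.

```python
import math

def label_penalty(target_weight):
    """Hypothetical helper: multiplier applied to an edge that joins
    two points with different labels, for a given target_weight."""
    if target_weight < 1.0:
        far_dist = 2.5 / (1.0 - target_weight)
    else:
        # Effectively infinite distance: cross-label edges are cut.
        far_dist = 1.0e12
    return math.exp(-far_dist)

for tw in (0.0, 0.5, 0.9, 1.0):
    print(f"target_weight={tw}: cross-label edges scaled by {label_penalty(tw):.3g}")
```

Under this assumed mapping, even target_weight=0.0 still scales cross-label edges by exp(-2.5), roughly 0.08, which matches the "as much as can be managed" wording above: the labels never drop out entirely in supervised mode.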

icarmi commented 4 years ago

Awesome, thank you!!! :)

buhrmann commented 4 years ago

Hi @lmcinnes, just wondering if you could also explain how the transform() part of metric learning works, if it's not already mentioned somewhere else. I understand (intuitively) how labels are used during training (the intersection of separate graphs bit). But how does that then affect new data which doesn't have labels? I imagine new data points are embedded somehow based on their similarity to points in the trained graph (intersected sets)?

lmcinnes commented 4 years ago

@buhrmann you are essentially correct: transform() uses the learned graph of the data space (rather than the intersected graph), since we don't have labels for the new points. The assumption is that this structure is sufficient. That is, of course, not always the case, but it has been effective for several use cases.
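
As a rough illustration of embedding unlabeled points from data-space similarity alone, the sketch below places a new point at a weighted average of the embedded positions of its nearest training points. The helper name, the `exp(-distance)` weighting, and the toy data are assumptions made for illustration. The real transform() is more involved (it refines positions by further optimization), so treat this as intuition only.

```python
import math

def embed_new_point(x_new, train_data, train_embedding, k=3):
    """Place x_new at a weighted average of the embedded positions of
    its k nearest training points. Only data-space distances are used;
    no labels are needed for the new point."""
    nearest = sorted(
        (math.dist(x_new, x), idx) for idx, x in enumerate(train_data)
    )[:k]
    weights = [math.exp(-d) for d, _ in nearest]
    total = sum(weights)
    dim = len(train_embedding[0])
    return [
        sum(w * train_embedding[idx][c] for w, (_, idx) in zip(weights, nearest)) / total
        for c in range(dim)
    ]

# Toy data: two clusters that a supervised fit has already separated.
train_data = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_embedding = [[-1.0, 0.0], [-1.0, 0.2], [1.0, 0.0], [1.0, 0.2]]

position = embed_new_point([0.1, 0.5], train_data, train_embedding, k=2)
print(position)
```

The new point lands inside the first cluster because its data-space neighbours already sit there in the supervised embedding. In umap-learn itself, the corresponding workflow from the linked tutorial is to call fit() with the training data and labels, then transform() on the unlabeled data.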