rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[FEA] Alternative output metrics for UMAP and TSNE. #1799

Open vals opened 4 years ago

vals commented 4 years ago

A recent feature in the UMAP package is the ability to use alternative metrics for the embedding space: https://umap-learn.readthedocs.io/en/latest/embedding_space.html

This embeds the data on a manifold other than the usual Euclidean plane, which can make embeddings more interpretable. I find the embedding using the Haversine metric particularly interesting: after converting the coordinates to latitudes and longitudes, it makes fuller use of the plotting window, with less white space.
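To make the Haversine case concrete, here is a minimal sketch of the distance such an output metric would compute between two embedded points. It assumes each point is a `(latitude, longitude)` pair in radians on the unit sphere; the names `haversine` and `to_degrees` are illustrative and not taken from UMAP or cuML.

```python
import math

def haversine(p, q):
    """Great-circle distance on the unit sphere.

    p and q are (latitude, longitude) pairs in radians (an assumed
    convention for this sketch).
    """
    lat1, lon1 = p
    lat2, lon2 = q
    sin_lat = math.sin(0.5 * (lat1 - lat2))
    sin_lon = math.sin(0.5 * (lon1 - lon2))
    a = sin_lat ** 2 + math.cos(lat1) * math.cos(lat2) * sin_lon ** 2
    # Clamp to [0, 1] to guard against floating-point round-off.
    a = min(1.0, max(0.0, a))
    return 2.0 * math.asin(math.sqrt(a))

def to_degrees(p):
    """Convert a (latitude, longitude) pair from radians to degrees for plotting."""
    lat, lon = p
    return math.degrees(lat), math.degrees(lon)
```

Because the distance wraps around the sphere, points near opposite edges of a latitude/longitude plot can still be close neighbours in the embedding.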

I believe the same concept would work for TSNE as well: when calculating the q_ij's in the embedding space, some other distance could be used instead of the Euclidean norm.
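As a rough illustration of that idea, the sketch below computes the standard t-SNE low-dimensional similarities, q_ij = (1 + d_ij^2)^-1 / sum over k != l of (1 + d_kl^2)^-1, with the distance d supplied as a callable. The function name `tsne_q` and its `metric` parameter are hypothetical and only show where a non-Euclidean distance would plug in; this is not how cuML or any existing TSNE implementation is structured.

```python
import math

def euclidean(p, q):
    """Ordinary Euclidean distance, used as the default output metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def tsne_q(embedding, metric=euclidean):
    """Return the matrix of t-SNE similarities q_ij for an embedding.

    `metric` is any callable returning a distance between two points;
    swapping it changes the geometry of the embedding space.
    """
    n = len(embedding)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = metric(embedding[i], embedding[j])
            # Student-t kernel with one degree of freedom.
            weights[i][j] = weights[j][i] = 1.0 / (1.0 + d * d)
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

q = tsne_q([(0.0,), (1.0,), (2.0,)])
```

The gradient of the t-SNE objective would also need the derivative of the chosen distance, which this sketch leaves out.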

Describe the solution you'd like

It would be great if a reasonable number of metrics for 1- and 2-dimensional manifolds could be implemented and a user could pick an output metric when calling UMAP or TSNE.

cjnolet commented 4 years ago

Looking through the more recent UMAP changes, I don’t believe this should be too hard to implement, as it seems to mostly affect the SGD stage.

I believe a first pass could support hard-coded distance metrics, and future iterations could use something like NVRTC to compile custom functions at runtime and pass them to the optimizer.

I suppose the Haversine metric would even be a good starting point for a proof of concept.
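A first pass with hard-coded metrics could look roughly like the sketch below: a fixed table maps metric names to functions returning a distance and its gradient, and the optimizer step looks the metric up by name. Everything here is hypothetical (`OUTPUT_METRICS`, `attract_step`, the finite-difference Haversine gradient, and the simplified update that omits UMAP's weighting coefficient); it only illustrates the dispatch pattern and is not cuML's SGD code.

```python
import math

def haversine(p, q):
    """Great-circle distance; points are (lat, lon) in radians (assumed)."""
    s_lat = math.sin(0.5 * (p[0] - q[0]))
    s_lon = math.sin(0.5 * (p[1] - q[1]))
    a = s_lat ** 2 + math.cos(p[0]) * math.cos(q[0]) * s_lon ** 2
    return 2.0 * math.asin(math.sqrt(min(1.0, max(0.0, a))))

def euclidean_with_grad(x, y):
    """Euclidean distance and its gradient with respect to x."""
    diff = [a - b for a, b in zip(x, y)]
    dist = math.sqrt(sum(d * d for d in diff))
    return dist, [d / (dist + 1e-12) for d in diff]

def haversine_with_grad(x, y, eps=1e-6):
    """Haversine distance with a central finite-difference gradient.

    A real implementation would use the analytic gradient; finite
    differences keep this sketch short.
    """
    grad = []
    for k in range(2):
        hi = list(x); hi[k] += eps
        lo = list(x); lo[k] -= eps
        grad.append((haversine(hi, y) - haversine(lo, y)) / (2.0 * eps))
    return haversine(x, y), grad

# Hard-coded table of supported output metrics (illustrative names).
OUTPUT_METRICS = {
    "euclidean": euclidean_with_grad,
    "haversine": haversine_with_grad,
}

def attract_step(metric_name, current, other, lr=0.1):
    """Move `current` one simplified gradient step toward `other`."""
    if metric_name not in OUTPUT_METRICS:
        raise ValueError(f"unsupported output metric: {metric_name!r}")
    _, grad = OUTPUT_METRICS[metric_name](current, other)
    return [c - lr * g for c, g in zip(current, grad)]
```

A runtime-compiled approach would replace the fixed table with user-supplied functions, while the optimizer step itself would stay the same.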

Thank you, @vals, for bringing this feature to our attention!
