Open vals opened 4 years ago
Looking through the more recent UMAP changes, I don’t believe this should be too hard to implement, as it seems to mostly affect the SGD stage.
I believe a first pass could support hard-coded distance metrics, and future iterations could use something like NVRTC to compile custom functions at runtime and pass them to the optimizer.
I suppose the Haversine metric would be a good starting point for a proof of concept.
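For reference, here is a minimal NumPy sketch of the Haversine distance that such a proof of concept would hard-code. It assumes each point is a (latitude, longitude) pair in radians on the unit sphere; the function name `haversine` is only illustrative and is not an existing cuML API. An SGD-based optimizer would also need the gradient of this distance, which is omitted here.

```python
import numpy as np


def haversine(a, b):
    """Great-circle distance on the unit sphere.

    Assumption: `a` and `b` hold (latitude, longitude) in radians in
    their last axis. Inputs broadcast against each other.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dlat = b[..., 0] - a[..., 0]
    dlon = b[..., 1] - a[..., 1]
    h = (
        np.sin(dlat / 2.0) ** 2
        + np.cos(a[..., 0]) * np.cos(b[..., 0]) * np.sin(dlon / 2.0) ** 2
    )
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return 2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
```

A hard-coded GPU version would presumably evaluate the same expression per pair of embedding points inside the optimizer kernel.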
Thank you, @vals, for bringing this feature to our attention!
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
A recent feature in the UMAP package is the ability to use alternative metrics for the embedding space: https://umap-learn.readthedocs.io/en/latest/embedding_space.html
This has the effect of embedding the data on a manifold other than the commonly used Euclidean plane, which makes it possible, for example, to produce more interpretable embeddings. I find the embedding using the Haversine metric particularly interesting, since converting the coordinates to latitudes and longitudes lets you use more of the plotting window, with less white space.
I believe the same concept would work for TSNE as well: when calculating the q_ij's in the embedding space, some other distance could be used instead of the Euclidean norm.
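To make the TSNE idea concrete, here is a rough NumPy sketch of the low-dimensional affinities q_ij = (1 + d_ij^2)^-1 / sum_{k != l} (1 + d_kl^2)^-1 with the distance d left pluggable. The names `low_dim_affinities`, `euclidean`, and `flat_torus` are made up for illustration and do not exist in cuML or any TSNE library; the flat-torus distance is just one example of a non-Euclidean choice.

```python
import numpy as np


def euclidean(y_i, y_j):
    """Standard Euclidean distance along the last axis."""
    return np.sqrt(np.sum((y_i - y_j) ** 2, axis=-1))


def flat_torus(y_i, y_j, period=2.0 * np.pi):
    """Distance on a flat torus: each coordinate wraps around at `period`."""
    delta = np.abs(y_i - y_j) % period
    delta = np.minimum(delta, period - delta)
    return np.sqrt(np.sum(delta ** 2, axis=-1))


def low_dim_affinities(Y, metric=euclidean):
    """Student-t affinities q_ij over embedding Y of shape (n, dim).

    `metric` must broadcast over leading axes and reduce the last one.
    """
    Y = np.asarray(Y, dtype=float)
    d = metric(Y[:, None, :], Y[None, :, :])  # pairwise distances, (n, n)
    w = 1.0 / (1.0 + d ** 2)
    np.fill_diagonal(w, 0.0)  # q_ii is defined as zero
    return w / w.sum()
```

With `metric=flat_torus`, two points near opposite edges of the [0, 2π) square are treated as neighbours, whereas the Euclidean version treats them as far apart. The gradient of the KL objective would also change with the metric, which this sketch does not cover.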
Describe the solution you'd like
It would be great if a reasonable number of metrics for 1- and 2-dimensional manifolds could be implemented and a user could pick an output metric when calling UMAP or TSNE.