uwot with distance matrix impossible to retain embedding

jlmelville / uwot

An R package implementing the UMAP dimensionality reduction method.

https://jlmelville.github.io/uwot/

GNU General Public License v3.0


Issue #62 · Open · opened by luciat-92 4 years ago

luciat-92 commented 4 years ago

Hello James, thanks a lot for the extremely useful implementation. I am interested in using the umap function by providing the distance matrix directly. I was wondering whether it would be possible to extend the option ret_model = T to this kind of input, or whether that is not feasible from an implementation point of view. Thanks!

jlmelville commented 4 years ago

ret_model isn't supported with a distance matrix because it's primarily intended for transforming new data, and to do that you need the original input data so you can compute the distances between it and the new data being transformed.
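To transform a new point, you need its distances to the training points. With the raw training vectors, that block of distances can be computed directly. With only a precomputed train-by-train matrix, there is no row for a point that was not in the training set. The minimal Python sketch below illustrates this under stated assumptions: the helper names are hypothetical, the metric is Euclidean, and the search is brute force. uwot itself is an R package and typically uses an approximate nearest-neighbor index rather than a loop like this.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def new_to_train_distances(new_points, train_points):
    """Hypothetical helper: distances from each new point (rows) to each training point (columns).

    This needs the raw training vectors. A precomputed train-by-train distance
    matrix has no row for a point that was not in the training set.
    """
    return [[euclidean(p, t) for t in train_points] for p in new_points]


train_x = [[0.0, 0.0], [3.0, 4.0]]
new_x = [[0.0, 4.0]]
print(new_to_train_distances(new_x, train_x))  # [[4.0, 3.0]]
```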

There is one use case where I think it would be feasible, although it isn't currently implemented: you could use the model to transform new data if you also provided a distance matrix between the original data and the new points. That said, it seems unusual to have full distance matrices available but not the underlying data. I'd be curious to know which domain the data comes from, if you can say.
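The following Python sketch shows how a transform could work from a precomputed new-to-original distance matrix alone, assuming the training embedding is already fixed. It is a simplified illustration and not uwot's implementation. UMAP implementations commonly place a new point at a weighted average of its training neighbors' embedded coordinates and then refine it with further optimization epochs. This sketch stops after that initial placement. The weights follow UMAP's smooth k-nearest-neighbor calibration: rho is the nearest-neighbor distance, and sigma is chosen by bisection so the weights sum to log2(k). The function names `smooth_knn_weights` and `transform_precomputed` are hypothetical.

```python
import math


def smooth_knn_weights(dists, n_iter=64, tol=1e-5):
    """Hypothetical helper: UMAP-style membership strengths for one point's k neighbor distances.

    rho is the smallest distance; sigma is found by bisection so that
    sum(exp(-(d - rho) / sigma)) is approximately log2(k).
    """
    k = len(dists)
    if k == 1:
        return [1.0]
    target = math.log2(k)
    rho = min(dists)
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-(d - rho) / sigma) for d in dists)
        if abs(total - target) < tol:
            break
        if total > target:
            # Weights too large: shrink sigma.
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:
            # Weights too small: grow sigma (double until bracketed, then bisect).
            lo = sigma
            sigma = sigma * 2.0 if math.isinf(hi) else (lo + hi) / 2.0
    return [math.exp(-(d - rho) / sigma) for d in dists]


def transform_precomputed(test_to_train, train_embedding, k=3):
    """Hypothetical helper: place each new point at the weighted mean of its k nearest training points.

    test_to_train[i][j] is the distance from new point i to training point j;
    train_embedding[j] holds the fixed embedded coordinates of training point j.
    """
    n_train = len(train_embedding)
    dim = len(train_embedding[0])
    placed = []
    for i, row in enumerate(test_to_train):
        if len(row) != n_train:
            raise ValueError(f"row {i} needs {n_train} distances, got {len(row)}")
        # Indices of the k closest training points (ties keep index order).
        nbrs = sorted(range(n_train), key=lambda j: row[j])[:k]
        weights = smooth_knn_weights([row[j] for j in nbrs])
        total = sum(weights)
        placed.append([
            sum(w * train_embedding[j][c] for w, j in zip(weights, nbrs)) / total
            for c in range(dim)
        ])
    return placed


train_embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
test_to_train = [[0.1, 0.9, 0.9, 7.0]]
print(transform_precomputed(test_to_train, train_embedding, k=3))
```

In this example, the new point lands close to training point 0 and is pulled slightly toward points 1 and 2, at roughly 0.18 on each axis. The distant point 3 is not among the k = 3 neighbors, so it has no influence.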

luciat-92 commented 4 years ago

I am training a machine learning model on fixed data, in a fixed embedding space computed with UMAP. To compute the UMAP embedding, I use distances rather than the original data, because the distances are a non-linear combination of different input formats. I then want to add new data points as a test set without recomputing the embedding, since the embedding has to stay fixed for the machine learning model to be applicable to external data. For the test set, I would have the distances from each test point to every training point. For this reason, that implementation would be really useful in my case. Do you think it would be possible?
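In this setup, the training input is a square n × n matrix of pairwise distances. The test input is a rectangular m × n matrix with one row per test point and one column per training point. The Python sketch below checks that two such inputs are consistent before they are used. The function `check_distance_inputs` is hypothetical and is not part of uwot. It also assumes the distances are symmetric, with a zero diagonal, which may not hold for every custom dissimilarity.

```python
def check_distance_inputs(train_d, test_d, tol=1e-9):
    """Hypothetical helper: sanity-check a train-by-train and a test-by-train distance matrix.

    Returns (n_train, n_test) or raises ValueError describing the first problem found.
    """
    n_train = len(train_d)
    for i, row in enumerate(train_d):
        if len(row) != n_train:
            raise ValueError(f"train row {i} has {len(row)} entries, expected {n_train}")
        if abs(row[i]) > tol:
            raise ValueError(f"train diagonal entry {i} is {row[i]}, expected 0")
        if any(d < 0 for d in row):
            raise ValueError(f"train row {i} contains a negative distance")
        # Rows j < i were already length-checked, so train_d[j][i] is safe to read.
        for j in range(i):
            if abs(row[j] - train_d[j][i]) > tol:
                raise ValueError(f"train matrix is not symmetric at ({i}, {j})")
    for i, row in enumerate(test_d):
        if len(row) != n_train:
            raise ValueError(
                f"test row {i} has {len(row)} entries, expected {n_train} (one per training point)"
            )
        if any(d < 0 for d in row):
            raise ValueError(f"test row {i} contains a negative distance")
    return n_train, len(test_d)


train = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
test = [[0.5, 1.2, 2.1], [1.9, 1.4, 0.2]]
print(check_distance_inputs(train, test))  # (3, 2)
```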

jlmelville commented 4 years ago

OK, that sounds like it would be possible, but I can't say when (or if) it will get done.

jlmelville commented 4 years ago

@luciat-92 does #64 cover your use case?