luciat-92 opened this issue 4 years ago.
ret_model isn't supported with a distance matrix because it's primarily intended for transforming new data, and transforming new data requires the original input data so that distances between it and the new points can be computed.
There is one use case where I think this would be feasible (although it isn't currently implemented): using the model to transform new data, provided you also supply a distance matrix between the original data and the new points. If that is your situation, it seems unusual to have full distance matrices available but not the underlying data. I'd be curious to know what domain the data comes from, if you can say.
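The transform described above needs only two things: the embedding of the original data and, for each new point, its distances to the original points. A minimal sketch of the idea in plain Python follows. It is not uwot's API or its actual algorithm; the function name `embed_new_points` and the simple bandwidth rule are hypothetical choices for illustration. It places each new point at a weighted average of its nearest training points' embedding coordinates (similar in spirit to a weighted-average initialization) and skips any further optimization of the new coordinates.

```python
import math
from typing import List, Sequence


def embed_new_points(
    train_embedding: Sequence[Sequence[float]],
    test_to_train_dist: Sequence[Sequence[float]],
    n_neighbors: int = 15,
) -> List[List[float]]:
    """Hypothetical helper: place each new point using only its distances
    to the training points and the fixed training embedding."""
    n_train = len(train_embedding)
    dim = len(train_embedding[0])
    k = min(n_neighbors, n_train)
    result = []
    for row in test_to_train_dist:
        if len(row) != n_train:
            raise ValueError("each distance row needs one entry per training point")
        # Indices of the k training points closest to this new point.
        nbrs = sorted(range(n_train), key=lambda j: row[j])[:k]
        rho = row[nbrs[0]]  # distance to the single nearest training point
        gaps = [row[j] - rho for j in nbrs]
        # Assumed bandwidth: mean gap above rho (fall back to 1.0 if all gaps
        # are zero). UMAP itself solves for sigma so the weights sum to log2(k).
        sigma = sum(gaps) / k or 1.0
        weights = [math.exp(-g / sigma) for g in gaps]
        total = sum(weights)  # >= 1 because the nearest neighbor has weight 1
        coords = [
            sum(w * train_embedding[j][d] for w, j in zip(weights, nbrs)) / total
            for d in range(dim)
        ]
        result.append(coords)
    return result
```

Because the training embedding is only read, never changed, the embedding space stays fixed while new points are added, which is the property the use case below depends on.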
I am training a machine learning model on a fixed dataset and embedding space produced with UMAP. To compute the UMAP embedding, I am using distances rather than the original data, because the distances are a non-linear combination of several different input formats. I then want to add new data points as a test set without recomputing the embedding, since the embedding has to remain fixed for the machine learning model to be applicable to external data. For the test set, I would have the distance from each test point to every training point. For this reason, that implementation would be really useful in my case. Do you think it would be possible?
OK, that sounds like it would be possible, but I can't say when (or if) it will get done.
@luciat-92, does #64 cover your use case?
Hello James, thanks a lot for the extremely useful implementation. I am interested in calling the umap function with a distance matrix supplied directly as input. I was wondering whether it would be possible to extend the ret_model = T option to this kind of input, or whether that isn't feasible from an implementation point of view. Thanks!