lmcinnes / umap

Uniform Manifold Approximation and Projection
BSD 3-Clause "New" or "Revised" License

Passing distance matrix in metric="precomputed", what about the data?! #428

Open NimaSarajpoor opened 4 years ago

NimaSarajpoor commented 4 years ago

Hello,

I have read the issues about passing a custom distance function to UMAP, and I realized that we can simply feed UMAP a distance matrix instead (#348). Is that right? For instance:

my_model = umap.UMAP(metric='precomputed')
my_model_fit = my_model.fit_transform(distance_matrix)

But what about the data itself? Two data sets may have the same distance matrix but sit at different locations in space.

In other words, if two distance matrices are exactly equal, does that mean the overall shape of the data points is the same in the original space? And is that what matters for UMAP?

(NOTE: I should also mention that in the documentation for the argument "metric: string or function (optional, default 'euclidean')", I couldn't find "precomputed" listed as a valid value for the metric.)

Thanks, Nima

lmcinnes commented 4 years ago

I think that is the right view. UMAP is based on the metric and topological structure of the data. Its position or configuration in space doesn't really matter; it is the inter-relationships among the data samples that matter.
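To make the point about position concrete, here is a minimal sketch (not taken from the UMAP code base) showing that rotating and translating a point set leaves its Euclidean distance matrix unchanged. The helper names `pairwise_distances` and `rotate_and_shift` and the sample points are made up for illustration, and only the Python standard library is used.

```python
import math

def pairwise_distances(points):
    """Full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate_and_shift(points, angle, shift):
    """Apply a rigid motion: rotate 2-D points by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

# Hypothetical sample data: four points in the plane.
original = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
moved = rotate_and_shift(original, math.pi / 3, (10.0, -5.0))

d_original = pairwise_distances(original)
d_moved = pairwise_distances(moved)

# The coordinates differ, but the distance matrices agree up to rounding.
same_distances = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(d_original, d_moved)
    for a, b in zip(row_a, row_b)
)
print(same_distances)
```

Since `d_original` and `d_moved` agree up to floating-point rounding, passing either one with metric='precomputed' hands UMAP essentially the same input, even though the two point sets sit in different places.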

jackcan2 commented 1 year ago

@lmcinnes - I have been working with time series data in my projects and pass a precomputed DTW matrix in this way. I know that DTW is not technically a "metric" because it does not satisfy the triangle inequality [1]. Does this property mean that DTW is not appropriate to use with UMAP? The embeddings I've produced look fairly reasonable out of the box, with no parameter tuning. However, I just wanted to make sure I'm not going down the wrong path here.

It looks like your response above suggests that UMAP only cares about how similar the data points are based on whatever measure is provided (doesn't need to be a metric in the strict sense).

Reference:

[1] https://math.stackexchange.com/questions/3381564/how-to-design-a-metric-from-dtw
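For readers who want to see the triangle-inequality issue on a concrete case, the following is a rough sketch under stated assumptions: a textbook dynamic-programming DTW with absolute-difference cost and no window constraint, plus a brute-force check of a precomputed dissimilarity matrix. The function names `dtw` and `triangle_violations` and the three toy series are hypothetical. The sketch only detects violations; it does not settle whether such a matrix is appropriate for UMAP.

```python
def dtw(a, b):
    """Classic DTW cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def triangle_violations(dist, tol=1e-12):
    """Return triples (i, j, k), i < k, where dist[i][k] > dist[i][j] + dist[j][k]."""
    n = len(dist)
    return [
        (i, j, k)
        for i in range(n)
        for k in range(i + 1, n)
        for j in range(n)
        if j != i and j != k and dist[i][k] > dist[i][j] + dist[j][k] + tol
    ]

# Toy series chosen by hand so that going "through" the middle one is cheaper.
series = [[0, 0, 0], [0, 1], [1, 1, 1]]
distance_matrix = [[dtw(x, y) for y in series] for x in series]
violations = triangle_violations(distance_matrix)
print(distance_matrix)
print(violations)
```

The check is cubic in the number of samples, so on a large precomputed matrix it is more practical to run it on a random subset of rows and columns.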