Comparing similarity of two UMAPs?

chris-rands commented 3 years ago

Apologies if this it out of scope here, but given two or more UMAPs, visually we can compare them manually by eye when plotting UMAP1 vs. UMAP2 with labels and get an intuition of how similar they are in terms of local/global structure. Is there any straightforward way to computationally compare UMAPs and calculate some kind of distance metric?

lmcinnes commented 3 years ago

Assuming you have anchor points (i.e. points that are in both UMAPs) then the simplest thing to do is measure the procrustes distance between them. That means you need to do a procrustes alignment (really just some linear algebra) and then total up the pairwise euclidean distances between the anchor points.

chris-rands commented 3 years ago

Thank you- sometimes the UMAP visualised is rotated but fundamentally the same structure, would that be handled correctly by this approach? I am a computational biologist and not an expert on the maths I'm afraid

lmcinnes commented 3 years ago

Yes, to some degree. Procrustes alignment is essentially just finding an affine transformation (combination of translation, uniform scaling, rotation and reflection) that best aligns the anchor points. Wikipedia has a (probably too detailed) summary: https://en.wikipedia.org/wiki/Procrustes_analysis . Really the scale and centering are easy enough to take care of (translation to have the same center of mass; scale to have the same average distance from the center of mass) and it then boils down to an orthogonal procrustes problem: https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem . That can be solved by straightforward linear algebra.

So, in summary: translate the embeddings such that the center of masses are the same (i.e. zero). Scale the embeddings such that the average norm for each embedding is the same. Find the rotation that will best match up the embeddings -- an orthogonal procrustes problem, solved via an SVD. I have code for this, but I do not have access to it at the moment unfortunately.

lmcinnes / umap

Comparing similarity of two UMAPs? #607