minyoungg / platonic-rep

439 stars 26 forks source link

mutual_knn: triangle inequality? #4

Closed dribnet closed 2 months ago

dribnet commented 3 months ago

Was playing with the mutual_knn metric trying to build some intuitions. I'm curious whether given three representation spaces A, B and C, if we know the following for a fixed k:

mutual_knn(A, B) = x mutual_knn(B, C) = y

is it possible that we can already know constraints on the mutual_knn(A, C)? Some sort of rule like:

mutual_knn(A, C) >= x * y

As a concrete example - if we know that (A, B) are very highly aligned at 0.9 And (B, C) and very highly aligned at 0.9 Could we conclude that mutual_knn(A, C) is at least 0.81?

My intuition is no 😅. But I thought I would ask out loud anyway since it could be useful to understand what useful algebraic structures these spaces my have.

briancheung commented 2 months ago

Good question!

You can think of mutual_knn(A, B) like the normalized intersection of the nearest neighbor set of A and the nearest neighbor set of C.

Example for k = 100:

If 90 out of 100 of the nearest neighbors overlap between A and B (score of 0.9), there are 10 elements that are in the disjoint set of A and B ( |B-A| = 10 ). More specially, there are 10 elements in the nearest neighbors of B that are not shared with A.

So if B and C also have 90 out of 100 overlap, at worst case 10 of those 90 would be from disjoint of A and B. So the mutual_knn(A, C) would be at least 0.80. So it looks like you can definitely make structures based on the properties of sets.

ssnl commented 2 months ago

The "transitive-like" property is $m{AC} \geq m{AB} + m_{AC} - 1$. In other words, you want the overlaps (AB and AC) to each be so large that these overlaps have to overlap. It has a relationship, but is not as strong as what you suggested.

dribnet commented 2 months ago

Thanks for that feedback! I better understand the limitations of this approach and will close this issue.