Closed: orenmn closed this issue 1 year ago.
Fair point, there should be an assertion that top + bottom <= size. Otherwise, why bother specifying them in the first place? The normal case is that you have a large matrix (hundreds or thousands of entries) and you keep just the top/bottom 3 or so.
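A minimal sketch of the kind of guard described above. `check_top_bottom` is a hypothetical helper, not a metacells function, and the actual check added in version 0.9 may be worded or placed differently.

```python
def check_top_bottom(top: int, bottom: int, size: int) -> None:
    """Hypothetical guard: reject settings where the top and bottom selections could overlap."""
    assert top >= 0 and bottom >= 0, "top and bottom must be non-negative"
    # If top + bottom exceeds the number of entries, at least one entry
    # would be kept by both selections and then summed twice.
    assert top + bottom <= size, (
        f"top ({top}) + bottom ({bottom}) exceeds the number of entries ({size})"
    )


check_top_bottom(3, 3, 1000)  # the normal case: passes silently
# check_top_bottom(6, 5, 10)  # would raise AssertionError
```

One design note: `assert` statements are skipped when Python runs with `-O`, so a check that must always fire would raise a `ValueError` instead.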
I have added this assertion to the code; it will be part of the upcoming version 0.9.
Version 0.9 is now published, so closing this as done.
https://github.com/tanaylab/metacells/blob/master/metacells/tools/similarity.py#L212 does:

similarity = top_similarity + bottom_similarity

If I understand correctly, if you compute the similarity for 10 objects and specify top=6 and bottom=5, then (since 6 + 5 = 11 > 10) the similarity with one object would be in both top_similarity and bottom_similarity. The function would then "count it twice", returning twice the actual similarity for that object.
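A minimal NumPy sketch of the double counting described above. It assumes, hypothetically, that `top_similarity` and `bottom_similarity` are built by keeping the `top` largest and `bottom` smallest entries of a row and zeroing the rest; `keep_top_bottom_unchecked` is an illustrative name, not the actual metacells code.

```python
import numpy as np


def keep_top_bottom_unchecked(row: np.ndarray, top: int, bottom: int) -> np.ndarray:
    """Hypothetical: sum a 'keep top' copy and a 'keep bottom' copy of `row`, with no overlap check."""
    order = np.argsort(row)            # indices sorted by ascending value
    top_idx = order[len(row) - top:]   # indices of the `top` largest entries
    bottom_idx = order[:bottom]        # indices of the `bottom` smallest entries

    top_similarity = np.zeros_like(row)
    bottom_similarity = np.zeros_like(row)
    top_similarity[top_idx] = row[top_idx]
    bottom_similarity[bottom_idx] = row[bottom_idx]

    # Any index present in both selections contributes its value twice.
    return top_similarity + bottom_similarity


row = np.arange(1, 11) / 10.0                      # 10 objects: 0.1, 0.2, ..., 1.0
result = keep_top_bottom_unchecked(row, top=6, bottom=5)
print(np.flatnonzero(result != row))               # [4]
print(result[4])                                   # 1.0, i.e. twice row[4] == 0.5
```

With top=6 and bottom=5 over 10 entries, index 4 is both among the 6 largest and among the 5 smallest, so its value is doubled while every other entry is kept once.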