tanaylab / metacells

Metacells - Single-cell RNA Sequencing Analysis
MIT License

_compute_elements_similarity() might return wrong correlation values #35

Closed orenmn closed 1 year ago

orenmn commented 1 year ago

https://github.com/tanaylab/metacells/blob/master/metacells/tools/similarity.py#L212 computes similarity = top_similarity + bottom_similarity. If I understand correctly, if you compute the similarity for 10 objects and specify top=6 and bottom=5, then the similarity with at least one object would be in both top_similarity and bottom_similarity, so the function would "count it twice" and ultimately return twice the actual similarity for that object.
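To illustrate the concern, here is a minimal NumPy sketch of the double counting. It is not the metacells code: the helpers keep_top and keep_bottom are hypothetical stand-ins, and it assumes the top and bottom parts are built by zeroing every entry outside the selected ones before the two parts are added.

```python
import numpy as np

def keep_top(row, k):
    """Hypothetical helper: zero everything except the k largest values."""
    out = np.zeros_like(row)
    idx = np.argsort(row)[row.size - k:]
    out[idx] = row[idx]
    return out

def keep_bottom(row, k):
    """Hypothetical helper: zero everything except the k smallest values."""
    out = np.zeros_like(row)
    idx = np.argsort(row)[:k]
    out[idx] = row[idx]
    return out

# Similarities of one object with 10 objects (made-up values).
row = np.array([-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9])

top_similarity = keep_top(row, 6)        # keeps indices 4..9
bottom_similarity = keep_bottom(row, 5)  # keeps indices 0..4
similarity = top_similarity + bottom_similarity

# Index 4 was selected by both parts, so its value is added twice.
doubled = np.flatnonzero(~np.isclose(similarity, row))
print(doubled, row[doubled], similarity[doubled])
```

Since 6 + 5 exceeds 10, at least one index must be selected twice; here that is index 4, whose entry in the sum is twice the original value.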

orenbenkiki commented 1 year ago

Fair point, there should be an assertion that top + bottom <= size. Otherwise, why bother specifying them in the first place? The normal case is that you have a large matrix (hundreds or thousands of entries) and you keep just the top/bottom 3, or something like that.
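A guard along these lines could look roughly like the sketch below. The function combine_top_bottom and its signature are hypothetical and only illustrate the top + bottom <= size check; this is not the code that shipped in metacells.

```python
import numpy as np

def combine_top_bottom(row, top, bottom):
    """Hypothetical sketch: keep the `top` largest and `bottom` smallest values.

    All other entries are set to zero. The assertion rejects requests where
    the two selections would have to overlap.
    """
    size = row.size
    assert top + bottom <= size, (
        f"top ({top}) + bottom ({bottom}) exceeds size ({size})"
    )
    order = np.argsort(row)
    # With top + bottom <= size these two index ranges cannot overlap.
    keep = np.concatenate([order[:bottom], order[size - top:]])
    result = np.zeros_like(row)
    result[keep] = row[keep]
    return result

values = np.linspace(-0.9, 0.9, 10)  # made-up similarities, none equal to zero

# The normal case: keep only a few entries from each end.
sparse = combine_top_bottom(values, top=3, bottom=3)

# The case from the report is rejected instead of silently double counting.
try:
    combine_top_bottom(values, top=6, bottom=5)
    rejected = False
except AssertionError:
    rejected = True
```

Selecting indices once from a single sorted order, rather than adding two separately masked copies, also means no entry can be counted twice when the check passes.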

orenbenkiki commented 1 year ago

I now have this assertion in the code; it will be part of the upcoming version 0.9.

orenbenkiki commented 1 year ago

Version 0.9 is now published, so closing this as done.