sq.gr.co_occurrence gives different results for n_splits=1 and n_splits>1

scverse / squidpy

Spatial Single Cell Analysis in Python

https://squidpy.readthedocs.io/en/stable/

BSD 3-Clause "New" or "Revised" License


sq.gr.co_occurrence gives different results for n_splits=1 and n_splits>1 #755

Open JuergenLippoldt opened 9 months ago

JuergenLippoldt commented 9 months ago

Description

While calculating co-occurrences, I noticed some inf and nan values in the output. While looking into it, I found issue #689 and tried reducing n_splits to 1. Not only did the inf and nan values go away, but all the other values changed significantly as well.

I would very much appreciate a fix, because running the function with n_splits=1 produces an error for very large samples. Thank you :)

Version

1.3.0

giovp commented 9 months ago

Yes, this is expected. It comes from the way the co-occurrence is implemented: we split the data into chunks if it is too large. I would suggest splitting the data by slide and running the co-occurrence on each slide separately; maybe that helps. It would be cool to have faster implementations, but I don't have time to look at this now.
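
To illustrate why chunking can change the numbers, here is a toy sketch in plain Python. It is not squidpy's implementation: the helper names `cooc_ratio` and `chunked_ratio`, the coordinates on a line, and the choice to average per-chunk ratios are all assumptions made for illustration. It only shows that a ratio computed inside chunks need not match the ratio computed on the full data, and that a chunk missing a cluster can yield nan.

```python
import math


def cooc_ratio(points, labels, a, b, radius):
    """Toy statistic: P(label a | within `radius` of a b-point) / P(label a).

    Only the points passed in are considered, so neighbors that fall in
    another chunk are invisible to this function.
    """
    n = len(points)
    near_b = 0    # ordered pairs (i, j), i != j, labels[i] == b, distance <= radius
    near_b_a = 0  # subset of those pairs where labels[j] == a
    for i in range(n):
        if labels[i] != b:
            continue
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                near_b += 1
                if labels[j] == a:
                    near_b_a += 1
    p_a = labels.count(a) / n
    if near_b == 0 or p_a == 0:
        return float("nan")  # 0/0: the chunk has no usable pairs
    return (near_b_a / near_b) / p_a


def chunked_ratio(points, labels, a, b, radius, n_splits):
    """Hypothetical chunked variant: per-chunk ratios, then their plain average."""
    size = math.ceil(len(points) / n_splits)
    per_chunk = [
        cooc_ratio(points[s:s + size], labels[s:s + size], a, b, radius)
        for s in range(0, len(points), size)
    ]
    return per_chunk, sum(per_chunk) / len(per_chunk)


# Eight points on a line; cluster "B" only occurs in the first half.
points = [(float(x), 0.0) for x in range(8)]
labels = ["A", "B", "A", "B", "A", "A", "A", "A"]

full = cooc_ratio(points, labels, "A", "B", radius=1.0)
per_chunk, averaged = chunked_ratio(points, labels, "A", "B", radius=1.0, n_splits=2)

print("full data:", full)
print("per chunk:", per_chunk)
print("averaged :", averaged)
```

In this toy setup the first chunk gives a different value than the full data because the cluster frequencies and the visible neighbors change, and the second chunk contains no "B" points, so its ratio is undefined and the average becomes nan. Under these assumptions, running each slide on its own with n_splits=1 avoids both effects as long as a single slide fits in memory.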