Why not take scipy implementation of Pearson correlation for computing lineage drivers?

theislab / cellrank

CellRank: dynamics from multi-view single-cell data

https://cellrank.org

BSD 3-Clause "New" or "Revised" License


Why not take scipy implementation of Pearson correlation for computing lineage drivers? #1231

Open VladimirShitov opened 3 days ago

VladimirShitov commented 3 days ago

Hi! Thanks a lot for your tool, I find it very useful and easy to use. I noticed that the code for computing correlations implements both the Fisher transformation and the permutation test manually. Is there a reason not to use SciPy's implementation? As far as I understand from its documentation, it uses the same Fisher transformation, also supports a permutation test, and has been tested extensively by the community.
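For readers unfamiliar with the Fisher transformation mentioned above, here is a minimal sketch of how it yields a confidence interval for a Pearson correlation. It is not CellRank's or SciPy's actual code; the function name `pearson_fisher_ci` and its parameters are hypothetical, and it assumes complete data with at least four observations.

```python
import math
from statistics import NormalDist

import numpy as np


def pearson_fisher_ci(x, y, confidence_level=0.95):
    """Return (r, lower, upper) using the Fisher z-transformation.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")

    r = float(np.corrcoef(x, y)[0, 1])
    # Clip so that atanh stays finite when r is exactly +/-1.
    r_clipped = min(max(r, -1 + 1e-12), 1 - 1e-12)

    z = math.atanh(r_clipped)           # Fisher z-transformation
    se = 1.0 / math.sqrt(n - 3)         # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + confidence_level / 2)

    # Transform the interval bounds back to the correlation scale.
    return r, math.tanh(z - q * se), math.tanh(z + q * se)
```

A permutation test replaces the normal approximation by recomputing `r` on shuffled copies of `y`, which is more expensive but makes fewer distributional assumptions.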

WeilerP commented 3 days ago

Correct me if I'm wrong, @michalk8, but it's for performance reasons, right? Last time I checked, Scipy doesn't cope well with computing many pairwise correlations, etc.

michalk8 commented 3 days ago

> Correct me if I'm wrong, @michalk8, but it's for performance reasons, right? Last time I checked, Scipy doesn't cope well with computing many pairwise correlations, etc.

Yes, that's correct.
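To illustrate the performance point, here is a minimal sketch of computing many pairwise correlations in one matrix product instead of looping over pairs. This is an assumption about the general technique, not CellRank's actual implementation; the function name `correlate_columns` and the array shapes are hypothetical.

```python
import numpy as np


def correlate_columns(X, Y):
    """Pearson correlation between every column of X and every column of Y.

    X has shape (n_obs, n_features), e.g. genes; Y has shape
    (n_obs, n_targets), e.g. lineage probabilities. Returns an array of
    shape (n_features, n_targets). Constant columns yield NaN.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Center each column once.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # All cross-products in a single matrix multiplication.
    cov = Xc.T @ Yc
    x_norm = np.sqrt((Xc ** 2).sum(axis=0))
    y_norm = np.sqrt((Yc ** 2).sum(axis=0))

    return cov / np.outer(x_norm, y_norm)
```

With thousands of features and a handful of targets, this does the work in one vectorized pass, whereas calling a two-vector correlation function once per pair pays Python overhead for each call.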

VladimirShitov commented 3 days ago

Interesting, thanks! Could you also add support for missing values? I sometimes have them in my data when working with patient-level cell-type pseudobulks instead of single-cell data: if a sample doesn't contain a particular cell type, the corresponding values are missing. SciPy allows omitting them when calculating the correlation, but I didn't find such an option here.
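As a sketch of what omitting missing values could look like, the following drops every observation where either value is NaN before computing the correlation. The helper `pearson_omit_nan` is hypothetical and is not part of CellRank or SciPy; it only illustrates the requested behaviour.

```python
import numpy as np


def pearson_omit_nan(x, y):
    """Pearson correlation over the pairs where both values are present.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Keep only observations that are non-missing in both vectors.
    keep = ~(np.isnan(x) | np.isnan(y))
    if keep.sum() < 2:
        return float("nan")  # not enough complete pairs

    return float(np.corrcoef(x[keep], y[keep])[0, 1])
```

Note that the number of complete pairs can differ between features, so any p-values or confidence intervals derived from such correlations would need to use the per-feature sample size.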