skinnider / dismay

Calculation of distance metrics for matrices
MIT License

Which measure of proportionality is better? #2

Open massonix opened 4 years ago

massonix commented 4 years ago

Dear Michael,

I am conducting an analysis in which I aim to rank all known sources of variance (e.g., cell type, donor, technical artifacts) in my single-cell RNA-seq dataset. Among other analyses, I am computing all pairwise cell-cell distances, which yields a distance matrix. Your article "Evaluating measures of association for single-cell transcriptomics" has been extremely useful in this regard. Like you (Figure 4), I observe a higher signal-to-noise ratio and better overall accuracy with measures of proportionality (phi and rho) than with Pearson correlation.

My question is: which measure of proportionality would you use? I like rho because it's bounded in [-1, 1]. However, I get many negative values (e.g., -0.1), which I find hard to interpret. Phi, on the other hand, is always non-negative but has no upper bound.

Thanks a lot for your time and help, and for creating this awesome package.

Best,

Ramon
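For reference, the two measures Ramon compares can be sketched as below, following the usual definitions in the proportionality literature that propr builds on. This is a minimal sketch, assuming the inputs are already log-ratio transformed (e.g., clr) vectors; the function names `phi` and `rho` are illustrative and are not dismay's or propr's API. It also shows where negative rho values come from: because var(a - b) = var(a) + var(b) - 2 cov(a, b), rho simplifies to 2 cov(a, b) / (var(a) + var(b)). It is therefore negative exactly when the log-ratio values covary negatively, so a value like -0.1 indicates a weak tendency for the two to move in opposite directions.

```python
import numpy as np


def phi(a, b):
    """Proportionality phi = var(a - b) / var(a).

    a, b: log-ratio transformed values (assumed, e.g. clr) for the two
    vectors being compared. phi >= 0, with 0 meaning perfect
    proportionality; it has no upper bound and is not symmetric in a, b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.var(a - b) / np.var(a)


def rho(a, b):
    """Proportionality rho = 1 - var(a - b) / (var(a) + var(b)).

    Algebraically equal to 2 * cov(a, b) / (var(a) + var(b)), so it lies
    in [-1, 1] and is negative exactly when a and b covary negatively.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.var(a - b) / (np.var(a) + np.var(b))


if __name__ == "__main__":
    a = np.array([0.0, 1.0, -1.0, 2.0, -2.0])
    # b differs from a by a constant log-ratio: perfectly proportional
    print(float(phi(a, a + 3.0)), float(rho(a, a + 3.0)))  # 0.0 1.0
    # b = -a: the two move in exactly opposite directions
    print(float(phi(a, -a)), float(rho(a, -a)))  # 4.0 -1.0
```

Pushing b further from a (e.g., b = -5a) raises phi to 36 in this example, while rho stays within [-1, 1].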

skinnider commented 4 years ago

Hey Ramon, sorry for the delay. Glad to hear our paper was useful to you and that you are seeing similar results. I tend to use rho because, as you say, it’s often useful to have a measure bounded by [-1, 1]. In practice, depending on the application, I’m not sure the choice matters much: the two are related by a monotonic function, and correspondingly, the differences we saw between them in our paper were quite minor. For more details, you might want to take a look at the propr paper (https://www.nature.com/articles/s41598-017-16520-0), which describes the implementation that dismay provides a fairly shallow wrapper around; its SI appendix might be particularly useful. Hope this helps. Mike
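The monotonic relationship mentioned above is easiest to see with a symmetric variant of phi, phi_s = var(a - b) / var(a + b). Writing S = var(a) + var(b) and c = cov(a, b), phi_s = (S - 2c) / (S + 2c) and rho = 2c / S, which gives rho = (1 - phi_s) / (1 + phi_s), a strictly decreasing function of phi_s >= 0. The sketch below checks this numerically, again assuming log-ratio transformed inputs and using illustrative function names (`phi_s`, `rho`, `rho_from_phi_s`) that are not dismay's API. The asymmetric phi = var(a - b) / var(a) is not a function of rho alone, since rho = 1 - phi * var(a) / (var(a) + var(b)) also depends on the ratio of the two variances. Whether dismay's phi is the symmetric variant is not stated in this thread.

```python
import numpy as np


def phi_s(a, b):
    """Symmetric phi: var(a - b) / var(a + b) (assumed definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.var(a - b) / np.var(a + b)


def rho(a, b):
    """Proportionality rho = 1 - var(a - b) / (var(a) + var(b))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.var(a - b) / (np.var(a) + np.var(b))


def rho_from_phi_s(p):
    """Map phi_s in [0, inf) monotonically (decreasing) onto rho in (-1, 1]."""
    return (1.0 - p) / (1.0 + p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical log-ratio data: b is a noisy, partly proportional copy of a
    a = rng.normal(size=100)
    b = 0.5 * a + rng.normal(size=100)
    print(bool(np.isclose(rho(a, b), rho_from_phi_s(phi_s(a, b)))))  # True
```

Because the mapping is monotonic, ranking pairs by rho or by phi_s gives the same order (reversed), which is consistent with the minor differences reported between the two.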

massonix commented 4 years ago

Thanks a lot Mike, this is very useful!

Ramon