Closed amjass12 closed 3 years ago
This is because scipy is not computing distance correlation, but transforming the usual (Pearson) correlation R into a (semi)metric, as 1 - R, so that highly correlated variables (correlation near 1) are close using this metric (distance near 0). The naming of that functionality is unfortunate, and I am afraid that it has confused some people before (see https://stackoverflow.com/questions/35988933/scipy-distance-correlation-is-higher-than-1 and https://stackoverflow.com/questions/60392972/scipy-distance-correlation-scale, for example).
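To make the point above concrete, here is a minimal pure-Python sketch of the quantity described, 1 - R with R the Pearson correlation. The helper names `pearson_r` and `correlation_distance` are made up for this illustration and are not SciPy's API. The assumption is that SciPy's `scipy.spatial.distance.correlation` computes this same 1 - R value.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_distance(x, y):
    """Hypothetical helper: the '1 - R' dissimilarity, NOT Szekely's distance correlation."""
    return 1.0 - pearson_r(x, y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Perfectly (positively) correlated data: R = 1, so the "distance" is about 0.
d_same = correlation_distance(x, [2 * a + 1 for a in x])

# Perfectly anti-correlated data: R = -1, so the "distance" is about 2.
d_opposite = correlation_distance(x, [-a for a in x])

print(d_same, d_opposite)
```

Because R lies in [-1, 1], this value lies in [0, 2], which is why it can exceed 1, unlike a distance correlation.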
Thank you @vnmabus for clarifying, this makes perfect sense!
So, just to confirm: dcor is the right package for calculating the distance correlation, which can detect both linear and non-linear associations between pairs of variables, as per the definition of the distance correlation? (Sorry, I just want to be absolutely sure I am using the intended analysis!)
Thanks again.
Yes, this package can find nonlinear correlations, as it implements Székely's distance correlation (https://en.wikipedia.org/wiki/Distance_correlation).
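As a rough illustration of the nonlinear case, the sketch below computes the (biased) sample distance correlation straight from the definition on the linked Wikipedia page, in pure Python. It is not dcor's code; all function names here are hypothetical and chosen only for this example. It compares the result with the Pearson correlation on y = x², where the dependence is purely nonlinear.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation coefficient, for comparison only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def _centered_distances(values):
    """Pairwise |v_i - v_j| matrix, double-centered (row, column, grand mean)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]  # equals column means by symmetry
    grand_mean = sum(row_means) / n
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def _mean_product(a, b):
    """Average of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)

def distance_correlation(x, y):
    """Biased sample distance correlation of two 1-D samples (illustrative sketch)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)                       # squared distance covariance
    dvar2 = _mean_product(a, a) * _mean_product(b, b)  # product of squared distance variances
    if dvar2 == 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_quadratic = [a * a for a in x]      # depends on x, but not linearly
y_linear = [2 * a + 1 for a in x]     # exact linear relationship

r_quadratic = pearson_r(x, y_quadratic)                # zero: Pearson misses the dependence
dcor_quadratic = distance_correlation(x, y_quadratic)  # clearly positive
dcor_linear = distance_correlation(x, y_linear)        # one (up to rounding)

print(r_quadratic, dcor_quadratic, dcor_linear)
```

For real data, dcor's own `dcor.distance_correlation` is the function to use; this O(n²) sketch only shows what is being measured.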
Perfect, thank you for confirming, and thanks for your time.
Hi!
I have started using dcor, as I need to compute the correlation between two variables/vectors for every pairwise comparison in a dataframe. I am using the distance correlation because I need to detect not just linear pairwise correlations but also non-linear ones.
Having read the documentation, I know this is the correct implementation for this purpose. However, as I understand it, SciPy also provides a distance correlation function. I am getting different results from dcor and SciPy and was wondering if you could explain why. I am unsure whether SciPy is actually computing the same distance correlation, or whether I have missed something obvious in their implementation that leads to the different results:
There is a large discrepancy here, and I would appreciate clarification!
Thank you!