laresbernardo / lares

Analytics & Machine Learning R Sidekick
https://laresbernardo.github.io/lares/

Correlations expressed as percentages #31

Closed teecrow closed 3 years ago

teecrow commented 3 years ago

Hi there, very nice package here - as a social scientist I have found the correlation functions particularly useful (and beautiful) for exploring correlations.

I might suggest a small tweak -- in the corr_cross function and a few others, correlations are expressed as percentages, but this is rarely if ever done in the sciences that rely heavily on correlations. Perhaps an argument could be added to these functions to keep the output in the original r metric (-1 to 1, or abs(corr))?

Expressing it as a percentage can lead to confusion for a few reasons:

  1. A correlation coefficient ranges from -1 to 1 -- and a percentage cannot be negative. (Of course this only applies to the functions that do not take the absolute value of the correlation.)
  2. Expressing it as a percentage may lead people to confuse it with the coefficient of determination, or R^2, which is sometimes expressed (and thought of) as a percentage, because it is the percentage of the variance in the outcome variable explained by the predictor(s).
  3. Most scientists examining correlations will want them in the -1 to 1 metric (or the absolute value of that).

I noticed all this upon finding a bug when examining some data (see below). When type = 2 is set in the corr_cross function, the x-axis is mislabelled for correlations ranging from .51 to .59. Because I don't want percentages anyway, this is easily sidestepped by deleting all arguments to the final call to scale_y_continuous within the function or, even better, by setting limits = c(min(ret$corr), max(ret$corr)) to scale the axis nicely to the data.
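To make the first two points concrete, here is a minimal base-R sketch. It uses simulated data and illustrative variable names, not anything from the package, and shows that a negative r turns into a negative "percentage" that is not the same number as the percentage of variance explained.

```r
# Simulated data; the variable names and coefficients are illustrative only
set.seed(42)
x <- rnorm(200)
y <- -0.6 * x + rnorm(200, sd = 0.8)

r  <- cor(x, y)                      # Pearson r, always within [-1, 1]
r2 <- summary(lm(y ~ x))$r.squared   # coefficient of determination

# With a single predictor, R^2 is exactly r squared
stopifnot(isTRUE(all.equal(r^2, r2)))

# Formatting r as a percent gives a negative value that differs from
# the "percent of variance explained" reading of R^2
print(sprintf("r = %.2f | r as percent = %.0f%% | R^2 as percent = %.0f%%",
              r, 100 * r, 100 * r2))
```

A reader who sees only the percent label has no way to tell which of the two quantities is meant.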

image

And with the arguments removed in scale_y_continuous: image
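For anyone who wants to try the limits-based variant, here is a minimal ggplot2 sketch. The data frame `ret` is a hypothetical stand-in (only the name `ret$corr` comes from the function body), and the choice of geom_point with coord_flip is an assumption for illustration, not a copy of what corr_cross actually draws.

```r
library(ggplot2)

# Hypothetical stand-in for the data the function plots internally
ret <- data.frame(
  pair = c("a + b", "a + c", "b + c", "c + d"),
  corr = c(0.51, 0.55, 0.59, 0.62)
)

# Keep correlations in the raw r metric and fit the axis to the data range.
# Points are used here because bars anchored at 0 would fall outside
# limits that do not include 0 and would be dropped.
p <- ggplot(ret, aes(x = reorder(pair, corr), y = corr)) +
  geom_point(size = 3) +
  coord_flip() +
  scale_y_continuous(limits = c(min(ret$corr), max(ret$corr))) +
  labs(x = NULL, y = "Correlation (r)")

print(p)
```

If bars are preferred, the limits would need to include 0, for example limits = c(0, max(ret$corr)).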

In the next ~3-6 months, once I learn to use git, I would be happy to make a pull request and add this on behalf of anyone else who may have a similar suggestion!

Hopefully this is helpful, and it's really minor -- thanks again for the useful package!

laresbernardo commented 3 years ago

Hi @teecrow, thanks for your detailed feedback! Glad you are using the library for your analyses :) I do agree with you that the "correct" way to show correlations is NOT with percentages, but they do help with visualizations (fewer zeroes at least). I will consider your feedback and make some changes to enable both POVs or do something about it. Regarding the scale_y_continuous usage, I agree it's not formatting as it should for small numbers (that's a ggplot2 thing I can surely fix for your plots). I would appreciate it if you could share a reproducible example so I can replicate your plots with fixed/changed results.

laresbernardo commented 3 years ago

Check out new plots format here: https://laresbernardo.github.io/lares/reference/corr_cross.html

teecrow commented 3 years ago

> Check out new plots format here: https://laresbernardo.github.io/lares/reference/corr_cross.html

Just tested with the new version, and it looks great. No issues at all: Easier to interpret, and the problem with the small numbers is gone! Thanks again.