joon-e / tidycomm

tidycomm: Data Modification and Analysis for Communication Research
https://joon-e.github.io/tidycomm/
GNU General Public License v3.0
15 stars 5 forks source link

correlate_partial() #27

Closed LKobilke closed 1 year ago

LKobilke commented 1 year ago

Implements the correlate_partial function based on ppcor::pcor.test.

Accepts exactly three numeric variables and computes pairwise partial correlations (pearson, kendall, spearman) for all combinations, i.e. pairs, of specified variables while controlling for the third variable. Stops and provides a case-specific warning if too little or too many variables or non-numeric variables are provided. If input data is grouped, groups are dropped and a warning is issued.

Combinations of pairs are calculated using combinat::permn.

Returns a tibble.

Currently missing is the additional model as an attribute. This is added by @joon-e, see #24 .

MarHai commented 1 year ago

Merge should aim at devel, not master.

MarHai commented 1 year ago

Code looks good and validates well.

However, I was expecting an extension to the correlate function with a parameter partial = TRUE. Why did you change this plan?

LKobilke commented 1 year ago

Code looks good and validates well.

However, I was expecting an extension to the correlate function with a parameter partial = TRUE. Why did you change this plan?

Good point. I've just pushed a new commit that implements a change to the correlate() function, i.e., adds the parameter partial = TRUE, and makes correlate_partial() an internal function. I was initially hesitant to do this since correlate() will grab all numeric variables if none are supplied. My concern was that this could lead to lots of information overload when reporting all possible x,y,z combinations. To avoid information overload correlate(partial = FALSE) will now work with two, three, or more variables, but correlate(partial = TRUE) will only accept three variables and throw a warning otherwise. This difference in behavior may seem inconsistent and I felt that a clear distinction between bivariate correlation and partial correlation might help to reduce this inconsistency.

Despite my initial concerns, I see the advantage to expand on the correlate() function. I'm open to keeping this version.