juliasilge / widyr

Widen, process, and re-tidy a dataset
http://juliasilge.github.io/widyr/

Correlation between 2 different variables #21

Open MislavSag opened 5 years ago

MislavSag commented 5 years ago

All the examples I have seen compute correlations of a single variable across items. For example, correlations of sales between firms:

correlations <- widyr::pairwise_cor(sample_of_firms, id, year, sales)

What if I want to calculate correlations between different variables, for example between the sales and costs columns? Something like:

correlations <- widyr::pairwise_cor(sample_of_firms, id, year, c(sales, costs))

If I understand correctly, this is not possible because the value argument accepts only one column?

dgrtwo commented 5 years ago

You can't use pairwise_cor for this, but I don't think this actually is a situation where you need widyr, since you don't need to cast it into a wide matrix to compute the correlation between two columns. Unless I'm mistaken (I couldn't be sure without a desired input and desired output), your data is already in the appropriate form for using dplyr to find correlations.

Would this do what you wanted?

library(dplyr)

# One row per firm: the correlation between that firm's own sales and costs
sample_of_firms %>%
  group_by(id) %>%
  summarize(correlation = cor(sales, costs))

MislavSag commented 5 years ago

Sorry for the late answer.

With your proposal I don't get a correlation matrix. That is, I don't get the correlations between sales and costs across all firms in the sample.

dgrtwo commented 5 years ago

Hi: could you explain what you mean by "correlation between sales and costs between firms"? For example, what would be in the result for

item1    item2      score
FirmA    FirmB      ???

Would it be the correlation between the sales at FirmA and the costs at FirmB, or the costs at FirmA and the sales at FirmB, or what?

MislavSag commented 5 years ago

If I have 3 firms, it would be:

item1    item2    score    desc
FirmA    FirmB    0.54     correlation between sales of firm A and costs of firm B
FirmA    FirmC    0.45     correlation between sales of firm A and costs of firm C
FirmA    FirmA    0.45     correlation between sales of firm A and costs of firm A (this is not necessary)
FirmB    FirmA    0.54     correlation between sales of firm B and costs of firm A
FirmB    FirmC    0.45     correlation between sales of firm B and costs of firm C
FirmB    FirmB    0.45     correlation between sales of firm B and costs of firm B (this is not necessary)

and the same for C.

MislavSag commented 5 years ago

Is it possible to accomplish the above with the widyr package?

So the "only" difference is in the value column. I don't need the correlation between var1 of id1 and var1 of all other ids, but the correlation between var1 of id1 and var2 of all other ids.
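
For reference, the table described above can be sketched in base R without widyr. This is only an illustration under assumptions, not a widyr feature or anything confirmed in this thread: the toy sample_of_firms data is hypothetical, and the approach is to widen sales and costs separately (one column per firm) and pass both matrices to cor(), which correlates every column of the first with every column of the second.

```r
# Hypothetical toy data: three firms observed over the same five years.
sample_of_firms <- data.frame(
  id    = rep(c("FirmA", "FirmB", "FirmC"), each = 5),
  year  = rep(2015:2019, times = 3),
  sales = c(10, 12, 15, 14, 18,  20, 19, 23, 25, 24,  5, 7, 6, 9, 11),
  costs = c( 8,  9, 11, 12, 13,  15, 16, 18, 17, 20,  4, 5, 5, 7,  8)
)

# Widen each variable on its own: rows are years, columns are firms.
# With one observation per firm-year, mean() just returns that value.
sales_wide <- tapply(sample_of_firms$sales, sample_of_firms[c("year", "id")], mean)
costs_wide <- tapply(sample_of_firms$costs, sample_of_firms[c("year", "id")], mean)

# cor(x, y) on two matrices gives a firms-by-firms matrix:
# entry [i, j] is the correlation of firm i's sales with firm j's costs.
cross_cor <- cor(sales_wide, costs_wide)
dimnames(cross_cor) <- list(item1 = rownames(cross_cor),
                            item2 = colnames(cross_cor))

# Re-tidy into one row per (item1, item2) pair.
result <- as.data.frame(as.table(cross_cor),
                        responseName = "correlation",
                        stringsAsFactors = FALSE)

# Drop the same-firm pairs, which were described as not necessary.
result <- result[result$item1 != result$item2, ]
result
```

Unlike an ordinary correlation matrix, this result is not symmetric: the FirmA/FirmB row (sales of A against costs of B) generally differs from the FirmB/FirmA row (sales of B against costs of A), so both orderings are kept.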