Open MislavSag opened 5 years ago
You can't use pairwise_cor for this, but I don't think this actually is a situation where you need widyr, since you don't need to cast it into a wide matrix to compute the correlation between two columns. Unless I'm mistaken (I couldn't be sure without a desired input and desired output), your data is already in the appropriate form for using dplyr to find correlations.
Would this do what you wanted?
sample_of_firms %>%
group_by(id) %>%
summarize(correlation = cor(sales, costs))
Sorry for the late answer.
With your proposal I don't get a correlation matrix. That is, I don't get the correlation between sales and costs across all firms in the sample.
Hi: could you explain what you mean by "correlation between sales and costs between firms"? For example, what would be in the result for
item1 item2 score
FirmA FirmB ???
Would it be the correlation between the sales at FirmA and the costs at FirmB, or the costs at FirmA and the sales at FirmB, or what?
If I have 3 firms, it would be:
item1 item2 score desc
FirmA FirmB 0.54 correlation between sales of firm A and costs of firm B
FirmA FirmC 0.45 correlation between sales of firm A and costs of firm C
FirmA FirmA 0.45 correlation between sales of firm A and costs of firm A (this is not necessary)
FirmB FirmA 0.54 correlation between sales of firm B and costs of firm A
FirmB FirmC 0.45 correlation between sales of firm B and costs of firm C
FirmB FirmB 0.45 correlation between sales of firm B and costs of firm B (this is not necessary)
And the same for C.
Is it possible to accomplish the above with the widyr package?
So the "only" difference is in the value column. I don't need the correlation between var1 of id1 and var1 of all other ids, but the correlation between var1 of id1 and var2 of all other ids.
All the examples I have seen only cover correlations of the same variable. For example, correlations of sales between firms:
correlations <- widyr::pairwise_cor(sample_of_firms, id, year, sales)
What if I want to calculate correlations between different variables, for example, between the sales and costs columns:
correlations <- widyr::pairwise_cor(sample_of_firms, id, year, c(sales, costs))
If I understand correctly, this is not possible because the value argument accepts only one column?
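For what it's worth, here is a minimal sketch of one way to get the sales-versus-costs table described in this thread without widyr, using only base R. It relies on the fact that cor(x, y) correlates every column of x with every column of y. The toy data below is made up for illustration (the real sample_of_firms was never posted), and it assumes one row per firm and year with columns id, year, sales and costs.

```r
# Hypothetical toy data: three firms observed over five years.
# (Stand-in for sample_of_firms, which is not shown in the thread.)
set.seed(1)
sample_of_firms <- data.frame(
  id    = rep(c("FirmA", "FirmB", "FirmC"), each = 5),
  year  = rep(2015:2019, times = 3),
  sales = rnorm(15),
  costs = rnorm(15)
)

# Build one year-by-firm matrix per variable.
# With one observation per (year, id) cell, mean() just returns that value.
sales_wide <- with(sample_of_firms, tapply(sales, list(year, id), mean))
costs_wide <- with(sample_of_firms, tapply(costs, list(year, id), mean))

# cross_cor[i, j] is the correlation between the sales of firm i
# and the costs of firm j, computed over the years both are observed.
cross_cor <- cor(sales_wide, costs_wide, use = "pairwise.complete.obs")

# Reshape the matrix into the long item1 / item2 / score layout.
result <- as.data.frame(as.table(cross_cor), stringsAsFactors = FALSE)
names(result) <- c("item1", "item2", "score")

print(result)
```

Here item1 is the firm whose sales are used and item2 is the firm whose costs are used, so the table is not symmetric: the FirmA/FirmB row and the FirmB/FirmA row generally differ. The rows where item1 equals item2 can be dropped with result[result$item1 != result$item2, ] if they are not needed.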