juliasilge / widyr

Widen, process, and re-tidy a dataset
http://juliasilge.github.io/widyr/

Getting incorrect totals with pairwise_count() when diag = TRUE #34

Status: Open. dsobolew opened this issue 3 years ago.

dsobolew commented 3 years ago

When I set diag = TRUE in pairwise_count(), the counts for the diagonal pairs (a value paired with itself) come out inflated.

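library(tidyverse)  # tibble(), pivot_longer(), and the %>% pipe

# Nine groups, each with two scores; the score values are the items to pair
test_tb <- tibble(
  group  = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  score1 = c(1, 2, 0, 4, 5, 2, 7, 0, 2),
  score2 = c(2, 1, 0, 4, 5, 2, 7, 0, 3)
)

test_tb

# Reshape to one row per (group, score), with the score in `value`,
# then count how often each pair of values co-occurs within a group
test_tb %>%
  pivot_longer(cols = starts_with("score")) %>%
  widyr::pairwise_count(value, group, diag = TRUE, sort = TRUE, upper = FALSE)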

Results:

> test_tb
# A tibble: 9 x 3
  group score1 score2
  <chr>  <dbl>  <dbl>
1 a          1      2
2 b          2      1
3 c          0      0
4 d          4      4
5 e          5      5
6 f          2      2
7 g          7      7
8 h          0      0
9 i          2      3
> 
> test_tb %>%
+   pivot_longer(cols = starts_with("score")) %>%
+   widyr::pairwise_count(value, group, diag = TRUE, sort = TRUE, upper = FALSE)
# A tibble: 9 x 3
  item1 item2     n
  <dbl> <dbl> <dbl>
1     2     2     4
2     1     1     2
3     1     2     2
4     0     0     2
5     4     4     1
6     5     5     1
7     7     7     1
8     2     3     1
9     3     3     1

The (2,2) and (1,1) pairs have inflated counts. It looks as if every pair that contains a 2 is counted and attributed to (2,2).
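To cross-check the table above without widyr, the same counts can be tallied by hand: for each pair of values, count the groups whose two scores include both values, treating each group's scores as a set (so the repeated 0 in group c counts once). The base R sketch below does this. count_pairs() is a hypothetical helper written only for this comparison, not a widyr function, and the set-based definition of a co-occurrence is an assumption about what pairwise_count() is meant to count.

```r
# Same data as the reprex above, rebuilt in base R so no packages are needed
test_df <- data.frame(
  group  = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  score1 = c(1, 2, 0, 4, 5, 2, 7, 0, 2),
  score2 = c(2, 1, 0, 4, 5, 2, 7, 0, 3)
)

# Hypothetical helper (not part of widyr): for every pair of values with
# item1 <= item2 (diagonal included), count the groups containing both.
count_pairs <- function(df) {
  # One set of distinct score values per group; duplicates collapse to one
  sets <- lapply(seq_len(nrow(df)), function(i) {
    unique(c(df$score1[i], df$score2[i]))
  })
  vals <- sort(unique(unlist(sets)))

  # All value pairs, keeping only the lower triangle plus the diagonal
  out <- expand.grid(item1 = vals, item2 = vals)
  out <- out[out$item1 <= out$item2, ]

  # Number of groups whose set contains both item1 and item2
  out$n <- mapply(function(v1, v2) {
    sum(vapply(sets, function(s) v1 %in% s && v2 %in% s, logical(1)))
  }, out$item1, out$item2)

  # Drop pairs that never co-occur and sort by descending count
  out <- out[out$n > 0, ]
  out[order(-out$n, out$item1, out$item2), ]
}

pair_counts <- count_pairs(test_df)
print(pair_counts, row.names = FALSE)
```

Under this definition, a diagonal entry such as (2,2) is the number of groups whose scores include that value, so each row can be compared directly against the pairwise_count() output shown above.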