juliasilge / widyr

Widen, process, and re-tidy a dataset
http://juliasilge.github.io/widyr/
Other
327 stars 29 forks source link

use argument for pairwise_cor_ #1

Open yosuke-yasuda opened 8 years ago

yosuke-yasuda commented 8 years ago

Hi, thanks for making a nice library :)

I wanted to use use="pairwise.complete.obs" for the argument.

I tried changing the code like this (I changed the if statement)

pairwise_cor_ <- function(tbl, group, item, value,
                      method = c("pearson", "kendall", "spearman"),
                      use = "everything",
                      ...) {
  method <- match.arg(method)

  f <- if (method == "pearson" && use == "everything") {
    cor_sparse
  } else {
    function(x){stats::cor(x, method = method, use = use)}
  }
  cor_func <- squarely_(f, group, item, value,
                        sparse = (method == "pearson" && use == "everything"), ...)

  tbl %>%
    ungroup() %>%
    cor_func() %>%
    rename(correlation = value)
}

But this test case which I made doesn't pass

test_that("pairwise_cor should work with Pearson and pairwise.complete.obs", {
  # create NA pattern column and 0 pattern column
  data <- cbind(d, data.frame(
    na_col=c(1, 2, 3, 6, NA, 4, 7, 9, 8),
    zero_col=c(1, 2, 3, 6, 0, 4, 7, 9, 8)))
  ret <- data %>%
    pairwise_cor(row, col, na_col, use="pairwise.complete.obs", diag=TRUE)
  check_na <- matrix(data$na_col, nrow=3) %>%
    cor(method="pearson", use="pairwise.complete.obs") %>%
    reshape2::melt()
  check_zero <- matrix(data$zero_col, nrow=3) %>%
    cor(method="pearson", use="pairwise.complete.obs") %>%
    reshape2::melt()
  expect_true(all(check_na$value==ret$correlation))
  expect_true(any(check_zero$value!=ret$correlation))
})

all(check_zero$value!=ret$correlation) becomes TRUE in this case.

I think this is because of fill=0 argument of acast in widely_

I want to know if you will support this or not. And if you will, I wanna know how you will do it.

I think the value it should be filled is decided by functions (if it's cor, it will be NA and if it's distance, it will be 0). So fill argument should be in the argument of widely_, in my opinion.

And acast also have fun.aggregate argument but it's not indicated in widely_. I also think it should be decided flexibly.

If you let me know your thought, I can try to implement it. Feel free to ask me if you want it.