Closed tavareshugo closed 2 years ago
Thanks for the issue!
Duplicate of #257, #273, and friends. See the discussion there for thoughts from different folks on the infer team—our current approach of warning + documenting here isn't a silver bullet, but feels good to us. :)
library(infer)
df <- data.frame(
y = c(rnorm(100, mean = 0), rnorm(100, mean = 10)),
x = rep(c("a", "b"), each = 100)
)
dist <-
df %>%
specify(y ~ x) %>%
hypothesize(null = "independence") %>%
generate(reps = 200, type = "permute") %>%
calculate("diff in means", order = c("a", "b"))
obs <-
df %>%
specify(y ~ x) %>%
calculate("diff in means", order = c("a", "b"))
get_p_value(dist, obs, "left")
#> Warning: Please be cautious in reporting a p-value of 0. This result is an
#> approximation based on the number of `reps` chosen in the `generate()` step. See
#> `?get_p_value()` for more information.
#> # A tibble: 1 × 1
#> p_value
#> <dbl>
#> 1 0
Created on 2022-09-08 by the reprex package (v2.0.1)
Oh, sorry I'd missed those closed issues. Thanks for considering it anyway!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
I would like to suggest an update to the
get_p_value()
function so that a p-value of zero is never returned. As indicated in Phipson & Smyth, Permutation P-values Should Never Be Zero, the permutation-based exact p-value (i.e. without replacement) is:$p_u = \frac{b + 1}{m + 1}$
For methods with replacement (i.e. the bootstrap) the calculation is different, and the function from their package
statmod::permp()
can be used instead.I'm not sure it would be easy to implement this, without an additional argument to the function
get_p_value()
to specify which type of sampling was used ("bootstrap"
or"permutation"
). But I think the formula above would already be better than the current implementation, even for the bootstrap method (at least for large sample sizes). This could be done by changing the individual*_p_value()
functions: