strengejacke / sjstats

Effect size measures and significance tests
https://strengejacke.github.io/sjstats

`sjstats::omega_sq()` confidence intervals contain 0 for p < 0.05 terms #51

Closed IndrajeetPatil closed 4 years ago

IndrajeetPatil commented 6 years ago

There is a bug in sjstats::omega_sq(). Even though p-values for all terms in the following anova object are less than 0.05, the confidence intervals for partial omega-squared include 0.

# for reproducibility
set.seed(123)
library(ggstatsplot)
library(sjstats)

# to speed up the calculation, let's use only 10% of the data
movies_10 <-
  dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)

# `aov` object
stats.object <- stats::aov(formula = rating ~ mpaa * genre,
                           data = movies_10)

# checking tidy output
broom::tidy(stats.object)

#> # A tibble: 4 x 6
#>   term          df sumsq meansq statistic   p.value
#>   <chr>      <dbl> <dbl>  <dbl>     <dbl>     <dbl>
#> 1 mpaa           2  19.3   9.64      7.16  0.000963
#> 2 genre          4  17.8   4.44      3.30  0.0119  
#> 3 mpaa:genre     7  21.4   3.05      2.27  0.0301  
#> 4 Residuals    229 308.    1.35     NA    NA

# getting confidence intervals
sjstats::omega_sq(model = stats.object,
                  partial = TRUE,
                  ci.lvl = 0.95)
#>         term partial.omegasq conf.low conf.high
#> 1       mpaa           0.048   -0.022     0.144
#> 2      genre           0.036   -0.005     0.133
#> 3 mpaa:genre           0.035   -0.013     0.158
strengejacke commented 6 years ago

I'm not sure, but I would say the p value refers to a different effect size.

strengejacke commented 6 years ago

Or better: a different test statistic. Since the CIs are bootstrapped, the test statistic is no longer suitable.

strengejacke commented 6 years ago

Taking your summary, the values for partial omega squared and the p-values seem to be OK:

# partial omega-squared for the mpaa term
# (df1 = 2, MS_effect = 9.637, MS_error = 1.346, from the table above)
(2 * (9.637 - 1.346)) / (2 * 9.637 + 240 * 1.346)

# p-value for F = 7.16 with df1 = 2, df2 = 229
pf(7.160, df1 = 2, df2 = 229, lower.tail = FALSE)

It's just that the 95% quantiles of the bootstrapped values for omega squared do indeed include zero. I'm not sure how you would reconcile these discrepancies. The bootstrapped CIs are not equal-tailed around the estimate - maybe it's better to take the bootstrapped SE and then calculate CIs based on the normal distribution...
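A minimal sketch of the two approaches being compared here - percentile bootstrap quantiles vs. a normal approximation built from the bootstrapped SE. The simulated replicates (`boot_est`) are a placeholder for a skewed bootstrap distribution, not the actual omega-squared bootstrap used in sjstats:

```r
set.seed(123)

# placeholder: bootstrap replicates of some right-skewed statistic
boot_est <- rgamma(1000, shape = 2, rate = 20) - 0.02

# percentile CI: 2.5% and 97.5% quantiles of the bootstrap distribution
# (not equal-tailed around the point estimate when the distribution is skewed)
ci_percentile <- quantile(boot_est, probs = c(0.025, 0.975))

# normal-approximation CI: estimate +/- 1.96 * bootstrapped SE
est <- mean(boot_est)
se  <- sd(boot_est)
ci_normal <- c(est - 1.96 * se, est + 1.96 * se)

ci_percentile
ci_normal
```

With a right-skewed bootstrap distribution, the normal-approximation lower bound falls well below the percentile lower bound, which is the behavior described in the next comment.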

IndrajeetPatil commented 6 years ago

Yeah, I think it would be better to calculate CIs based on a normal distribution, or at least to offer an option to choose the sampling distribution. All effect size confidence intervals in my package default to the normal distribution (https://github.com/IndrajeetPatil/ggstatsplot/blob/master/R/helpers_effsize_ci.R), which means there is usually good correspondence between traditional p-values and bootstrapped CIs. Of course, users still have the option to choose what kind of CIs they want using the conf.type argument.

Therefore, it's super-surprising to me that, for example, the term mpaa has a p-value of 0.0009 and still includes 0 in its confidence intervals!

strengejacke commented 6 years ago

Taking the bootstrapped SE, or computing the SE from the CI, makes it even worse - the lower bound of the normal-approximation CI is much lower than that of the bootstrapped CI (because the bootstrapped CIs are not equal-tailed).

I'm not sure if it would make sense to re-calculate the p-value for omega-squared, because then you have a significant term for your "normal" model and a non-significant term when you compute omega-squared.

Maybe you have an idea? As I typically don't use ANOVA, I'm not sure how to proceed here, except for mentioning in the docs that bootstrapped CIs may indicate "non-significance" even when the model does not.

IndrajeetPatil commented 6 years ago

Sorry, I haven't been much help; I am out of my statistical depth here. I will try to do some digging on when and why there are discrepancies between bootstrapped CIs and traditional p-values.

IndrajeetPatil commented 4 years ago

For a comparison, output from effectsize:

# for reproducibility
set.seed(123)
library(ggstatsplot)
library(effectsize)

# to speed up the calculation, let's use only 10% of the data
movies_10 <-
  dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)

# `aov` object
stats.object <- stats::aov(formula = rating ~ mpaa * genre,
                           data = movies_10)

effectsize::omega_squared(model = stats.object,
                          partial = TRUE)
#> Parameter  | Omega_Sq_partial |         90% CI
#> ----------------------------------------------
#> mpaa       |            -0.01 | [-0.01,  0.01]
#> genre      |             0.10 | [-0.02,  0.17]
#> mpaa:genre |             0.00 | [-0.10,  0.00]
mattansb commented 4 years ago

Several notes:

  1. When an F-test has p < 0.05 (one-tailed), that means that the parametric 90% (two-tailed) CI for partial eta squared, constructed from the noncentrality parameter of the noncentral F distribution, will not contain 0.
  2. There is no such guarantee for partial omega squared.
  3. There is no such guarantee for bootstrapped CIs.
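Note 1 above can be illustrated with the standard pivoting construction: invert the noncentral F distribution to get a CI for the noncentrality parameter (ncp), then map it to partial eta squared. This is a generic sketch, assuming base-R only - the function name and `uniroot()` search intervals are mine, and it is not effectsize's actual implementation:

```r
# 90% CI for partial eta-squared by pivoting the noncentral F distribution:
# find the ncp values whose distributions place the observed F at the
# 95th / 5th percentile, then convert each ncp to an eta-squared value.
eta_sq_ci <- function(f_obs, df1, df2, level = 0.90) {
  alpha <- (1 - level) / 2
  ncp_to_eta <- function(ncp) ncp / (ncp + df1 + df2 + 1)

  # lower bound: ncp with pf(f_obs; ncp) = 1 - alpha (0 if p-value > alpha)
  lo <- if (pf(f_obs, df1, df2, ncp = 0) < 1 - alpha) 0 else
    uniroot(function(ncp) pf(f_obs, df1, df2, ncp) - (1 - alpha),
            interval = c(0, f_obs * df1 * 10))$root

  # upper bound: ncp with pf(f_obs; ncp) = alpha
  hi <- uniroot(function(ncp) pf(f_obs, df1, df2, ncp) - alpha,
                interval = c(0, f_obs * df1 * 100))$root

  c(ncp_to_eta(lo), ncp_to_eta(hi))
}

# mpaa term from the table above: F = 7.16, df1 = 2, df2 = 229, p < 0.05,
# so the 90% CI for partial eta-squared should exclude 0
eta_sq_ci(7.16, 2, 229)
```

Because the CI is built from the same F statistic as the p-value, p < 0.05 and a 90% CI excluding 0 go together by construction - exactly the correspondence that does not hold for partial omega squared or for bootstrapped CIs.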