singmann / afex

Analysis of Factorial EXperiments (R package)

assumptions (+ vignette), predict.afex_aov (take 2) #97

Closed mattansb closed 3 years ago

mattansb commented 3 years ago

(Had to close the previous PR, as I was unable to solve some conflicts there...)

This PR has the following additions:

It also has the following changes:

singmann commented 3 years ago

Hey Mattan, I apologise for the extremely long delay. I plan to release version 1 of afex in July, ideally with this addition. You can have a look at what I am working on in the new branch.

Is your pull request still up to date? Should I check and integrate it as it is?

Thanks, Henrik

mattansb commented 3 years ago

Hey Henrik,

I've actually implemented all of these checks (Levene's test, sphericity, normality) in the performance package:

library(performance)
library(afex)

data(obk.long, package = "afex")

a <- aov_ez("id", "value", obk.long, 
            between = c("treatment"), 
            within = c("phase", "hour"))
#> Contrasts set to contr.sum for the following variables: treatment

check_sphericity(a)
#> Warning: Sphericity violated for: 
#>  - hour (p = 0.002)
#>  - treatment:hour (p = 0.002).

check_homogeneity(a)
#> OK: Variances in each of the groups are the same (Levene's Test, p = 0.114).

pn <- check_normality(a)
#> OK: residuals appear as normally distributed (p = 0.088).

library(patchwork)
plot(pn) / (plot(pn, type = "qq") + plot(pn, type = "pp"))
#> Loading required namespace: qqplotr

Created on 2021-05-12 by the reprex package (v1.0.0)

So as far as I'm concerned, qqnorm.afex_aov is no longer needed (and really neither are afex::test_levene() and afex::test_sphericity(), or the minor fixes to them in this PR).

I'm not sure how you'd like to proceed with this. Perhaps the vignette could point to performance? Let me know your thoughts on this...


The predict.afex_aov() method should be good to go.

singmann commented 3 years ago

Sounds very good to me. Having all the tests and the plots in the performance package (with the vignette linking to them) sounds great (as you might remember, I was a bit hesitant about the qqnorm function anyway). I would then deprecate the tests already in afex and also point to your package. Also, still adding predict.afex_aov sounds great. Are you willing to change the PR accordingly?

mattansb commented 3 years ago

Yes, I'll make the changes later today / tomorrow.

Thanks Henrik!

singmann commented 3 years ago

Feel free to move the existing test_ functions you wrote to deprecated.R and have them call the corresponding performance functions, with a message (or similar). I am happy to add performance to Suggests in the DESCRIPTION for that.
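A minimal sketch of what such wrappers in deprecated.R might look like, assuming the targets are the performance functions shown earlier in this thread (check_sphericity() and check_homogeneity()). The argument names are hypothetical and may not match the real afex signatures, and the performance function names were not yet final at this point. Because performance would only be in Suggests, the sketch checks for it at run time with requireNamespace() instead of importing it.

```r
# deprecated.R -- hypothetical sketch, not afex's actual code.
# Assumes 'performance' is listed under Suggests, so its presence is
# checked at run time rather than relied on via NAMESPACE imports.

test_sphericity <- function(object, ...) {
  # Tell the user where the functionality now lives
  message("test_sphericity() is deprecated. ",
          "Please use performance::check_sphericity() instead.")
  if (!requireNamespace("performance", quietly = TRUE)) {
    stop("Package 'performance' is required; please install it.",
         call. = FALSE)
  }
  # Forward all arguments to the performance implementation
  performance::check_sphericity(object, ...)
}

test_levene <- function(afex_aov, ...) {
  message("test_levene() is deprecated. ",
          "Please use performance::check_homogeneity() instead.")
  if (!requireNamespace("performance", quietly = TRUE)) {
    stop("Package 'performance' is required; please install it.",
         call. = FALSE)
  }
  performance::check_homogeneity(afex_aov, ...)
}
```

Base R's .Deprecated() would be an alternative, though it signals a warning rather than the message suggested here.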

mattansb commented 3 years ago

Okay, done.

Note that:

  1. performance's support for afex is not yet on CRAN (that's why vignette building is failing) but will be available before July.
  2. The name of the function performance::check_sphericity() might change before then. Will keep you posted.

singmann commented 3 years ago

Looks great. But if you think there might still be changes to the performance interface, it might be easier to wait another few weeks until things have settled in performance. I am under no time pressure now; I just want to ensure it is ready in July.

What I mean is that adding something now and then changing it again before July does not seem optimal to me. Please let me know what you think is the best way forward.

mattansb commented 3 years ago

Yes, this should def not be merged before we've settled on the function name (: No worries, I'll let you know when it's safe to merge 😇

singmann commented 3 years ago

Great. Looking forward to it.

mattansb commented 3 years ago

@singmann performance is on CRAN with the new function - so as far as I can tell, I'm done here (:

mattansb commented 3 years ago

@singmann Pinging you just in case you missed it 👆🏻

singmann commented 3 years ago

yes, thanks for the ping. Sadly, the new vignette fails for me at line 102:

> plot(is_norm, type = "qq")
Error in UseMethod("rstudent") : 
  no applicable method for 'rstudent' applied to an object of class "afex_aov"

With:

> session_info()
- Session info ---------------------------------------------------------------------
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United Kingdom.1252 
 ctype    English_United Kingdom.1252 
 tz       Europe/Berlin               
 date     2021-05-26                  

- Packages -------------------------------------------------------------------------
 !  package      * version    date       lib source        
    abind          1.4-5      2016-07-21 [1] CRAN (R 4.1.0)
 VP afex         * 0.28.0     2021-01-12 [?] CRAN (R 4.1.0)
    assertthat     0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
    backports      1.2.1      2020-12-09 [1] CRAN (R 4.1.0)
    bayestestR     0.9.0      2021-04-08 [1] CRAN (R 4.1.0)
    boot           1.3-28     2021-05-03 [2] CRAN (R 4.1.0)
    broom          0.7.6      2021-04-05 [1] CRAN (R 4.1.0)
    cachem         1.0.5      2021-05-15 [1] CRAN (R 4.1.0)
    callr          3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
    car            3.0-10     2020-09-29 [1] CRAN (R 4.1.0)
    carData        3.0-4      2020-05-22 [1] CRAN (R 4.1.0)
    cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.1.0)
    cli            2.5.0      2021-04-26 [1] CRAN (R 4.1.0)
    coda           0.19-4     2020-09-30 [1] CRAN (R 4.1.0)
    codetools      0.2-18     2020-11-04 [2] CRAN (R 4.1.0)
    colorspace     2.0-1      2021-05-04 [1] CRAN (R 4.1.0)
    crayon         1.4.1      2021-02-08 [1] CRAN (R 4.1.0)
    curl           4.3.1      2021-04-30 [1] CRAN (R 4.1.0)
    data.table     1.14.0     2021-02-21 [1] CRAN (R 4.1.0)
    DBI            1.1.1      2021-01-15 [1] CRAN (R 4.1.0)
    desc           1.3.0      2021-03-05 [1] CRAN (R 4.1.0)
    devtools     * 2.4.1      2021-05-05 [1] CRAN (R 4.1.0)
    digest         0.6.27     2020-10-24 [1] CRAN (R 4.1.0)
    dplyr          1.0.6      2021-05-05 [1] CRAN (R 4.1.0)
    effectsize     0.4.5      2021-05-25 [1] CRAN (R 4.1.0)
    ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
    emmeans        1.6.0      2021-04-24 [1] CRAN (R 4.1.0)
    estimability   1.3        2018-02-11 [1] CRAN (R 4.1.0)
    evaluate       0.14       2019-05-28 [1] CRAN (R 4.1.0)
    fansi          0.4.2      2021-01-15 [1] CRAN (R 4.1.0)
    farver         2.1.0      2021-02-28 [1] CRAN (R 4.1.0)
    fastmap        1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
    forcats        0.5.1      2021-01-27 [1] CRAN (R 4.1.0)
    foreign        0.8-81     2020-12-22 [2] CRAN (R 4.1.0)
    fs             1.5.0      2020-07-31 [1] CRAN (R 4.1.0)
    generics       0.1.0      2020-10-31 [1] CRAN (R 4.1.0)
    ggplot2        3.3.3      2020-12-30 [1] CRAN (R 4.1.0)
    ggridges       0.5.3      2021-01-08 [1] CRAN (R 4.1.0)
    glue           1.4.2      2020-08-27 [1] CRAN (R 4.1.0)
    gtable         0.3.0      2019-03-25 [1] CRAN (R 4.1.0)
    haven          2.4.1      2021-04-23 [1] CRAN (R 4.1.0)
    hms            1.1.0      2021-05-17 [1] CRAN (R 4.1.0)
    htmltools      0.5.1.1    2021-01-22 [1] CRAN (R 4.1.0)
    insight        0.14.0     2021-05-07 [1] CRAN (R 4.1.0)
    knitr          1.33       2021-04-24 [1] CRAN (R 4.1.0)
    labeling       0.4.2      2020-10-20 [1] CRAN (R 4.1.0)
    lattice        0.20-44    2021-05-02 [2] CRAN (R 4.1.0)
    lifecycle      1.0.0      2021-02-15 [1] CRAN (R 4.1.0)
    lme4         * 1.1-27     2021-05-15 [1] CRAN (R 4.1.0)
    lmerTest       3.1-3      2020-10-23 [1] CRAN (R 4.1.0)
    magrittr       2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
    MASS           7.3-54     2021-05-03 [2] CRAN (R 4.1.0)
    Matrix       * 1.3-3      2021-05-04 [2] CRAN (R 4.1.0)
    memoise        2.0.0      2021-01-26 [1] CRAN (R 4.1.0)
    minqa          1.2.4      2014-10-09 [1] CRAN (R 4.1.0)
    multcomp       1.4-17     2021-04-29 [1] CRAN (R 4.1.0)
    munsell        0.5.0      2018-06-12 [1] CRAN (R 4.1.0)
    mvtnorm        1.1-1      2020-06-09 [1] CRAN (R 4.1.0)
    nlme           3.1-152    2021-02-04 [2] CRAN (R 4.1.0)
    nloptr         1.2.2.2    2020-07-02 [1] CRAN (R 4.1.0)
    numDeriv       2016.8-1.1 2019-06-06 [1] CRAN (R 4.1.0)
    openxlsx       4.2.3      2020-10-27 [1] CRAN (R 4.1.0)
    parameters     0.13.0     2021-04-08 [1] CRAN (R 4.1.0)
    pbkrtest       0.5.1      2021-03-09 [1] CRAN (R 4.1.0)
    performance  * 0.7.2      2021-05-17 [1] CRAN (R 4.1.0)
    pillar         1.6.1      2021-05-16 [1] CRAN (R 4.1.0)
    pkgbuild       1.2.0      2020-12-15 [1] CRAN (R 4.1.0)
    pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
    pkgload        1.2.1      2021-04-06 [1] CRAN (R 4.1.0)
    plyr           1.8.6      2020-03-03 [1] CRAN (R 4.1.0)
    prettyunits    1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
    processx       3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
    ps             1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
    purrr          0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
    R6             2.5.0      2020-10-28 [1] CRAN (R 4.1.0)
    Rcpp           1.0.6      2021-01-15 [1] CRAN (R 4.1.0)
    readxl         1.3.1      2019-03-13 [1] CRAN (R 4.1.0)
    remotes        2.3.0      2021-04-01 [1] CRAN (R 4.1.0)
    reshape2       1.4.4      2020-04-09 [1] CRAN (R 4.1.0)
    rio            0.5.26     2021-03-01 [1] CRAN (R 4.1.0)
    rlang          0.4.11     2021-04-30 [1] CRAN (R 4.1.0)
    rmarkdown      2.8        2021-05-07 [1] CRAN (R 4.1.0)
    rprojroot      2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
    rstudioapi     0.13       2020-11-12 [1] CRAN (R 4.1.0)
    sandwich       3.0-1      2021-05-18 [1] CRAN (R 4.1.0)
    scales         1.1.1      2020-05-11 [1] CRAN (R 4.1.0)
    see            0.6.3      2021-04-09 [1] CRAN (R 4.1.0)
    sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 4.1.0)
    stringi        1.6.1      2021-05-10 [1] CRAN (R 4.1.0)
    stringr        1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
    survival       3.2-11     2021-04-26 [2] CRAN (R 4.1.0)
    testthat     * 3.0.2      2021-02-14 [1] CRAN (R 4.1.0)
    TH.data        1.0-10     2019-01-21 [1] CRAN (R 4.1.0)
    tibble         3.1.2      2021-05-16 [1] CRAN (R 4.1.0)
    tidyr          1.1.3      2021-03-03 [1] CRAN (R 4.1.0)
    tidyselect     1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
    usethis      * 2.0.1      2021-02-10 [1] CRAN (R 4.1.0)
    utf8           1.2.1      2021-03-12 [1] CRAN (R 4.1.0)
    vctrs          0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
    withr          2.4.2      2021-04-18 [1] CRAN (R 4.1.0)
    xfun           0.23       2021-05-15 [1] CRAN (R 4.1.0)
    xtable         1.8-4      2019-04-21 [1] CRAN (R 4.1.0)
    yaml           2.2.1      2020-02-01 [1] CRAN (R 4.1.0)
    zip            2.1.1      2020-08-27 [1] CRAN (R 4.1.0)
    zoo            1.8-9      2021-03-09 [1] CRAN (R 4.1.0)

[1] C:/Users/singm/Documents/R/win-library/4.1
[2] C:/Program Files/R/R-4.1.0/library

 V -- Loaded and on-disk version mismatch.
 P -- Loaded and on-disk path mismatch.

mattansb commented 3 years ago

Oops - I thought the new version of {see} was already on CRAN. It should be there in a few days.

Will let you know - sorry for the confusion!

singmann commented 3 years ago

no problem

mattansb commented 3 years ago

Alright - we should be good to go! Both performance and see are on CRAN!

singmann commented 3 years ago

Hey Mattan.

Sorry for taking so long to respond, as usual, but I have finally found the time to look at this. Overall this looks great, but I was a bit uneasy about having a vignette that introduces the assumption tests without proper framing. While I know that assumption tests are important, I do not like it when they are applied blindly. Thus, I added a somewhat lengthy foreword providing some context on the role of assumption tests. Please let me know if you can live with it or have any comments. I am also happy to have a video chat, as that might be easier than going back and forth on GitHub if you want to discuss specific formulations. If we can find a compromise on the vignette, I will merge this.

Cheers, Henrik

singmann commented 3 years ago

Oh, I believe I also fixed an error in your name in the header of the vignette. Please can you check to make sure I did not add another one?

mattansb commented 3 years ago

Henrik,

This is a really really great foreword (very similar to what I teach! Yay!). The only point I have is about this line:

[...] the assumption of normality of the residuals only is requires for small samples, thanks to the central limit theorem.

It is my understanding that:

  1. M/SE ~ t(df) only when the residuals are ~N. Even though the CLT is relevant for the M part, SE² (suitably scaled) still needs to be ~χ²(df) for their ratio to be ~t(df), and that depends on the residuals being normal; the CLT is much less effective here.

  2. What is true is that for NHST, Pr(p < alpha) is robust (so the p-values are not exact, but the decision itself is robust).
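For reference, point 1 can be written out for the simplest case, a one-sample t statistic (a stand-in chosen here for illustration; the ANOVA F test involves an analogous ratio of variance estimates). Here M corresponds to the sample mean $\bar{X}$, SE to $S/\sqrt{n}$, and the "SE²" term to the sample variance $S^2$:

```math
\begin{aligned}
t &= \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{Z}{\sqrt{V/(n-1)}},
  \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},
  \qquad V = \frac{(n-1)\,S^2}{\sigma^2} \\
&Z \sim N(0,1),\quad V \sim \chi^2_{n-1},\quad Z \perp V
  \;\Longrightarrow\; t \sim t_{n-1}
\end{aligned}
```

With normal residuals all three conditions on the second line hold exactly. Without normality the CLT still makes $Z$ approximately $N(0,1)$ for large $n$, but $V$ is no longer exactly $\chi^2_{n-1}$ and $Z$ and $V$ need not be independent. For large $n$, $S^2$ still converges to $\sigma^2$ (given finite variance), so $t$ is approximately $N(0,1)$, which is consistent with point 2.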

That's at least what I've been taught...

We can zoom to discuss this further, but this is really just a minor point IMO - the main point about robustness to non-norm is true either way.

mattansb commented 3 years ago

Oh, I believe I also fixed an error in your name in the header of the vignette.

How embarrassing - I can't even spell my own name?? Thanks!

singmann commented 3 years ago

Okay thanks. I have rephrased the normality part as: "If the main goal of an ANOVA is to see whether or not certain effects are significant, then the assumption of normality of the residuals is only required for small samples, thanks to the central limit theorem." I think this captures your point 2 sufficiently. I will merge this now.
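The rephrased sentence (and point 2 in mattansb's comment) concerns how often a test rejects, not whether the t distribution holds exactly. A rough Monte Carlo sketch in base R can illustrate this; the one-sample t-test on centred exponential (skewed) data is an illustrative stand-in chosen here, not the ANOVA from the vignette, and the sample sizes, seed, and number of simulations are arbitrary.

```r
# Monte Carlo sketch: how often does a one-sample t-test reject a TRUE null
# when the data are skewed (centred exponential) rather than normal?
set.seed(2021)

rejection_rate <- function(n, n_sim = 2000, alpha = 0.05) {
  p_values <- replicate(n_sim, {
    x <- rexp(n, rate = 1) - 1   # Exp(1) has mean 1, so x has mean 0 and H0 is true
    t.test(x, mu = 0)$p.value
  })
  mean(p_values < alpha)         # empirical Pr(p < alpha)
}

small_n <- rejection_rate(n = 5)
large_n <- rejection_rate(n = 100)
print(c(n_5 = small_n, n_100 = large_n))
```

If the approximation behaves as described, the rejection rate for n = 100 should sit close to the nominal .05, while with n = 5 it can drift further away, which is the small-sample caveat in the vignette sentence.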

mattansb commented 3 years ago

That's great - thanks again!