jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

Option to only compute and display relevant post-hocs... #637

Closed · PsyTechMMU closed this issue 4 months ago

PsyTechMMU commented 4 years ago
* **Enhancement:** Option for post-hoc tests to be calculated and displayed only for automatically determined or user-defined comparisons.
* **Purpose:** Computing all pairwise comparisons is often unnecessary, more complicated than required, and the outputs are affected by some corrections, e.g., Bonferroni.
* **Use-case:**

**Is your feature request related to a problem? Please describe.**
Currently, when post-hoc output is requested for, e.g., a two-way interaction, all permutations of pairwise comparisons are computed and displayed, when realistically only a subset of these is required. For instance, a 2 x (2) ANOVA with a significant interaction does not typically need to compare Group1 Condition1 with Group2 Condition2...

**Describe the solution you'd like**
An option in the post-hoc section to output only specific (user-defined) comparisons, or to compute/display only the comparisons that differ by just one factor (the automatic option).
PsyTechMMU commented 4 years ago

I'm quite surprised this isn't a bigger or at least more widely considered issue...

Currently, for example, the Holm correction adjusts based on all possible pairwise comparisons, but in many factorial designs not all pairs are relevant (or even legitimate) comparisons. In a (2)x3 mixed ANOVA, for instance, 15 post-hoc tests are computed, with p-values corrected as if all 15 tests were wanted, when only 9 were actually needed (the others contrast on more than one factor).

Conducting separate t-tests (or equivalent) is the workaround, but then corrections need to be applied manually as well.
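For reference, a minimal sketch of that manual workflow in Python (not something JASP provides): the nine p-values below are placeholders standing in for separately run t-tests, and `multipletests` from statsmodels applies the Holm correction to just those nine comparisons.

```python
# Manual workaround sketch: Holm-correct only the p-values from the
# comparisons that were actually wanted. The p-values are placeholders.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.004, 0.012, 0.020, 0.030, 0.045, 0.210, 0.480, 0.730]
reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="holm")

for p_raw, p_corr, r in zip(p_values, p_adj, reject):
    print(f"raw p = {p_raw:.3f}, Holm-adjusted p = {p_corr:.3f}, reject = {r}")
```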

JohnnyDoorn commented 4 years ago

Hi @PsyTechMMU,

The idea of post hoc tests is to investigate, after collecting the data, where group differences lie, for instance when following up on a significant main or interaction effect. Post hoc tests are not used for analyzing specific predictions made by theory - they are not "planned" in that sense, which is why post hoc analysis looks at all the factor level comparisons. In case you want to investigate a specific comparison because it is postulated by theory, or planned beforehand, contrast analysis is typically used. This analysis seems to correspond perfectly to your use-case, where you only want to look at a handful of comparisons out of the many possible ones. In that case, these comparisons are planned before collecting the data, so there is no need for a p-value correction for multiple comparisons. I recently wrote a blog post about the similarities and differences between these analyses here: https://jasp-stats.org/2020/04/14/the-wonderful-world-of-marginal-means/

We also recently added the option to specify custom contrasts, so that you have complete control over which factor levels are compared to each other (the next release will also have this for interaction effects). Again, the difference here is that these contrasts are planned.

In sum, post hoc analysis inherently considers all possible comparisons and then corrects for looking at all of them. In case you would like to cherry-pick certain comparisons because they are of particular interest, contrast analysis is the way to go.

Kind regards, Johnny

PsyTechMMU commented 4 years ago

Thanks for the considered response, @JohnnyDoorn. Much appreciated. However, I think there has been too much emphasis on the literal/traditional sense of "post hoc", as opposed to the options and functionality that are offered only in the ANOVA sub-module labelled as such.

Regarding planned/predicted vs unplanned/"post-hoc": this is an important distinction, as you say, but having post-hoc tests take literally all possible pairwise comparisons into account does not tally with typical practice. It is often not only impractical but arguably invalid, because it over-corrects unnecessarily and potentially inflates Type II error by including a large number of irrelevant comparisons that have no place in the research design. For example, a (2)x(2)x3 design, e.g., a language ERP study with Condition (Pseudohomophone, Pseudoword) x Hemisphere (Left, Right) x Group (English, Spanish, Chinese), has only 24 legitimate (meaningful/relevant) comparisons, yet all 66 possible comparisons are calculated, with corrections based on this inflated number... Even typical teaching of post-hoc testing and the ~~dreaded~~ classic Bonferroni correction divides alpha only by the number of comparisons actually conducted. (What is technically/textbook correct versus what is done/acceptable in the real world is not the point, though; this is mostly a workflow enhancement.)

I would, therefore, really recommend and immensely appreciate at least an option in the post-hocs area to select just the meaningful/relevant pairs/permutations that differ on only one factor. For example:

* English + Pseudohomophone + Left Hemisphere vs English + Pseudoword + Left Hemisphere (differs only on Condition) is fine and meaningful;
* English + Pseudohomophone + Left Hemisphere vs English + Pseudohomophone + Right Hemisphere (differs only on Hemisphere) is fine and meaningful;
* English + Pseudohomophone + Left Hemisphere vs Spanish + Pseudohomophone + Left Hemisphere (differs only on Group) is fine and meaningful;
* but English + Pseudohomophone + Left Hemisphere vs Spanish + Pseudoword + Left Hemisphere (differs on Condition and Group) is not;
* and English + Pseudohomophone + Left Hemisphere vs Spanish + Pseudoword + Right Hemisphere (differs on Condition, Hemisphere, and Group) is certainly not.
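As a small illustration of that "automatic option", here is a Python sketch (not anything JASP-specific): enumerate every cell pair of the (2)x(2)x3 example above and keep only those differing on a single factor, which confirms the 24-of-66 count.

```python
# Enumerate all pairwise cell comparisons in a 2 x 2 x 3 design and keep only
# those that differ on exactly one factor (the "automatic" subset suggested above).
from itertools import combinations, product

conditions  = ("Pseudohomophone", "Pseudoword")
hemispheres = ("Left", "Right")
groups      = ("English", "Spanish", "Chinese")

cells = list(product(conditions, hemispheres, groups))      # 12 cells
all_pairs = list(combinations(cells, 2))                    # 66 possible comparisons
relevant = [(a, b) for a, b in all_pairs
            if sum(x != y for x, y in zip(a, b)) == 1]      # differ on one factor only

print(len(all_pairs), len(relevant))                        # 66 24
```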

While I understand that comparing these pairs will be possible in the next update via interaction-level contrasts, it is simply not the same, it is not the output expected in some disciplines, and contrasts don't allow for multiple-comparison corrections...

And on that point... I think it's a bone of contention (and a large one) that planned comparisons do not require corrections for multiplicity. I'd much prefer that view, but it's often not the consensus, which is instead that corrections are required whenever more than one pairwise comparison is made.

JohnnyDoorn commented 4 years ago

Hi @PsyTechMMU,

If I understand your last sentences correctly, you are arguing in favor of applying p-value corrections in the planned contrast analysis. I think this might be the easiest to implement, as this analysis already has the functionality to specify custom comparisons rather than including all permutations. I can look into adding a number of checkboxes for specific corrections in the contrast analysis.

For clarity, the post hoc and contrast analyses are very much the same: both use estimated marginal means for each cell, which are then compared to each other to see if they exhibit a meaningful/significant difference. The two differences between them are the p-value correction and whether they consider all permutations or only a specific subset of comparisons (depending on the contrast type).
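As a rough sketch of that shared machinery (in Python with made-up data, not JASP's own implementation): estimate a mean per cell, then express a post hoc pairwise difference and a planned contrast as weighted combinations of those cell means.

```python
# Both analyses boil down to comparing per-cell means; only which combinations
# are tested (and how p-values are corrected) differs. Data are simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group":     np.repeat(["A", "B", "C"], 20),
    "condition": np.tile(np.repeat(["low", "high"], 10), 3),
    "y":         rng.normal(size=60),
})

cell_means = df.groupby(["group", "condition"])["y"].mean()

# Post hoc style: one pairwise difference between two cells.
pairwise_diff = cell_means[("A", "low")] - cell_means[("B", "low")]

# Planned-contrast style: custom weights over all cells (summing to zero).
weights = {("A", "low"): 1, ("A", "high"): 1,
           ("B", "low"): -1, ("B", "high"): -1,
           ("C", "low"): 0, ("C", "high"): 0}
contrast_estimate = sum(w * cell_means[cell] for cell, w in weights.items())

print(pairwise_diff, contrast_estimate)
```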

Does that sound OK to you?

Kind regards, Johnny

tomtomme commented 9 months ago

It would be great if this could be implemented for contrasts, and for more modules in general. As @JohnnyDoorn wrote, the advantage over post hoc tests is that you do not over-correct when you use custom contrasts: you only correct for Type I error accumulation across the few contrasts you chose to compute.

As a next step, those multiple-comparison corrections could be implemented for "all" modules with p-values. Or is there a special reason why the corrections are in place only for some of the ANOVAs, GLM & Mixed Models?

At least for our core modules, family-wise Type I error corrections should be implemented whenever you run multiple tests over the same set of data. The corrections are missing in:

I also thought about how to correct over all p-values of all analyses, in case you consider your analysis completely exploratory. For this, we might want to implement a global option, e.g. in "preferences => results => table options". This global correction could also work in a more granular way if the user were able to mark every analysis as exploratory or confirmatory; then only the p-values of analyses marked as exploratory would be corrected. This "marking" could be done with a checkbox or switch next to the five buttons each analysis panel has.
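A conceptual sketch of such a global correction (the analysis list, flags, and p-values below are hypothetical, not JASP internals): pool the p-values of every analysis marked exploratory and adjust them jointly, leaving confirmatory analyses untouched.

```python
# Pool p-values from all analyses flagged as exploratory and correct them jointly.
# Structure, flags, and numbers are invented for illustration only.
from statsmodels.stats.multitest import multipletests

analyses = [
    {"name": "ANOVA: RT",        "exploratory": True,  "p_values": [0.012, 0.048, 0.300]},
    {"name": "t-test: accuracy", "exploratory": False, "p_values": [0.004]},
    {"name": "Correlation set",  "exploratory": True,  "p_values": [0.020, 0.650]},
]

pooled = [(a["name"], p) for a in analyses if a["exploratory"] for p in a["p_values"]]
reject, p_adj, _, _ = multipletests([p for _, p in pooled], method="holm")

for (name, p_raw), p_corr, r in zip(pooled, p_adj, reject):
    print(f"{name}: raw p = {p_raw:.3f}, globally corrected p = {p_corr:.3f}, reject = {r}")
```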

Mockup:

*(mockup screenshot attached)*

What do you think? I guess @dustinfife would be a fan of this idea, correct? We might even want to hide p-values by default for analyses marked as exploratory. I saw some mockups from Dustin back in the day working in a similar direction, marking analyses as EDA vs. CDA.

JohnnyDoorn commented 4 months ago

@PsyTechMMU I just tried some alternative layouts for the post hoc tests, where you can now choose to have the output grouped by the levels of your factors when you request post hoc tests for an interaction. Is this related to what you would like to see? If I specify a 2x3 ANOVA and request post hocs for the interaction effect, I get the following two tables instead of being bombarded with one huge table that contains all 15 pairwise combinations of the cells: *(screenshot attached)*
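For illustration only (a Python sketch with invented factor labels, not the actual JASP output): the idea is that the 15 pairwise comparisons of a 2x3 design can be split into small per-level tables by grouping on the factor level the two cells share.

```python
# Split the flat 15-row pairwise table of a 2x3 design into one small table per
# level of factor A, containing the comparisons between B levels within that level.
import pandas as pd
from itertools import combinations, product

cells = list(product(("A1", "A2"), ("B1", "B2", "B3")))            # 6 cells
rows = [{"a_1": a[0], "b_1": a[1], "a_2": b[0], "b_2": b[1]}
        for a, b in combinations(cells, 2)]                        # 15 comparisons
table = pd.DataFrame(rows)

same_a = table[table["a_1"] == table["a_2"]]                       # within-level comparisons
for level, sub in same_a.groupby("a_1"):
    print(f"Post hoc comparisons within A = {level}:")
    print(sub[["b_1", "b_2"]].to_string(index=False), "\n")
```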