Choose standardizer for effect size in paired t-test

lindeloev commented 6 years ago

If you run a paired-samples t-test on pretest and posttest scores, you would use the pretest SD as the standardizer, not the SD of the pairwise differences that is currently used. This is because the pretest SD represents the population variability relative to which you want to "move" the subjects.

So I suggest that you add an option to choose the standardizer [var2-var1, var1, var2], for both the frequentist t-test and the Bayesian one. The ES plays a central role in the Bayesian t-test since it is currently used for setting priors, etc. The ability to use the non-standardized effect size for priors and plots would also be wonderful, but that's another issue!

EJWagenmakers commented 6 years ago

Do you have a reference perhaps? If the test is on the score difference, I am not sure how we can define the test-relevant parameter as one that involves only the pretest... E.J.

lindeloev commented 6 years ago

Cumming (2013): https://link.springer.com/article/10.3758%2Fs13428-013-0392-4, section "choice of standardizer". Wolfgang Viechtbauer (metafor developer) has also raised it here and there, e.g.: https://stats.stackexchange.com/a/256205/17459.

The pretest would be just one of the variables, so if you could select one of [var2-var1, var1, var2], that could work. It would complicate the widget a bit, though.

EJWagenmakers commented 6 years ago

Thanks; I am afraid that, unless I am missing something, I disagree with Cumming here. We want to test the difference, that is, the treatment effect. Consequently, the effect size we want to learn about also concerns the difference. If every participant shows similar improvement, SDdiff is small and effect size is therefore large; this is how it should be. On the other hand, if you use SD1 then effect size for the treatment effect is determined by pretest homogeneity. But that says nothing about the effect of the treatment. E.J.
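For reference, the two candidate effect sizes in this exchange can be written side by side. The notation below is only a sketch: the symbols $\bar{D}$, $d_{\mathrm{diff}}$, $d_{1}$, and $r$ (the pre/post correlation) are labels chosen here and do not appear in the thread.

```latex
% \bar{D}: mean of the paired differences (posttest minus pretest)
% SD_1, SD_2: pretest and posttest standard deviations; r: pre/post correlation
\[
d_{\mathrm{diff}} = \frac{\bar{D}}{SD_{\mathrm{diff}}},
\qquad
d_{1} = \frac{\bar{D}}{SD_{1}},
\qquad
SD_{\mathrm{diff}} = \sqrt{SD_{1}^{2} + SD_{2}^{2} - 2\,r\,SD_{1}\,SD_{2}}
\]
% Special case SD_1 = SD_2: the two effect sizes differ only through r
\[
d_{\mathrm{diff}} = \frac{d_{1}}{\sqrt{2\,(1 - r)}}
\]
```

The second line shows why the two can diverge: under equal pre and post SDs, a high pre/post correlation (a very consistent change) makes $d_{\mathrm{diff}}$ much larger than $d_{1}$ for the same mean change.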

luketudge commented 6 years ago

I think Cumming's point was that using SDdiff as standardizer is indeed appropriate for inference about the difference (for which we want to know the sampling distribution of the difference rather than of the mean in any one group), but that when viewed purely descriptively it can be misleading. Is that right?

If the treatment effect is extremely consistent across all participants but tiny relative to the pre-treatment variance then the effect will appear very 'large' with SDdiff as standardizer, where most people would nonetheless think of it intuitively as only a 'small effect'. Using SD1 as standardizer gives a more intuitive descriptive effect size because it tells us what the average change is in terms of the initial differences among people (i.e. does the treatment tend to 'move someone up' within the pre-treatment population a lot or only a little?)

So if the choice is being offered for descriptive results I would also favour having both SDdiff and SD1 as options for standardizer. It might even be instructive for people to see when the two are very different (e.g. 'small but consistent effect' or 'large but variable effect').

Or maybe that was obvious already and this discussion is only about inference, I'm not sure.
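The "small but consistent effect" case can be illustrated with a minimal Python sketch. The scores are invented for illustration, and the function name `paired_effect_sizes` and the keys `d_diff` and `d_pre` are hypothetical labels, not anything from JASP.

```python
import statistics

# Hypothetical data: everyone improves by 1 or 2 points,
# while baseline scores are spread widely (SD of about 12).
pre = [80, 95, 100, 105, 120, 90, 110, 100]
post = [81, 96, 101, 107, 121, 91, 111, 102]


def paired_effect_sizes(pre, post):
    """Return the mean change standardized by SDdiff and by the pretest SD."""
    diffs = [b - a for a, b in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    sd_pre = statistics.stdev(pre)     # sample SD of the pretest scores (SD1)
    return {"d_diff": mean_diff / sd_diff, "d_pre": mean_diff / sd_pre}


es = paired_effect_sizes(pre, post)
print(es)
```

With these made-up numbers the SDdiff-based value comes out at roughly 2.7 and the SD1-based value at roughly 0.1, so the same mean change reads as "very large" or "small" depending on the standardizer.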

lindeloev commented 6 years ago

Exactly as Luke explained, for treatments, you often want to know how much you "moved" the group relative to the population from which they were sampled at baseline. This also allows for conversions to Number Needed to Treat etc.

It's not like it's difficult to do by hand :-) But I've seen several publications messing them up, reducing comparability between studies. So it would be nice if JASP could do both.

Best, Jonas

Jonas Kristoffer Lindeløv, M.Sc., Ph.D. Assistant Professor in Cognitive Neuroscience and Neuropsychology Profile at Aalborg University http://personprofil.aau.dk/117060, Scientific blog http://lindeloev.net/


fplatz commented 6 years ago

This discussion has a long tradition and it seems to me that there is no "general" approach. However, it might be interesting to have several options for the ES depending on your research question (see also Morris & DeShon, 2002).

Best, Friedrich

Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups design. Psychological Methods, 7(1), 105-125.

EJWagenmakers commented 6 years ago

OK, I can see how the SD1 approach is informative, descriptively. For inference I would still want to stick to SDdiff though. But it would be possible to add the SD1 method somewhere in the GUI. Any suggestions? E.J.

lindeloev commented 6 years ago

To me, it would be most intuitive to put it under "effect size" in the paired t-test widget:

Standardizer:

SD of difference [default!]
SD of column 1
SD of column 2

In the Bayesian widget, it would just be part of the basic widget since everything is standardized. Further down the line, it would be great to have the choice between reporting effects in standardized (current) or original units. Then the choice of standardizer could be subsumed under the former. Just mentioning this since other issues point to a demand for more options concerning effect sizes (Hedges' g: #2270 and #2094, corrections for correlations: #1576).

@EJWagenmakers, If you had a particular reason in mind why you would not provide a CI when using SD1 as standardizer, I would be interested to learn!

EJWagenmakers commented 6 years ago

Well, I guess I am willing to provide a confidence interval on effect size when SD1 is used as a standardizer (or a credible interval, possibly with a vague prior). But the test is on the treatment effect, and therefore involves SDdiff. And this produces the problem: if we let users define effect size by means of SD1, and then conduct a test for the treatment effect using SD1, this does not strike me as meaningful. So the challenge is to produce a point estimate and CI for the SD1 case, but without using it to do the test. This means it ought to be presented in the descriptives table, for instance. I'll discuss possible ways to do this with the team.

tomtomme commented 8 months ago

Still valid with 0.19 beta. The challenge: "Produce a point estimate and CI for the SD1 case [pre-test], but without using it to do the test. This means it ought to be presented in the descriptives table."
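One possible way to get a descriptive point estimate and interval for the SD1 case, separate from the test, is a percentile bootstrap over participants. This is only a sketch of one option and not something the thread or JASP specifies; the function names `d_pre` and `bootstrap_ci`, the defaults, and the data are all hypothetical.

```python
import random
import statistics


def d_pre(pre, post):
    """Mean change divided by the pretest SD (the 'SD1' standardizer)."""
    diffs = [b - a for a, b in zip(pre, post)]
    return statistics.mean(diffs) / statistics.stdev(pre)


def bootstrap_ci(pre, post, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for d_pre, resampling whole pairs."""
    rng = random.Random(seed)
    n = len(pre)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        pre_b = [pre[i] for i in idx]
        post_b = [post[i] for i in idx]
        if len(set(pre_b)) < 2:
            continue  # pretest SD would be zero in this resample
        estimates.append(d_pre(pre_b, post_b))
    estimates.sort()
    alpha = 1 - level
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Hypothetical pretest/posttest scores
pre = [80, 95, 100, 105, 120, 90, 110, 100]
post = [81, 96, 101, 107, 121, 91, 111, 102]

point = d_pre(pre, post)
lower, upper = bootstrap_ci(pre, post)
print(point, lower, upper)
```

Resampling pairs rather than columns keeps each participant's pretest and posttest together, so the interval reflects uncertainty in both the mean change and the pretest SD. The paired test itself would still be based on SDdiff.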

jasp-stats / jasp-issues

Choose standardizer for effect size in paired t-test #156