Closed R180 closed 5 years ago
The choice of which t test is used is most definitely NOT in the design of emmeans. The emmeans() function and its relatives simply use the model that is fitted to the data, and use an appropriate test for that model. If the model is fitted using, say, lm() with a univariate response vector, then that model is based on an assumption of homogeneous error variances, and that dictates that a pooled t test is used. If, on the other hand, the model is fitted using nlme::gls(), and a specification was made in gls() that the variances depend on treatments, then something akin to the Welch t tests will result in emmeans(). Thus, in order for jamovi to produce unequal-variances tests for post hoc comparisons, it is jamovi's responsibility to allow for some model that estimates non-homogeneous variances, and to pass that model to emmeans().
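A minimal sketch of the two fits being contrasted (the data frame dat and the variables y and group are hypothetical; note that the emmeans() call itself is identical in both cases):

```r
library(nlme)      # provides gls() and varIdent()
library(emmeans)

## Homogeneous-variance model: post hoc comparisons come out as pooled t tests
fit_lm <- lm(y ~ group, data = dat)
emmeans(fit_lm, pairwise ~ group)

## Heteroscedastic model: a separate variance is estimated per group,
## so the comparisons come out akin to Welch t tests
fit_gls <- gls(y ~ group, data = dat,
               weights = varIdent(form = ~ 1 | group))
emmeans(fit_gls, pairwise ~ group)
```

The difference lies entirely in the fitted model; emmeans() simply propagates each model's variance structure.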
For repeated-measures situations, social scientists (at least historically) seem to concentrate on the choice between the "univariate" and "multivariate" methods, depending on some assessment of sphericity or compound symmetry. The former typically involves pooled t tests. The latter could be performed using lm() but with a multivariate response (I think some part of the afex package uses that too), and that is supported by emmeans -- again, provided that is the model that is used.
A somewhat separate matter is degrees of freedom. While the t statistic may come out the same as the Welch t statistic, the model object may or may not support the needed Satterthwaite degrees-of-freedom calculation. Most don't.
I will try to alert Jonathon Love to this response and see if I can somehow add him as a participant.
hey russell,
The choice of which t test is used is most definitely NOT in the design of emmeans. The emmeans() function and its relatives simply use the model that is fitted to the data, and use an appropriate test for that model.
haha, i think we might be arguing semantics. i feel like these two sentences contradict each other :) but i get what you're saying.
so @R180 has observed that emmeans when fed a 'normal' ANOVA produces t stats similar/same as a student's t-test, but when emmeans is fed an RM ANOVA, it produces t stats similar/same as a welch's t-test.
@R180 felt this wasn't quite correct/consistent, so i suggested he check in with you.
is this all working as expected? or does something here suggest that jamovi isn't doing something correctly? (we're both social scientists [unfortunately?], so you may find it easier to explain using that vernacular).
with thanks
Well, I don't really know what jamovi does, so don't understand the contradiction. If the user chooses the model, then the user must choose an appropriate one. If jamovi chooses it, then perhaps there needs to be clearer communication about the impact of those choices. But to @R180, the main point is that emmeans() does not look at the data, it looks at the model.
Jonathon, FWIW, I am not a social scientist, and often have trouble understanding social scientists' needs. I have been known to lock horns with social scientists over issues like post hoc power calculations (I think they are a sham). I'm a retired statistics prof, and like physical science/engineering applications the best.
Well the thing is, I input data with wildly different variances that could not possibly fit the equal-variance ANOVA model, but still no Welch's test (I don't think). See below and attached.
I think that at a minimum, the output should indicate in a footnote what type of t test was used, and what model criterion value led to the use of that particular type. Example: [image: Wildly different variances.png]
Honestly, I can't help with that. It's up to whatever process leads to developing the model. Again, emmeans() does not analyze the data. If you have wildly different variances, then it is up to you to choose an appropriate model.
OK. So I'll assume that somewhere in jamovi, there is a model parameter value in the ANOVA routine (i.e., specification of Student's t test) that's set differently in the Repeated-Measures ANOVA routine (i.e., specification of Welch's t). Thanks.
hi @R180,
there is a practice in the social sciences, that you perform an ANOVA, and if it's significant, you do post-hoc t-tests (simply on the data) to determine where the difference(s) is/are. with this approach, i don't actually think it makes sense to do the ANOVA in the first place, seeing as you're using completely different models with the post-hoc t-tests. you should just do the t-tests.
we've been meaning to develop a pair-wise t-test module for jamovi for this other approach (which, we don't really think is a good approach, but hopefully can spur discussion).
the beauty of the emmeans approach is that the t-tests make use of the same model as the primary test.
OK. So I'll assume that somewhere in jamovi, there is a model parameter value in the ANOVA routine (i.e., specification of Student's t test) that's set differently in the Repeated-Measures ANOVA routine (i.e., specification of Welch's t).
no, i don't think this is true. i think the different t-values are simply emergent phenomena of the models.
i think if we set the RM Model up in such a way that it provided student's t-test type results, it would no longer be an RM ANOVA.
if you want to see the code, here we run the RM ANOVA:
https://github.com/jamovi/jmv/blob/5ed9f64d6ea903d1c9453a75847809d656524116/R/anovarm.b.R#L43
and here we call emmeans on the model:
https://github.com/jamovi/jmv/blob/5ed9f64d6ea903d1c9453a75847809d656524116/R/anovarm.b.R#L539
the t-values are determined by the model, not by a decision somewhere to use one type of t-test over another.
Ok. I think that such behavior is problematic. We've already established, I think, that the choice of the type of t test is wholly dependent on what model jamovi chooses to attempt to fit to the data, and wholly independent of the specific characteristics of the data. So for the package to be credible, I think there needs to be clear documentation of which model parameters govern the selection of the type of t test.
Or to put it slightly differently, the documentation needs to present a convincing rationale, and explanation, for the model's behavior. The model can be capable of selecting between types of t test only if it has been given the freedom to make that choice. So there needs to be a stated rationale for granting the model that freedom. Without such a rationale, the scientist cannot defend the model's choices, and so cannot defend his or her use of the model.
Hmmmm. Well, there is a practice in statistics, whereby one fits a model, then -- before doing any testing at all, much less post hoc comparisons -- one makes some serious efforts to determine whether the model reasonably fits the data. This process involves a lot of residual plots and such, and also perhaps some tests of particular model assumptions, such as some kind of test of variance homogeneity like Levene's test, or (in the case of repeated measures) sphericity. If one just forges ahead with a bunch of statistical tests that may or may not be appropriate, that's reckless. [Insert some random analogy with recent proceedings involving the US Government.]
Statistics is serious and difficult business, and a recipe approach is as sure to fail as using a sea-level recipe for chocolate cake when high in the Andes. Any reasonable user interface would ensure that users have the opportunity to assess the success of the fitted model in explaining the data and in meeting its underlying assumptions.
Additionally, while I do not personally use the afex package, I am pretty sure its aov model setups do try to perform some of these steps in their default output, and the returned object does include both the "univariate" and "multivariate" models so that the user may choose which one to use in post hoc analyses. To me, that is still too routinized, but it's better than blundering ahead without ever questioning the underlying model. In the case of jamovi, it is a serious design failure if the user does not have the opportunity to choose the model, or at least to be informed as to which one was chosen automatically.
the choice of the type of t test is wholly dependent on what model jamovi chose to attempt to fit to the data
yes. but it's worth noting that we just fit standard models. we haven't made any decisions here.
I think there needs to be clear documentation of which model parameters govern the selection of the type of t test.
so i suppose this is a question for @rvlenth. is there a ruleset a curious person could use to determine what type of t-test emmeans will use, russell? so there's the suggestion that with an RM ANOVA model emmeans uses a welch's t-test (i don't know if it actually does or not, i assumed it just looked like one, but let's say it is welch's t-test – at the risk of sounding/being ignorant, it's not immediately obvious to me from the model why a welch's would be used there [but i'm certainly not suggesting it's incorrect]).
i kinda imagined the choice of t-test (and corrections?) was all pretty complicated - and i've been happy to assume emmeans is doing the right thing.
with thanks
one makes some serious efforts to determine whether the model reasonably fits the data
sorry, we do provide all of that, i just took that being performed for granted.
You can't take it for granted if the post hoc results are displayed before the user has a chance to make such determinations.
You can't take it for granted if the post hoc results are displayed before the user has a chance to make such determinations.
yes, that's true. which is why we don't display the post hoc results straight away. but i feel like this is a bit of a tangent.
(or are you talking about the pairwise t-test module? in which case, we're agreed that's a bad idea).
Well in fact I don't know what I'm talking about, as I haven't used jamovi. So I should stay out of this and let you and Richard figure this out.
but returning to my earlier question, if i understand correctly, richard wants to better understand the choices emmeans makes wrt t-tests.
is there a ruleset a curious person could use to determine what type of t-test emmeans will use? so there's the suggestion that with an RM ANOVA model emmeans uses a welch's t-test (i don't know if it actually does or not, i assumed it just looked like one, but let's say it is welch's t-test – at the risk of sounding/being ignorant, it's not immediately obvious to me from the model why a welch's would be used there [but i'm certainly not suggesting it's incorrect]).
i kinda imagined the choice of t-test (and corrections?) was all pretty complicated - and i've been happy to assume emmeans is doing the right thing.
with thanks
I will say it yet again: EMMEANS() DOES NOT MAKE A CHOICE REGARDING WHICH T TEST TO USE. YOU ARE ABSOLUTELY INCORRECT TO ASSUME IT IS CHOOSING WHICH ONE TO USE.
It uses the one model that it is given, and it computes t tests based on that model. If you think it is receiving more than one model and is being asked to choose between them, you are mistaken and you need to recode jamovi accordingly. If it's an object from afex, I think Henrik's driver (not mine) for those models defaults to the univariate model and there is an optional argument for specifying the multivariate one. But you (or your user) have to specify that option.
It uses the one model that it is given, and it computes t tests based on that model
sorry, i think we're muddled by semantics. the word 'choice' or 'choose' might be muddling us.
i mean these two statements to be equivalent:
"the package runs a t-test based on the model" or "the package chooses a t-test based on the model"
but either way, we're interested in what properties of the model, lead to different t-tests being performed.
for example, there's the suggestion that an RM ANOVA leads to a welch's t-test being performed - i'm trying to understand why - because it doesn't seem obvious (or to determine that it's not actually a welch's t-test).
it sounds like we might need to ask henrik next ... but i can imagine him saying "t-tests? that's all emmeans, you'd have to ask russell"
anyhow, i hope you're not too exasperated by all of this. i do try very hard to communicate clearly, but there is the odd spectacular fail.
Yes, I am completely exasperated, because you are assuming that my software does things that it simply does not do.
This is not a matter of semantics. I do not think you are understanding what I am trying to tell you -- e.g., what is meant by a statistical model. One model leads to one form of t test. If you want another form of t test, you need a different model.
emmeans() does not make any choices; it does not choose; it does not make decisions about the best way to analyze data. It takes one model object and computes stuff based on the model formula, its regression coefficients, and the estimated covariance matrix of those regression coefficients. All that stuff is based on a set of assumptions defined in the model and hence implicitly chosen by the user when the model was fitted -- whether the error variance is homogeneous; whether things are correlated and how; etc. All that is in place before emmeans() comes into the picture. Given those coefficients, model formula, and covariance matrix, we decide on a bunch of linear functions of those regression coefficients, obtain estimates of those linear functions, and estimate their standard errors. Those results lead to t statistics defined as the estimates divided by their standard errors. Period. There is no step anywhere along the line where some other set of t statistics is computed based on another set of assumptions. There is only one model and only one set of assumptions upon which that model is based.
There is nothing anywhere in the documentation of the emmeans package that says that different assumptions are examined and used to decide on an appropriate form of a t statistic. I don't know where you got that idea.
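In other words, a hand-rolled sketch of where every t ratio comes from (fit stands for any fitted model object, and the contrast vector L is purely illustrative; emmeans() does this with far more generality and care):

```r
b <- coef(fit)                       # the model's regression coefficients
V <- vcov(fit)                       # their estimated covariance matrix
L <- c(0, 1, -1)                     # a linear function (contrast) of the coefficients
est <- sum(L * b)                    # estimate of that linear function
se  <- sqrt(drop(t(L) %*% V %*% L))  # its standard error, from the model's vcov
t_ratio <- est / se                  # the t statistic: estimate divided by SE
```

Everything here comes from the one fitted model; no second set of assumptions ever enters the computation.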
Perhaps I can ask a simpler question. Do we know what formula emmeans uses to calculate the degrees of freedom for a given post-hoc t test? Or is that formula computed by some other package on which emmeans depends?
-- Richard Anderson
It uses a method for d.f. depending on the model class. If it is lm, it uses $df.residual. If it is lmerMod, it uses a sophisticated algorithm in the pbkrtest package. If it is aovList, it uses Satterthwaite. And so forth. If you like, you can examine the code by listing the corresponding emm_basis method, e.g.,
emmeans:::emm_basis.lm
emmeans:::emm_basis.lmerMod
emmeans:::emm_basis.aovList
### etc
And I remind you that the model class is determined by the R code you used to fit the model. For example
warp.lm = lm(breaks ~ wool * tension, data = warpbreaks)
class(warp.lm)
This is just another way of saying that emmeans() uses the model to get all its information -- including how to compute d.f.
I'll be more specific with that example...
> warp.lm = lm(breaks ~ wool * tension, data = warpbreaks)
> emmeans(warp.lm, pairwise ~ tension | wool)
$emmeans
wool = A:
tension emmean SE df lower.CL upper.CL
L 44.6 3.65 48 37.2 51.9
M 24.0 3.65 48 16.7 31.3
H 24.6 3.65 48 17.2 31.9
wool = B:
tension emmean SE df lower.CL upper.CL
L 28.2 3.65 48 20.9 35.6
M 28.8 3.65 48 21.4 36.1
H 18.8 3.65 48 11.4 26.1
Confidence level used: 0.95
$contrasts
wool = A:
contrast estimate SE df t.ratio p.value
L - M 20.556 5.16 48 3.986 0.0007
L - H 20.000 5.16 48 3.878 0.0009
M - H -0.556 5.16 48 -0.108 0.9936
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.556 5.16 48 -0.108 0.9936
L - H 9.444 5.16 48 1.831 0.1704
M - H 10.000 5.16 48 1.939 0.1389
P value adjustment: tukey method for comparing a family of 3 estimates
> library(nlme)
> warp.gls = gls(breaks ~ wool * tension, data = warpbreaks,
weights = varIdent(form = ~ 1 | wool*tension))
> emmeans(warp.gls, pairwise ~ tension | wool)
$emmeans
wool = A:
tension emmean SE df lower.CL upper.CL
L 44.6 6.03 48 32.4 56.7
M 24.0 2.89 48 18.2 29.8
H 24.6 3.42 48 17.7 31.4
wool = B:
tension emmean SE df lower.CL upper.CL
L 28.2 3.29 48 21.6 34.8
M 28.8 3.14 48 22.5 35.1
H 18.8 1.63 48 15.5 22.1
Confidence level used: 0.95
$contrasts
wool = A:
contrast estimate SE df t.ratio p.value
L - M 20.556 6.69 48 3.074 0.0096
L - H 20.000 6.94 48 2.883 0.0159
M - H -0.556 4.48 48 -0.124 0.9916
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.556 4.55 48 -0.122 0.9918
L - H 9.444 3.67 48 2.574 0.0346
M - H 10.000 3.54 48 2.824 0.0186
P value adjustment: tukey method for comparing a family of 3 estimates
These are essentially the Welch tests; however, gls doesn't use the Satterthwaite method to get the d.f.
At this point, I suggest you not use jamovi to do your analyses, as it apparently does not currently support the kind of models you need.
Thank you. This is actually quite illuminating and goes a long way toward addressing my initial question.
You indicated "These are essentially the Welch tests; however, gls doesn't use the Satterthwaite method to get the d.f." The Satterthwaite approach, which I believe is another name for Welch's, does typically use a unique method to calculate degrees of freedom -- a method that can produce fractional rather than integer-valued degrees of freedom, and that can produce d.f. values that are responsive to the variances in the cells being tested. So instead of having d.f. = 48 for each contrast, d.f. could vary across the contrasts. I have been looking at output and assuming that the post-hoc tests were Student's t tests because the d.f. did not reflect the Welch-Satterthwaite method (i.e., were not responsive to variance inequality across cells and were never fractional).
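For reference, the Welch–Satterthwaite approximation being described computes, for a comparison of two cells with sample variances $s_1^2, s_2^2$ and sizes $n_1, n_2$,

\[
\nu \;\approx\; \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

Because this depends on the observed cell variances, the resulting d.f. are generally fractional and can differ from contrast to contrast.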
So the ANOVA does not use the Satterthwaite method to get the d.f. However, repeated-measures ANOVA does use the Satterthwaite method to get the d.f. And so repeated-measures post hocs are responsive to unequal cell variances, are sometimes fractional, and so sometimes produce a different d.f. value despite n being constant.
I'm starting to think that these inconsistencies (e.g., the fact that the gls routine doesn't use the Satterthwaite d.f., while some other routine, perhaps written by someone else, for repeated-measures post-hocs does use Satterthwaite d.f.) may be part of what one accepts when one uses R packages. Each package author may make choices that aren't wrong, but that also may not be consistent with choices made by other authors. And when those various packages are brought together inside an overarching package, the inconsistencies become more noticeable. Is this what's probably happening? Or am I off base here? @rvlenth @jonathon-love
There is indeed a lot of diversity in how R packages are implemented. Some are more carefully written than others. Some reflect the biases of their developers and may pass over certain areas that are important to others. Many serve a very general class of situations, so that specific methods such as Welch's t test become just a special case, for which details such as degrees-of-freedom methods kind of go by the wayside.
This is actually true of any open-source product that accepts contributed code. Much of what you get is on the bleeding edge of new developments -- but it can be your blood!
PS but don't lose sight of what's important. The d.f. are not as important as having t ratios that are correctly made. Once the d.f. exceed 15 or so, it hardly makes a difference; but the wrong t value makes a big difference.
FWIW, I have added to the gls support a crude Satterthwaite method:
> emmeans(warp.gls, pairwise ~ tension | wool, mode = "sat")
$emmeans
wool = A:
tension emmean SE df lower.CL upper.CL
L 44.6 6.03 7.97 30.6 58.5
M 24.0 2.89 7.93 17.3 30.7
H 24.6 3.42 7.86 16.6 32.5
wool = B:
tension emmean SE df lower.CL upper.CL
L 28.2 3.29 8.05 20.7 35.8
M 28.8 3.14 7.99 21.5 36.0
H 18.8 1.63 8.23 15.0 22.5
d.f. method: satterthwaite (via simulation)
Confidence level used: 0.95
$contrasts
wool = A:
contrast estimate SE df t.ratio p.value
L - M 20.556 6.69 11.4 3.074 0.0255
L - H 20.000 6.94 12.5 2.883 0.0331
M - H -0.556 4.48 15.3 -0.124 0.9916
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.556 4.55 15.9 -0.122 0.9918
L - H 9.444 3.67 11.8 2.574 0.0594
M - H 10.000 3.54 12.0 2.824 0.0379
P value adjustment: tukey method for comparing a family of 3 estimates
In general, Satterthwaite d.f. requires finding the variance of the variance estimates. I do this by simulation, perturbing the response values and obtaining a gradient in the estimated covariance matrix. So the result varies a bit from one call to another. In the above, each group has 9 observations, so each mean should have 8 d.f., and the results shown are pretty close. For the differences, we can compare with what we get by other means; for example, comparing (wool A, tension L) with (wool A, tension M), we have
> with(warpbreaks, t.test(breaks[1:9], breaks[10:18]))
Welch Two Sample t-test
data: breaks[1:9] and breaks[10:18]
t = 3.0736, df = 11.481, p-value = 0.01011
and the output shows 11.4 d.f. (the P values differ because of the (approximate) Tukey adjustment).
This additional feature will be in the next emmeans update. Note that it applies only to gls, not to all models.
ah! russell this is the answer i was fishing for! i just didn't know how to ask it:
Given those coefficients, model formula, and covariance matrix, we decide on a bunch of linear functions of those regression coefficients, obtain estimates of those linear functions, and estimate their standard errors. Those results lead to t statistics defined as the estimates divided by their standard errors.
so returning to richard's remark:
I'm starting to think that these inconsistencies (e.g., the fact that the gls routine doesn't use the Satterthwaite d.f., while some other routine, perhaps written by someone else, for repeated-measures post-hocs does use Satterthwaite d.f.) may be part of what one accepts when one uses R packages.
but shouldn't the t stats for a model, i.e. len ~ dose + supp, be identical irrespective of what R package was used? and if they are different, it should be possible to pin-point where a package isn't doing the correct thing (i.e. gls doesn't implement Satterthwaite, for example)?
it's been an assumption of mine with emmeans that for any given model formula, there's one correct set of post-hocs – irrespective of what R package was used to construct the model object.
am i correct on this russell? (or does each package construct its own post-hoc strategy, and put that in the model object?)
with thanks
I give up. I wrote the code and I know what it does and doesn’t do.
Jonathon, I very much appreciate what you have done to help me understand and rein-in some package-dependencies issues. But in this thread, you have fished for a misinterpretation of one verb in my latest extended answer. I have run out of ways to explain it. So I think I have to just let you continue to believe what you insist on believing, regardless of my protests. It is too bad but there is no other option.
Perhaps you have a statistics department (as opposed to a social science department) near you with a colleague who can understand the way I think, and who can sit down with you and explain what I’ve been trying to say. A model is more than its formula; it is also the underlying assumptions upon which it is based.
BTW, in social-science terms, what we are seeing here is a blazing example of confirmation bias.
Sorry for adding something more here after it is closed. But as the author of afex, which is the package that I believe jamovi still uses in the backend for ANOVAs, I feel I can try to untangle some of the discussion here.
The question as I understand it is: what are the degrees of freedom for repeated-measures models based on the aov model? Because currently, afex still passes the aov model per default to emmeans in case there are repeated-measures factors. And it is also not fully clear to me how the degrees of freedom are obtained in that case. I can currently only say that they are based on the aov model and they estimate different variances as shown below.
Let us use the example data set described in more detail here. It has both between-subjects and within-subjects factors.
library("afex")
library("emmeans")
data(sk2011.1)
a1 <- aov_ez("id", "response", sk2011.1, between = "instruction",
within = c("inference", "plausibility"))
For the between-subjects factors, the df are easy to understand, N - k (here, 40 - 2 = 38):
emmeans(a1, "instruction")
# NOTE: Results may be misleading due to involvement in interactions
# instruction emmean SE df lower.CL upper.CL
# deductive 77.7 3.56 38 70.5 85.0
# probabilistic 80.5 3.56 38 73.3 87.7
#
# Results are averaged over the levels of: inference, plausibility
# Confidence level used: 0.95
length(unique(sk2011.1$id))
# [1] 40
For repeated-measures factors, the df are not that clear. For example:
emmeans(a1, "inference")
# NOTE: Results may be misleading due to involvement in interactions
# inference emmean SE df lower.CL upper.CL
# MP 87.5 3.78 127 80.0 95.0
# MT 76.7 3.78 127 69.2 84.2
# AC 69.4 3.78 127 61.9 76.9
# DA 83.0 3.78 127 75.5 90.4
#
# Results are averaged over the levels of: instruction, plausibility
# Confidence level used: 0.95
For the other factor:
emmeans(a1, "plausibility")
# NOTE: Results may be misleading due to involvement in interactions
# plausibility emmean SE df lower.CL upper.CL
# plausible 86.2 2.79 54.7 80.6 91.8
# implausible 72.1 2.79 54.7 66.5 77.7
#
# Results are averaged over the levels of: instruction, inference
# Confidence level used: 0.95
As a reminder, they are based on the aov part of the model object, which shows the following summary:
summary(a1$aov)
# Error: id
# Df Sum Sq Mean Sq F value Pr(>F)
# instruction 1 622 621.6 0.307 0.583
# Residuals 38 77042 2027.4
#
# Error: id:inference
# Df Sum Sq Mean Sq F value Pr(>F)
# inference 3 14827 4942 5.809 0.000990 ***
# instruction:inference 3 15308 5103 5.998 0.000785 ***
# Residuals 114 96988 851
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Error: id:plausibility
# Df Sum Sq Mean Sq F value Pr(>F)
# plausibility 1 16046 16046 34.23 9.13e-07 ***
# instruction:plausibility 1 5001 5001 10.67 0.00232 **
# Residuals 38 17815 469
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Error: id:inference:plausibility
# Df Sum Sq Mean Sq F value Pr(>F)
# inference:plausibility 3 2096 698.6 2.867 0.03970 *
# instruction:inference:plausibility 3 2911 970.3 3.982 0.00971 **
# Residuals 114 27779 243.7
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This shows that this model estimates individual variances per error stratum. How exactly the df are calculated from this is not something I fully understand (even after looking at the source code). But maybe you can give a high-level overview of this, Russ?
One final word: I agree that it is somewhat unfortunate that afex uses the aov model here. But when I initially programmed it, that was the only way to use emmeans for repeated-measures ANOVA. For some time now, we can also use the multivariate model for that (the lm, or better, mlm model). This can be changed globally via an option: afex_options("emmeans_model"). The default is "univariate", but "multivariate" is now also supported. Note we can also set the model directly in the call to emmeans as well. If we do so, we get the 'familiar' df.
emmeans(a1, "inference", model = "multivariate")
# inference emmean SE df lower.CL upper.CL
# MP 87.5 1.80 38 83.9 91.2
# MT 76.7 4.06 38 68.5 84.9
# AC 69.4 4.77 38 59.8 79.1
# DA 83.0 3.84 38 75.2 90.7
#
# Results are averaged over the levels of: instruction, plausibility
# Confidence level used: 0.95
I plan to change the default in the future from the aov model to the lm model. But when is the right time for such a change that breaks backward compatibility? Probably not in my first year at a new faculty job.
sorry russell,
this has been hard for me too. every time you've characterised what i think, my thoughts are "that isn't what i think", and every time you describe what is the case, my thoughts are "that is (more or less) what i think".
i feel like you've seized on the word 'choose' as having a very specific meaning, when i mean it in a very general sense ... and i was hoping by pointing out that you'd used the word 'decide' that you would see we're actually on the same page here.
you seem bewildered that i can think such foolish things, and i'm a bit bewildered that you can think that i think such foolish things. so it seems like the best move is to leave this discussion.
anyhow, i have appreciated you taking the time, even when you were feeling frustrated.
jonathon
Henrik's remarks bring up a whole different dimension to this whole thing, having to do with mixed models and a different manifestation of the Satterthwaite df that has to do with combining errors from different sources (error strata, where at least in aov objects, the errors within each stratum are assumed to have constant variance), as opposed to what happens in the Welch procedure, where there is only one source of error but its variance isn't homogeneous. To try to respond to all that, I'd have to write out notes for a semester's course in linear models, and in doing so, I'd dig an even bigger hole for myself.
The challenge faced by a software developer is that, by making sophisticated methods available, there is an implicit assumption that the user understands the methods being offered and is responsible for ensuring that s/he is using it appropriately. But of course, there will always be users who don't understand what is appropriate and consequently will misuse the software -- perhaps resulting in an analysis that is markedly worse than some simpler analysis they could have done more or less correctly. Then there is the challenge of simplified user interfaces aimed at less experienced users, where we are also trusting that the design of the interface keeps users out of trouble and that the user interface correctly represents what is being offered -- but may not always fulfill that need either. And, let's admit it, people like me are part of the problem too. Many professional statisticians are a lot more interested in math than in data and scientific reasoning, and sometimes lead people down a very misleading path. So you can't trust us either. I could claim myself as an exception, but I'd be wrong in doing so because I have my own set of blind spots, and I am blind to my own blind spots.
I guess what maybe I'll try to do is add something to the FAQs vignette and perhaps another illustrative example somewhere else. (Or perhaps a whole new vignette on repeated measures? No, that'd just be asking for trouble.) But I'm not going to comment further here.
OK, I'll say one more thing. In response to Jonathon's question:
... does each package construct its own post-hoc strategy, and put that in the model object?
YES. Each package potentially implies a different post hoc strategy. You choose the package and modeling options that do the best job of fitting the data. Then you give that result to emmeans() and it will do appropriate post hoc tests.
thanks russell,
i see that's where i was incorrect. i'd assumed that for a given model formula, irrespective of package used to construct the model, i'd get the same post hocs. i see now the situation is more complicated than that.
with thanks
Greetings,
I recently had an exchange with Jonathon Love (developer of jamovi statistical software) about some behavior of the emmeans package, which jamovi makes use of. I submitted a request to Jonathon that jamovi enforce some consistency regarding the use of Student's t versus some alternative-to-Student's (Welch's?) t tests in post-hoc comparisons. Through trial and error, I've noticed that in a non-repeated-measures ANOVA, the degrees of freedom for the post-hoc tests do not change no matter how unequal the within-cell variances are. So I assume they are based on Student's t. But for repeated-measures ANOVA, the degrees of freedom for each post-hoc test do depend on the degree of inequality of variances. So I presume these are Welch's (or some other non-Student's) t tests. Jonathon said that this difference in the type of t test employed is caused by the design of the emmeans package (not jamovi) and so the issue should be directed to the developer(s) of the emmeans package. So my question is: what t-test types does emmeans employ, why is there an apparent inconsistency in the type of post-hoc t test employed, and is there a way to fix the inconsistency?