Closed qdread closed 1 year ago
It took me a while to understand why one would have a d.f. option connected specifically with adjustments to P values. Then I realized that if you have an adjustment like, say, Tukey, then we are talking about the distribution of the maximum of a Studentized range and the Studentization is technically based on one SD estimate, so there would be an argument for specifying just one value for df. The built-in ptukey() function in R does not enforce this requirement in the way it vectorizes the function, as we can see by comparing these results:
> ptukey(1:5, 4, df = 3:7) # like adjdfe = row
[1] 0.1113336 0.4471334 0.7363707 0.8943063 0.9625459
> ptukey(1:5, 4, df = 5) # like adjdfe = source
[1] 0.1097220 0.4575163 0.7363707 0.8779368 0.9416749
... where only the 3rd values match, because the 3rd element of df = 3:7 is 5.
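A minimal sketch of what that vectorization amounts to, using only base R's ptukey(): with a vector df, each quantile is paired with its own df, which is the adjdfe = row behavior. The variable names here are illustrative and not from the original comment.

```r
q      <- 1:5
row_df <- 3:7

# Vectorized call: element i of q is evaluated with element i of row_df
p_row <- ptukey(q, nmeans = 4, df = row_df)

# The same thing spelled out one element at a time
p_each <- mapply(function(qi, dfi) ptukey(qi, nmeans = 4, df = dfi), q, row_df)
stopifnot(isTRUE(all.equal(p_row, p_each)))

# A single df for the whole family (like adjdfe = source)
p_single <- ptukey(q, nmeans = 4, df = 5)

# Positions where the two approaches agree
which(abs(p_row - p_single) < 1e-8)
```

The last line picks out the positions where the row-wise df happens to equal the single df.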
Such a justification for using a single df value does not exist for the Sidak adjustment (nor for Bonferroni, nor for any of the p.adjust.methods), because these adjustments are based on the vector of unadjusted P values, and it seems wrong to me to distort those unadjusted P values when the df are not all the same. So I think SAS has the wrong default.
I am presently not inclined to implement an adjdfe option. My reasons are:
The summary output includes a df column that pretty strongly suggests that each P value is based in part on the d.f. shown.
I don't have very easy access to SAS anymore, but I think maybe the results SAS would produce with its default adjdfe = source are
> pairs(emm, adj = "sidak", df = 9)
calcium = C0:
 contrast estimate    SE df t.ratio p.value
 F0 - F1     0.258 0.255  9   1.014  0.9150
 F0 - F2    -0.280 0.255  9  -1.099  0.8825
 F0 - F3    -0.516 0.255  9  -2.025  0.3675
 F1 - F2    -0.538 0.255  9  -2.113  0.3263
 F1 - F3    -0.774 0.255  9  -3.039  0.0813
 F2 - F3    -0.236 0.255  9  -0.926  0.9425

calcium = C1:
 contrast estimate    SE df t.ratio p.value
 F0 - F1    -0.148 0.255  9  -0.582  0.9941
 F0 - F2    -0.861 0.255  9  -3.380  0.0478
 F0 - F3    -0.398 0.255  9  -1.564  0.6289
 F1 - F2    -0.713 0.255  9  -2.797  0.1185
 F1 - F3    -0.250 0.255  9  -0.981  0.9260
 F2 - F3     0.463 0.255  9   1.816  0.4784

Results are averaged over the levels of: soil
Degrees-of-freedom method: user-specified
P value adjustment: sidak method for 6 tests
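As a rough check on how the Sidak column is built from the unadjusted P values, here is a sketch in base R that recomputes it by hand for the calcium = C0 group. It assumes the standard Sidak formula 1 - (1 - p)^k for k tests and two-sided t tests; the t ratios are the rounded values printed above, so the results should agree with the table only to about three decimals. The helper name sidak is hypothetical, not an emmeans function.

```r
# Sidak adjustment for a family of k tests: 1 - (1 - p)^k
sidak <- function(p, k = length(p)) 1 - (1 - p)^k

# t ratios for calcium = C0, copied (already rounded) from the output above
t_ratio <- c(1.014, -1.099, -2.025, -2.113, -3.039, -0.926)

# Unadjusted two-sided P values, every test using the same user-specified df = 9
p_unadj <- 2 * pt(abs(t_ratio), df = 9, lower.tail = FALSE)

# Adjusted P values for the family of 6 comparisons
p_sidak <- sidak(p_unadj)
round(p_sidak, 4)
```

Changing df in the pt() call changes every unadjusted P value before the adjustment is applied, which is the only route by which a df option can affect a Sidak or Bonferroni result.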
Thank you for your insightful response!
As it turns out, SAS's default adjdfe = source uses 54 as the denominator degrees of freedom. When I ran pairs(emm, adj='sidak', df=54) I got the same adjusted p-values as SAS produces. That is the denominator df from the Type III F-test for fertilizer*calcium. But I agree with your intuition that 9 would be the more appropriate denominator df to use, because we are making pairwise comparisons of the fertilizer means within each level of calcium, not comparing all levels of fertilizer*calcium. So maybe SAS is using an especially inappropriate default df for comparisons done within the slice statement.
I also agree with your arguments for why it isn't a good idea to implement the adjdfe option. I don't think it is necessary to perfectly reproduce the output of other statistical packages. I mainly asked this question out of curiosity about the different default behavior. I really appreciate your help.
Well, if SAS is using the df for fertilizer*calcium, then it is dead wrong, because that is the d.f. for the family of interaction contrasts for that term. Interaction contrasts have coefficients that sum to zero and whose marginal sums are also zero. In this example, the pairwise comparisons of fertilizer*calcium combinations (which are not interaction contrasts; they are contrasts of cell means) will have 54 d.f. only if they are on the same fertilizer, and 11.9 df otherwise. (Check it out: pairs(emm, by = NULL))
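To make the distinction concrete, here is a small base R sketch that lays out contrast coefficients on the 4 x 2 grid of fertilizer and calcium levels from this example. The helper is_interaction_contrast is a hypothetical name used only for illustration, and the particular contrasts chosen are assumptions, not ones taken from the thread.

```r
fert <- c("F0", "F1", "F2", "F3")
calc <- c("C0", "C1")

# An interaction contrast: outer product of a fertilizer contrast (F0 - F1)
# and a calcium contrast (C0 - C1)
inter <- outer(c(1, -1, 0, 0), c(1, -1))
dimnames(inter) <- list(fert, calc)

# A pairwise comparison of two cell means: F0 - F1 within calcium = C0
cell <- matrix(0, nrow = 4, ncol = 2, dimnames = list(fert, calc))
cell["F0", "C0"] <- 1
cell["F1", "C0"] <- -1

# Interaction contrasts need every row sum and every column sum to be zero
is_interaction_contrast <- function(w) {
  all(rowSums(w) == 0) && all(colSums(w) == 0)
}

is_interaction_contrast(inter)  # TRUE
is_interaction_contrast(cell)   # FALSE: row sums for F0 and F1 are 1 and -1
```

Both coefficient sets sum to zero overall, so both are contrasts, but only the first has zero marginal sums in both directions.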
Thanks, that makes a lot of sense. I might draw the attention of the other USDA statisticians to this. Many of them are dyed-in-the-wool SAS users! :-D
OK, but just to be clear, that is a common mistake, and SAS is a solid, reliable product that does a whole lot of stuff quite well.
Russ
Closing this issue as resolved
Hi Russ, thanks as usual for the great package and how well you support and maintain it.
I am working on a proof of concept for a stats lesson where I demonstrate how to use emmeans. The lesson is targeted at SAS users, so I had the idea of trying to generate identical results in SAS and R.
Initially I thought there was a discrepancy in the way the Sidak adjustment is done in emmeans::contrast() as opposed to the lsmeans or slice statement of proc glimmix. I was finally able to replicate the behavior of emmeans::contrast() when I set the option adjdfe=row in the lsmeans statement of proc glimmix. This led me to read in the SAS documentation that the default for lsmeans and slice statements is adjdfe=source, viz.
I was curious if there is some way to replicate the adjdfe=source behavior in the emmeans package, or if you have any interesting insight or recommendation as to why you have a different default than glimmix.
reproducible emmeans example
corresponding SAS code
Below, the first slice statement reproduces the p-values of contrast() but the second doesn't.