rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
340 stars 30 forks source link

Contrasts across higher levels of nested variables #186

Closed p-schaefer closed 4 years ago

p-schaefer commented 4 years ago

I'm running into an issue trying to do contrasts across higher (main effect) levels of nested terms in a two-way nested model:

library(emmeans)
set.seed(1234)

tdf<-data.frame(site=rep(c("s1","s2","s3","s4"),each=5*4),
                year=as.character(rep(rep(1:4,each=5),4)),
                status=c(rep("control",20),rep("impact",60)),
                period=rep(rep(c("before","after"),each=10),4),
                val=c(rnorm(80))
                )

tmod<-lm(val~(status/site)*(period/year),data=tdf)

emm.full1<-emmeans(tmod, ~(period/year)*(status/site), nesting=c("year %in% period","site %in% status"))

# I would like to get contrasts for both higher and lower levels of the two nested terms
# by the other factor, such as:

emmeans(emm.full1, pairwise~site|period)
emmeans(emm.full1, pairwise~period|site)

# or

emmeans(emm.full1, pairwise~status|year)
emmeans(emm.full1, pairwise~year|status)

# or

emmeans(emm.full1, pairwise~status|period)
emmeans(emm.full1, pairwise~period|status)

I need to do this for a number of response variables, and won't be examining every contrast for each endpoint, but I seem to be running into trouble regardless.

This is the error I get, and while the message is clear enough, I don't fully understand why I can't perform these contrasts:

Error in .nested_contrast(rgobj = object, method = method, by = by, adjust = adjust,  : 
  There are no factor levels left to contrast. Try taking nested factors out of 'by'.

Any help would be greatly appreciated.

rvlenth commented 4 years ago

First, I think your emm.full1 is just the same as the reference grid, so I opted to just work with that:

> rg = ref_grid(tmod, nesting = c("year %in% period","site %in% status"))
> rg
'emmGrid' object with variables:
    status = control, impact
    site = s1, s2, s3, s4
    period = after, before
    year = 1, 2, 3, 4
Nesting structure:  year %in% period, site %in% status

Here is one of my tests without the pairwise ~ part (because the problem happens in getting the EMMs, not in the comparisons):

> emmeans(rg, ~ period | site)
NOTE: Results may be misleading due to involvement in interactions
Results are averaged over the levels of: year 
Confidence level used: 0.95 

Note that this appears to show no EMMs. There is something strange going on here under the hood. It works out more appropriately if you specify the nesting factor(s) as well among the by variables:

> emmeans(rg, ~ period | status / site)
NOTE: Results may be misleading due to involvement in interactions
status = control:
 site period  emmean    SE df lower.CL  upper.CL
 s1   after  -0.1182 0.305 64 -0.72784  0.491501
 s1   before -0.3832 0.305 64 -0.99283  0.226514

status = impact:
 site period  emmean    SE df lower.CL  upper.CL
 s2   after  -0.7662 0.305 64 -1.37586 -0.156521
 s3   after  -0.2789 0.305 64 -0.88854  0.330807
 s4   after  -0.0423 0.305 64 -0.65197  0.567370
 s2   before -0.3879 0.305 64 -0.99762  0.221725
 s3   before -0.6098 0.305 64 -1.21947 -0.000125
 s4   before  0.6166 0.305 64  0.00692  1.226264

Results are averaged over the levels of: year 
Confidence level used: 0.95 

This still isn't right because we should have 4 different by groups, not two.

I will look at this and see if it can be made to work right. I think the issue is that in coding all this, I was not thinking of the possibility that by variables could be nested.

Thanks for the report. I'll keep in touch.

rvlenth commented 4 years ago

OK. I think I have navigated all the issues here. There were three:

  1. The nesting structure in this example should have been correctly auto-detected
  2. We didn't always get the by variables right in what is returned by emmeans()
  3. A sanity check in contrast() was misplaced

So here's an illustration (with your data and model) where we get it all right:

> rg = ref_grid(tmod)
NOTE: A nesting structure was detected in the fitted model:
    site %in% status, year %in% period
> # Note we auto-detected the nesting correctly -- manual spec not needed!

> emmeans(rg, pairwise ~ period | site)
NOTE: Results may be misleading due to involvement in interactions
Note: Grouping factor(s) for 'site' have been added to the 'by' list.

$emmeans
status = control, site = s1:
 period  emmean    SE df lower.CL  upper.CL
 after  -0.1182 0.305 64 -0.72784  0.491501
 before -0.3832 0.305 64 -0.99283  0.226514

status = impact, site = s2:
 period  emmean    SE df lower.CL  upper.CL
 after  -0.7662 0.305 64 -1.37586 -0.156521
 before -0.3879 0.305 64 -0.99762  0.221725

status = impact, site = s3:
 period  emmean    SE df lower.CL  upper.CL
 after  -0.2789 0.305 64 -0.88854  0.330807
 before -0.6098 0.305 64 -1.21947 -0.000125

status = impact, site = s4:
 period  emmean    SE df lower.CL  upper.CL
 after  -0.0423 0.305 64 -0.65197  0.567370
 before  0.6166 0.305 64  0.00692  1.226264

Results are averaged over the levels of: year 
Confidence level used: 0.95 

$contrasts
site = s1, status = control:
 contrast       estimate    SE df t.ratio p.value
 after - before    0.265 0.432 64  0.614  0.8679 

site = s2, status = impact:
 contrast       estimate    SE df t.ratio p.value
 after - before   -0.378 0.432 64 -0.876  0.7276 

site = s3, status = impact:
 contrast       estimate    SE df t.ratio p.value
 after - before    0.331 0.432 64  0.767  0.7904 

site = s4, status = impact:
 contrast       estimate    SE df t.ratio p.value
 after - before   -0.659 0.432 64 -1.527  0.3405 

Results are averaged over some or all of the levels of: year 

I also tested this with ~ year | site, ~ period | status, and ~ year | status, all obtaining correct results.

Thanks again for reporting this. I think I have resolved all these issues so will close, but of course you may comment at any time.

abmahesh commented 1 year ago

This is helpful. Is there a way to replace 'pairwise' contrast with a custom contrast (say if you had 4 levels)?

rvlenth commented 1 year ago

Yes. Skip the left-hand side in the pairwise ~ ... specification and save the object (as, say, EMM). Then use the contrast() function. Please see the documentation for contrast and also look at ? pairwise.emmc where you will see not just that, but all the built-in contrast methods, which may already include the custom contrasts you want.

abmahesh commented 1 year ago

Thanks. I have looked at the documentation but not able to get what I am looking for.

I have a lmer model: lmod <- lmer( Y ~ treatment :category + covariates..) for each of the categories, I have 4 levels: precontrol, pretest, postcontrol, post-test; I am looking to calculate the contrast: post-test - postcontrol -pretest +precontrol

I tried the following: rg = ref_grid(lmod) custom_contrast = matrix(c(-1, 1, 1, -1), ncol = 4, nrow =1, byrow = TRUE) emm_obj <- emmeans(rg, contrast = custom_contrast, specs = "treatment", by = "category") contrasts = contrast(emm_obj) summary(contrasts)

This runs ...but not exactly what I want.. Still working on it...any help/guidance is appreciated.

Thanks!

rvlenth commented 1 year ago

I really don't think my documentation is that unclear. Plus there are examples.

emm_obj <- emmeans(rg, contrast = custom_contrast, specs = "treatment", by = "category")
contrast(emm_obj, list(my.contrast = c(-1, 1, 1, -1)))
abmahesh commented 1 year ago

When I run the above code, I get the error: Non-character contrast methods are not supported with nested objects

Apologies Professor. I am new to this. Are you referring to this documentation? https://cran.r-project.org/web/packages/emmeans/vignettes/comparisons.html#linfcns

rvlenth commented 1 year ago

I was just talking about the regular examples at the end of each help page. But the vignettes help too.

But I had forgotten about the nested aspect of this, and you're right, you have to use a named contrast set with those, because the number of levels can vary from one nest to the next, and custom contrasts aren't adaptable to that situation. So you need to write a function with an extension .emmc to its name that takes the set of levels (of any length) and returns a data frame of coefficients.

Here's a weird but illustrative example. It generates contrast coefficients 1, 2, ..., n except the ith position is the negative of the sum of the others.

foo.emmc = function(levels) {
    n = length(levels)
    tmp = 1:n
    rtn = list()
    for (i in 1:n) {
        x = tmp
        x[i] = -sum(x[-i])
        rtn[[paste0("con.", levels[i])]] = x
    }
    as.data.frame(rtn)
}

Illustration

> foo.emmc(1:3)
  con.1 con.2 con.3
1    -5     1     1
2     2    -4     2
3     3     3    -3

Use with a model:

> warp.lm = lm(breaks ~ wool*tension, data = warpbreaks)
> (warp.emm = emmeans(warp.lm, ~tension | wool))
wool = A:
 tension emmean   SE df lower.CL upper.CL
 L         44.6 3.65 48     37.2     51.9
 M         24.0 3.65 48     16.7     31.3
 H         24.6 3.65 48     17.2     31.9

wool = B:
 tension emmean   SE df lower.CL upper.CL
 L         28.2 3.65 48     20.9     35.6
 M         28.8 3.65 48     21.4     36.1
 H         18.8 3.65 48     11.4     26.1

Confidence level used: 0.95 

> contrast(warp.emm, "foo")
wool = A:
 contrast estimate   SE df t.ratio p.value
 con.L      -101.1 22.5 48  -4.498  <.0001
 con.M        22.2 18.6 48   1.195  0.2379
 con.H        18.9 13.6 48   1.384  0.1727

wool = B:
 contrast estimate   SE df t.ratio p.value
 con.L       -27.2 22.5 48  -1.211  0.2318
 con.M       -30.6 18.6 48  -1.643  0.1069
 con.H        29.4 13.6 48   2.158  0.0360

> contrast(warp.emm, "foo", by = NULL)
 contrast estimate   SE df t.ratio p.value
 con.L.A    -400.0 80.7 48  -4.955  <.0001
 con.M.A      31.7 77.2 48   0.410  0.6834
 con.H.A      20.0 73.5 48   0.272  0.7866
 con.L.B     -57.0 69.6 48  -0.819  0.4167
 con.M.B     -68.7 65.4 48  -1.049  0.2993
 con.H.B     141.3 61.0 48   2.316  0.0249

Hope this helps

rvlenth commented 1 year ago

just to be clear, the above is not an example with nesting, but it's an example of a custom contrast function that will also work with a nested model.

rvlenth commented 1 year ago

It needs to be in the same workspace that you are running contrast from. Orin the search path at least.

abmahesh commented 1 year ago

It needs to be in the same workspace that you are running contrast from. Orin the search path at least. Russ On Feb 9, 2023, at 11:17 PM, abmahesh @.> wrote:  Thank you Professor, I appreciate all the help. When I run foo.emmc(1:3), I do get a similar output as yours, but for some reason, after that when I run the contrast function: I get this error: Error in contrast.emmGrid(wkrg[rows, drop.levels = TRUE], method = method, : Contrast function 'foo.emmc' not found when I create a separate function with a suffix .emmc and run, I get a similar error that it can't find the function!! — Reply to this email directly, view it on GitHub<#186 (comment)>, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AGMJPL5ZCIN3EDE4NSWXM6DWWXFPPANCNFSM4MFPKDOA. You are receiving this because you modified the open/close state.Message ID: @.>

Thank you Professor, I appreciate all the help. I was able to create and run the function with .emmc extension and run the contrast, however, the results are not what I expect.

my function is: custom_contrast.emmc = function(levels) { n = length(levels) contrast_weights = c(-1, 1, 1, -1) rtn = data.frame(t(contrast_weights)) colnames(rtn) = levels rtn }

I get the following output for each category: contrast estimate SE df t.ratio p.value poscontrol effect 0.11 ............... postest effect 0.09 ........... precontrol effect -0.10 . ........... pretest effect -0.10 ......................

whereas I am looking for just one line for each category: A-B_C+D Also, even if I change the signs in the vector C(-1,1....), the results don't change, which seems weird!!

to sum up: I am not able to create the correct custom contrast function to get A-B-C+D (one line for each category)

rvlenth commented 1 year ago

You are showing results of "effect" contrasts, which is the default in contrast(). I surmise you didn't name your contrast function in the call. You need to run it like

contrast(emm_obj, "custom_contrast")
abmahesh commented 1 year ago

You are showing results of "effect" contrasts, which is the default in contrast(). I surmise you didn't name your contrast function in the call. You need to run it like

contrast(emm_obj, "custom_contrast")

Thank you Professor. I actually did write the contrast command like that (as you have written it above). I think the issue was with how I was creating the function "custom_contrast". I was finally able to resolve it. Thanks for all the help.