rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
340 stars 30 forks source link

emmeans over geepack::geeglm accepts "vcov" but glmtoolbox::glmgee does not #460

Closed Generalized closed 6 months ago

Generalized commented 6 months ago

Dear @rvlenth I noticed that in the newest update of the emmeans the glmtoolbox is supported without the need for qdrg(). I thank you for it very, very much!

When I fit some model via geepack::geeglm and provide the vcov parameter, then emmeans recognizes the fact it's user-supplied:

mg <- geepack::geeglm(X~ Y* T+ ....., corstr = "exchangeable", id=Pat, data=dd)
emmeans(mg, specs = ~Y*T, vcov=my_estimator)
....
....
Covariance estimate used: user-supplied  # here!
Confidence level used: 0.95 

The message "user-supplied" confirms it recognized the v-cov estimator provided by the user.

Now I want to use it with glmtoolbox:

> emmeans(mg, specs = ~Y*T, vcov=vcov(mg, type = "bias-corrected"))
Y       T   emmean    SE df lower.CL upper.CL
 B      6      4.11 0.370 46     3.36     4.85
 K      6      5.39 0.310 46     4.77     6.02
 B      12     3.86 0.467 46     2.92     4.80
 K      12     5.06 0.341 46     4.37     5.75

Confidence level used: 0.95 

There's no message that user-provided vcov was used. Let's use a different estimator

> emmeans(mg, specs = ~Y*T, vcov=vcov(mg, type = "model"))
Y      T emmean    SE df lower.CL upper.CL
 B      6      4.11 0.370 46     3.36     4.85
 K      6      5.39 0.310 46     4.77     6.02
 B      12     3.86 0.467 46     2.92     4.80
 K      12     5.06 0.341 46     4.37     5.75

Confidence level used: 0.95 

No message, same SEs and CIs. Nothing changed.


So, while for geepack the user-supplied vcov parameter is recognized, for glmtoolbox it's not.

I need then the qdrg() to account for it:

mq <- qdrg(formula = Y*T, coef=coef(mg), vcov= vcov(mg, type = "bias-corrected"), data=dd, df = 46)
emmeans(mq, specs = ~Y*T)
Y       T emmean    SE df lower.CL upper.CL
 B      6      4.11 0.410 46     3.28     4.93
 K      6      5.39 0.358 46     4.67     6.12
 B      12     3.86 0.526 46     2.80     4.92
 K      12     5.06 0.393 46     4.27     5.85

Confidence level used: 0.95 

vs.

mq <- qdrg(formula = Y*T, coef=coef(mg), vcov= vcov(mg, type = "model"), data=dd, df = 46)
emmeans(mq, specs = ~Y*T)

 Y      T emmean    SE df lower.CL upper.CL
 B      6      4.11 0.403 46     3.29     4.92
 K      6      5.39 0.403 46     4.58     6.21
 B      12     3.86 0.403 46     3.05     4.68
 K      12     5.06 0.403 46     4.25     5.87

Confidence level used: 0.95 

And indeed, now the SEs and CIs are different for different type of vcov().

I'm sorry for giving no data, but I'd need to anonymize it and make several changes. But you can use any single dataset and example, it's not data-specific. It's package-level specific.

rvlenth commented 6 months ago

Thanks for the report. I understand the concern about your confidential data. However, since you say that any single data set can reproduce this error, I would like to ask you to provide a reproducible example, preferably from datasets or one of the packages. That will ensure that I am working with a suitable example, and also verify that the bug indeed is not an artifact of your particular dataset.

Generalized commented 6 months ago

Please find the dataset

d <- structure(list(ID = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
25L, 26L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
25L, 26L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 
25L, 26L), levels = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "101", "102", "103", "104", "105", 
"106", "107", "108", "109", "110", "111", "112", "113"), class = "factor"), 
    Method = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("B", "K"), class = "factor"), 
    Time = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("1", "6", "12"
    ), class = "factor"), Response = c(5.5, 5.5, 6, 11.5, 7, 
    5.5, 7, 7.5, 12, 8, 8.5, 8.5, 8.5, 3, 4.5, 3, 4.5, 3, 4, 
    3.5, 3.5, 3.5, 8, 4.5, 6.5, 4, 6, 3, 3, 4, 2, 3, 3, 3, 3, 
    8, 4, 6, 4, 5.5, 5.5, 5, 6, 7, 5.5, 8, 5.5, 6.5, 5, 8, 8, 
    6, 5.5, 4.5, 4.5, 5.5, 6.5, 5.5, 4, 5, 5, 3, 6, 7.5, 5.5, 
    5.5, 3, 4, 5, 6, 5, 4, 5, 5, 3, 6, 7.5, 5), Baseline = c(5.5, 
    5.5, 6, 11.5, 7, 5.5, 7, 7.5, 12, 8, 8.5, 8.5, 8.5, 5.5, 
    5.5, 6, 11.5, 7, 5.5, 7, 7.5, 12, 8, 8.5, 8.5, 8.5, 5.5, 
    5.5, 6, 11.5, 7, 5.5, 7, 7.5, 12, 8, 8.5, 8.5, 8.5, 5.5, 
    5.5, 5, 6, 7, 5.5, 8, 5.5, 6.5, 5, 8, 8, 6, 5.5, 5.5, 5, 
    6, 7, 5.5, 8, 5.5, 6.5, 5, 8, 8, 6, 5.5, 5.5, 5, 6, 7, 5.5, 
    8, 5.5, 6.5, 5, 8, 8, 6)), row.names = c(NA, 78L), class = "data.frame")

For glmtoolbox + emmeans:

> mg <- glmtoolbox::glmgee(Response ~ Method * Time, corstr = "unstructured", id=ID, data=d)
> (mg_em_1 <- emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="robust")))
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.562 72     6.65     8.89
 K      1      6.27 0.301 72     5.67     6.87
 B      6      4.27 0.391 72     3.49     5.05
 K      6      5.23 0.301 72     4.63     5.83
 B      12     4.00 0.449 72     3.11     4.89
 K      12     4.92 0.330 72     4.26     5.58

Confidence level used: 0.95 
> (mg_em_2 <- emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="model")))
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.562 72     6.65     8.89
 K      1      6.27 0.301 72     5.67     6.87
 B      6      4.27 0.391 72     3.49     5.05
 K      6      5.23 0.301 72     4.63     5.83
 B      12     4.00 0.449 72     3.11     4.89
 K      12     4.92 0.330 72     4.26     5.58

Confidence level used: 0.95 

> identical(mg_em_1, mg_em_2)
[1] TRUE

For glmtoolbox + qdrg + emmeans:

> (mg_em_1 <- emmeans(  qdrg(formula = Response ~ Method * Time, 
+                            coef=coef(mg), vcov= vcov(mg, type = "robust"), data=d),
+                       specs = ~ Method * Time))
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.562 Inf      6.67      8.87
 K      1      6.27 0.301 Inf      5.68      6.86
 B      6      4.27 0.391 Inf      3.50      5.04
 K      6      5.23 0.301 Inf      4.64      5.82
 B      12     4.00 0.449 Inf      3.12      4.88
 K      12     4.92 0.330 Inf      4.28      5.57

Confidence level used: 0.95 
> 
> (mg_em_2 <- emmeans(  qdrg(formula = Response ~ Method * Time, 
+                            coef=coef(mg), vcov= vcov(mg, type = "model"), data=d),
+                       specs = ~ Method * Time))
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.416 Inf      6.95      8.59
 K      1      6.27 0.416 Inf      5.45      7.09
 B      6      4.27 0.416 Inf      3.45      5.09
 K      6      5.23 0.416 Inf      4.41      6.05
 B      12     4.00 0.416 Inf      3.18      4.82
 K      12     4.92 0.416 Inf      4.11      5.74

Confidence level used: 0.95 

> identical(mg_em_1, mg_em_2)
[1] FALSE

For geepack::geeglm + emmeans it works from the emmeans level. geepack doesn't provide the vcov(type), but I can use one from the glmtoolbox to illustrate (I didn't want to push totally made up values)

> mg1 <- geepack::geeglm(Response ~ Method * Time, corstr = "ar1", id=ID, data=d)
> (mg_em_1 <- emmeans(mg1, specs = ~ Method * Time, vcov = vcov(mg, type="robust")))
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.562 Inf      6.67      8.87
 K      1      6.27 0.301 Inf      5.68      6.86
 B      6      4.27 0.391 Inf      3.50      5.04
 K      6      5.23 0.301 Inf      4.64      5.82
 B      12     4.00 0.449 Inf      3.12      4.88
 K      12     4.92 0.330 Inf      4.28      5.57

Covariance estimate used: user-supplied 
Confidence level used: 0.95 
> (mg_em_2 <- emmeans(mg1, specs = ~ Method * Time, vcov = vcov(mg, type="model")))
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.416 Inf      6.95      8.59
 K      1      6.27 0.416 Inf      5.45      7.09
 B      6      4.27 0.416 Inf      3.45      5.09
 K      6      5.23 0.416 Inf      4.41      6.05
 B      12     4.00 0.416 Inf      3.18      4.82
 K      12     4.92 0.416 Inf      4.11      5.74

Covariance estimate used: user-supplied 
Confidence level used: 0.95 

> identical(mg_em_1, mg_em_2)
[1] FALSE
rvlenth commented 6 months ago

This is what I get:

For mg (glmgee)

> emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="robust"))
Error in match.arg(type) : 'arg' must be NULL or a character vector

> emmeans(mg, specs = ~ Method * Time, vcov = "robust")
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.562 72     6.65     8.89
 K      1      6.27 0.301 72     5.67     6.87
 B      6      4.27 0.391 72     3.49     5.05
 K      6      5.23 0.301 72     4.63     5.83
 B      12     4.00 0.449 72     3.11     4.89
 K      12     4.92 0.330 72     4.26     5.58

Confidence level used: 0.95 

> emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="model")))
Error: unexpected ')' in "emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="model")))"

> emmeans(mg, specs = ~ Method * Time, vcov = "model")
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.416 72     6.94     8.60
 K      1      6.27 0.416 72     5.44     7.10
 B      6      4.27 0.416 72     3.44     5.10
 K      6      5.23 0.416 72     4.40     6.06
 B      12     4.00 0.416 72     3.17     4.83
 K      12     4.92 0.416 72     4.09     5.75

Confidence level used: 0.95

So for me, the user-supplied covariances in your test don't work at all, but you can just supply the named covariance methods provided

For mg1 (geeglm)

> ## user-supplied covariances: as you show

> ## named covariance methods...

> emmeans(mg1, specs = ~ Method * Time, vcov.method = "robust")
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.562 Inf      6.67      8.87
 K      1      6.27 0.301 Inf      5.68      6.86
 B      6      4.27 0.391 Inf      3.50      5.04
 K      6      5.23 0.301 Inf      4.64      5.82
 B      12     4.00 0.449 Inf      3.12      4.88
 K      12     4.92 0.330 Inf      4.28      5.57

Covariance estimate used: vbeta 
Confidence level used: 0.95 

> emmeans(mg1, specs = ~ Method * Time, vcov = "model")
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.562 Inf      6.67      8.87
 K      1      6.27 0.301 Inf      5.68      6.86
 B      6      4.27 0.391 Inf      3.50      5.04
 K      6      5.23 0.301 Inf      4.64      5.82
 B      12     4.00 0.449 Inf      3.12      4.88
 K      12     4.92 0.330 Inf      4.28      5.57

Covariance estimate used: vbeta 
Confidence level used: 0.95 

So this is just the opposite! named covariance methods don't work for geeglm, and user-supplied ones don't work for glmgee. And I really, really wish these class names weren't so damn easy to confuse.

I think the real source of the problem is that ref_grid() has an optional argument vcov. (really part of the ... argument), whereas the emm_basis methods for these objects both have a vcov.method argument, and there is masking going on.

I can't exaplin why you are getting the user-supplied covariances to work for geeglm objects. What version of emmeans are you using?

rvlenth commented 6 months ago

I've found an easy fix for glmgee, but so far not providing a "user-supplied" message:

> emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="model"))
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.416 72     6.94     8.60
 K      1      6.27 0.416 72     5.44     7.10
 B      6      4.27 0.416 72     3.44     5.10
 K      6      5.23 0.416 72     4.40     6.06
 B      12     4.00 0.416 72     3.17     4.83
 K      12     4.92 0.416 72     4.09     5.75

Confidence level used: 0.95 

> emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type="robust"))
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.562 72     6.65     8.89
 K      1      6.27 0.301 72     5.67     6.87
 B      6      4.27 0.391 72     3.49     5.05
 K      6      5.23 0.301 72     4.63     5.83
 B      12     4.00 0.449 72     3.11     4.89
 K      12     4.92 0.330 72     4.26     5.58

Confidence level used: 0.95 
rvlenth commented 6 months ago

OK, I got it through my thick skull that with geeglm objects, vcov.method = <character value> matches an internal slot whereas with glmgee, vcov.method = <character value> is passed as the type argument to vcov(). So I think everything works as it should now (at least by one definition of "should": handling character values per code design, non-character as the internal vcov. argument). My trouble is I don't really use GEE methods and geepack is undocumented as far as vcov() methods go. EWe now have annotations for glmgee.

Some examples...

Text values

> ## geeglm...
> emmeans(mg1, specs = ~ Method * Time, vcov = "robust")
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.562 Inf      6.67      8.87
 K      1      6.27 0.301 Inf      5.68      6.86
 B      6      4.27 0.391 Inf      3.50      5.04
 K      6      5.23 0.301 Inf      4.64      5.82
 B      12     4.00 0.449 Inf      3.12      4.88
 K      12     4.92 0.330 Inf      4.28      5.57

Covariance estimate used: vbeta 
Confidence level used: 0.95 

> emmeans(mg1, specs = ~ Method * Time, vcov = "naive")
 Method Time emmean  SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.4 Inf      6.99      8.55
 K      1      6.27 0.4 Inf      5.49      7.05
 B      6      4.27 0.4 Inf      3.49      5.05
 K      6      5.23 0.4 Inf      4.45      6.01
 B      12     4.00 0.4 Inf      3.22      4.78
 K      12     4.92 0.4 Inf      4.14      5.71

Covariance estimate used: vbeta.naiv 
Confidence level used: 0.95 

> ## glmgee...
> emmeans(mg, specs = ~ Method * Time, vcov = "model")
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.416 72     6.94     8.60
 K      1      6.27 0.416 72     5.44     7.10
 B      6      4.27 0.416 72     3.44     5.10
 K      6      5.23 0.416 72     4.40     6.06
 B      12     4.00 0.416 72     3.17     4.83
 K      12     4.92 0.416 72     4.09     5.75

Covariance estimate used: model 
Confidence level used: 0.95 

> emmeans(mg, specs = ~ Method * Time, vcov = "robust")
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.562 72     6.65     8.89
 K      1      6.27 0.301 72     5.67     6.87
 B      6      4.27 0.391 72     3.49     5.05
 K      6      5.23 0.301 72     4.63     5.83
 B      12     4.00 0.449 72     3.11     4.89
 K      12     4.92 0.330 72     4.26     5.58

Covariance estimate used: robust 
Confidence level used: 0.95 

Matrix values

> emmeans(mg1, specs = ~ Method * Time, vcov = vcov(mg, type = "jackknife"))
 Method Time emmean    SE  df asymp.LCL asymp.UCL
 B      1      7.77 0.608 Inf      6.58      8.96
 K      1      6.27 0.326 Inf      5.63      6.91
 B      6      4.27 0.423 Inf      3.44      5.10
 K      6      5.23 0.326 Inf      4.59      5.87
 B      12     4.00 0.486 Inf      3.05      4.95
 K      12     4.92 0.358 Inf      4.22      5.62

Covariance estimate used: user-supplied 
Confidence level used: 0.95 

> emmeans(mg, specs = ~ Method * Time, vcov = vcov(mg, type = "jackknife"))
 Method Time emmean    SE df lower.CL upper.CL
 B      1      7.77 0.608 72     6.56     8.98
 K      1      6.27 0.326 72     5.62     6.92
 B      6      4.27 0.423 72     3.43     5.11
 K      6      5.23 0.326 72     4.58     5.88
 B      12     4.00 0.486 72     3.03     4.97
 K      12     4.92 0.358 72     4.21     5.64

Covariance estimate used: user-supplied 
Confidence level used: 0.95 
rvlenth commented 6 months ago

I think this is resolved, so am closing this issue