rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
358 stars 31 forks source link

qdrg seems to miss certain terms in a geeglm model #363

Closed Generalized closed 2 years ago

Generalized commented 2 years ago

Let's use the data - repeated-measure logistic regression.

d <- structure(list(ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 
4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 
9, 9, 10, 10, 10, 10), Covariate = c(1, 1.1, 1.5, 2, 1.1, 1.1, 
1.4, 3, 1.1, 1.2, 1.7, 5, 1.5, 3, 4.6, 6, 2.3, 4.4, 5.5, 8, 2, 
2.2, 2.3, 7, 4.5, 1, 3, 7, 3.3, 4.4, 5.5, 5, 3, 4, 5, 8, 1.1, 
2.2, 3.3, 5), SucN = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 
0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 
1, 1, 1, 1, 1, 0), Time = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L), .Label = c("M0", "M1", "M2", "M3"), class = "factor"), 
TimeNum = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
3, 4, 1, 2, 3, 4)), row.names = c(NA, 40L), class = "data.frame")

which looks like this:

> head(d, 10)
   ID Covariate SucN Time TimeNum
1   1       1.0    0   M0       1
2   1       1.1    0   M1       2
3   1       1.5    0   M2       3
4   1       2.0    0   M3       4
5   2       1.1    0   M0       1
6   2       1.1    0   M1       2
7   2       1.4    0   M2       3
8   2       3.0    1   M3       4
9   3       1.1    0   M0       1
10  3       1.2    0   M1       2

Let's fit a simple GEE:

library(geepack)
library(emmeans)

options(contrasts = c("contr.sum", "contr.poly"))
m <- geeglm(SucN ~ Time * Covariate, family=binomial(link = "logit"), data=d, id=ID, corstr = "exchangeable", std.err="san.se")

I want the ANOVA-like table of the main and interaction effects, especially joint test of the levels of the categorical variable Time (adjusted for covariate):

> joint_tests(m)
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   3.123  0.0250
 Covariate        1 Inf   0.282  0.5950
 Time:Covariate   3 Inf   2.406  0.0650

Now, let's do the same with qdrg, using the previously fit "m"

> joint_tests(m)
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   3.123  0.0250
 Covariate        1 Inf   0.282  0.5950
 Time:Covariate   3 Inf   2.406  0.0650

> joint_tests(qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), vcov = vcov(m), df = Inf))
 model term df1 df2 F.ratio p.value
 Time         3 Inf   2.765  0.0403

The "Covariate" and the interaction term disappeared. Let's see why:

> joint_tests(qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), vcov = vcov(m), df = Inf), show0df = TRUE)
 model term     df1 df2 F.ratio p.value note
 Time             3 Inf   2.765  0.0403     
 Covariate        0  NA      NA      NA  d e
 Time:Covariate   0  NA      NA      NA  d e

d: df1 reduced due to linear dependence 
e: df1 reduced due to non-estimability 

OK, it tells what happened, but why the joint_tests(m) worked, and this one does not?

The model coefficients and the V-cov:

> coef(m)
    (Intercept)           Time1           Time2           Time3       Covariate Time1:Covariate Time2:Covariate Time3:Covariate 
         -1.455           4.846          -2.094          -1.033          -0.397          -3.869           1.372           1.325 

> vcov(m)
                (Intercept)   Time1   Time2   Time3 Covariate Time1:Covariate Time2:Covariate Time3:Covariate
(Intercept)           1.848  2.2204 -0.7246 -0.7855    -0.930          -1.885           0.595          0.5725
Time1                 2.220  4.3752 -0.9588 -0.0872    -1.342          -3.179           1.070          0.6540
Time2                -0.725 -0.9588  2.7250 -0.5240     0.612           1.591          -1.161         -0.0542
Time3                -0.786 -0.0872 -0.5240  2.0466     0.146           0.158           0.173         -0.5995
Covariate            -0.930 -1.3420  0.6118  0.1457     0.559           1.186          -0.430         -0.2516
Time1:Covariate      -1.885 -3.1789  1.5910  0.1583     1.186           2.752          -1.088         -0.5505
Time2:Covariate       0.595  1.0705 -1.1615  0.1725    -0.430          -1.088           0.619          0.1086
Time3:Covariate       0.573  0.6540 -0.0542 -0.5995    -0.252          -0.550           0.109          0.3016

Versions of packages: emmeans: 1.7.5 geepack: 1.3.4 estimability: 1.3 R core: 3.6.3

rvlenth commented 2 years ago

Well, this one is pretty easy to answer. See these examples

> ### Start with a "vanilla" reference grid...

> RG = ref_grid(m)
> joint_tests(RG)
 model term df1 df2 F.ratio p.value
 Time         3 Inf   2.765  0.0403

> QDRG = qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), 
        vcov = vcov(m), df = Inf)
> joint_tests(QDRG)
 model term df1 df2 F.ratio p.value
 Time         3 Inf   2.765  0.0403

> ### Include two values of the covariate  in the reference grid ...

> RGr = ref_grid(m, cov.reduce = range)
> joint_tests(RGr)
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   3.123  0.0248
 Covariate        1 Inf   0.282  0.5952
 Time:Covariate   3 Inf   2.406  0.0653

> QDRGr = qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), 
        vcov = vcov(m), df = Inf, cov.reduce = range)
> joint_tests(QDRGr)
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   3.123  0.0248
 Covariate        1 Inf   0.282  0.5952
 Time:Covariate   3 Inf   2.406  0.0653

In summary, you need to be able to estimate an effect before we can have a joint test. With covariates (with > 2 levels), the default is to reduce it to its mean - a single number so we can't estimate its effect.

The joint_tests() function has an argument cov.reduce; note that it defaults to range. However, that argument is only used when you give it a model, and it uses that to create the reference grid. But give it a reference grid already, it's too late to do anything about it.

rvlenth commented 2 years ago

OK, I better head off your next question...

Why is the test of Time different between [QD]RG and [QD]RGr?

Ans: Because they are based on a different Covariate mean. If we make them the same, ...

> symmint = function(x) mean(x) + c(-1, 1)
> RGs = ref_grid(m, cov.reduce = symmint)
> joint_tests(RGs)
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   2.765  0.0403
 Covariate        1 Inf   0.282  0.5952
 Time:Covariate   3 Inf   2.406  0.0653
Generalized commented 2 years ago

AH, I got it, thank you!

I strongly needed this especially with CRTgeeDR (doubly robust GEE for data MAR) to complete my work. It doesn't give any kind of tests for the categorical variables. I thought about the joint_tests as the last-chance option for me to complete a longitudinal analysis for some clinical trial.

Now I know how to do it. As always - with great examples!

I wish I had some more time to create a "tutorial" about the emmeans and using it in the most common scenarios regression_model -> effect_table and making it "compliant" with anova, Anova, drop1 etc (it's all about the grid values) and the issues like this one.

Thank you very much again!

rvlenth commented 2 years ago

Geez Louise. qdrg is already a lot! Just add cov.reduce = range.

Generalized commented 2 years ago

Yes, but joint_tests isn't supported bu the CRTgeeDR. I used geepack only to show the issue and compare results. That's why I'm playing with the qdrg - I would need to implement the entire framework for CRTgeeDR and it's very late today, I have to finish my work. qdrg gives me the interface.

I'm sorry for continuing this topic, but I think it's very related to the current topic and I'm a little lost (it's late night in my country, and my brain is sleepy...)

I want to reproduce the geeasy::drop1. It tests at Covariate = 0. Let's skip the question doesn't make sense and assume it makes in this case. I'm just trying to align things, without further context right now.

geeasy:::drop1.geeglm(m, scope = ~Time * Covariate)
Single term deletions

Model:
SucN ~ Time * Covariate
               DF Wald Pr(>Chi)  
Time            3 6.49    0.090 .
Covariate       1 0.28    0.595  
Time:Covariate  3 7.22    0.065 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I can reproduce this with joint_tests partially:

joint_tests(m, at = list(Covariate=0))
 model term df1 df2 F.ratio p.value
 Time         3 Inf   2.162  0.0902

But now let's assume the model I need isn't supported by default and I need the qdrg to make it working (without making the entire framework for it..., just to complete my current work).

This wont work of course:

> joint_tests(qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), 
+      vcov = vcov(m), df = Inf, cov.reduce = range), at = list(Covariate=0))
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   3.123  0.0250
 Covariate        1 Inf   0.282  0.5950
 Time:Covariate   3 Inf   2.406  0.0650

Is there a way to get the same answer in a single table, without copy/pasting just a single value from drop1 to the output of joint_tests? I mean - to combine joint_tests, qdrg and show the value at which I want to test it together?

rvlenth commented 2 years ago

I don't get what you want, I guess. But I'm not surprised the last thing doesn't work, because at should be an argument to qdrg and not joint_tests. I think you might want

joint_tests(qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), 
      vcov = vcov(m), df = Inf, at = list(Covariate=c(-1,1))))

since this centers Covariate around zero. (You don't have to use cov.reduce = range; what you need is more than one value.)

But otherwise I think if you want results from drop1, it's best to use drop1.

Generalized commented 2 years ago

It was absolutely that!

> joint_tests(qdrg(formula=SucN ~ Time * Covariate, data=d, coef = coef(m), 
+                  vcov = vcov(m), df = Inf, at = list(Covariate=c(-1,1))))
 model term     df1 df2 F.ratio p.value
 Time             3 Inf   2.162  0.0900
 Covariate        1 Inf   0.282  0.5950
 Time:Covariate   3 Inf   2.406  0.0650
> geeasy:::drop1.geeglm(m, scope = ~Time * Covariate)
Single term deletions

Model:
SucN ~ Time * Covariate
               DF Wald Pr(>Chi)  
Time            3 6.49    0.090 .
Covariate       1 0.28    0.595  
Time:Covariate  3 7.22    0.065 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

All agrees. Thank you!

rvlenth commented 2 years ago

FWIW, I have added more about covariates to the documentation for joint_tests() and provided some alternative functions to use with cov.reduce, based on some of this discussion. For instance, you may add cov.reduce = symmint(0) to your ref_grid(), qdrg(), or joint_tests() call to request covariates to all be set to c(-1, 1).