rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
348 stars 31 forks source link

GEE for numerical data + qdrg() + multiple imputation: how to tell qdrg() to handle ordered factors? #501

Closed Generalized closed 1 month ago

Generalized commented 2 months ago

I needed to git a GEE model for numerical data and wanted to assess the trends in means. I worked with multiply imputed data. It worked, but I needed to use a feature of GEE that emmeans did not support.

Therefore, I needed to use qdrg() over pooled myself coefficients and the necessary covariance estimator. Unfortunately, qdrg() ignored the fact, that in the pooled beta estimates I provided "Visit" variable was an ordinal factor, and treated it as an unordered one. Finally, the grid was incorrect.

> trend_fit <- lapply(imp_list, 
                    function(dat) {
                      glmgee(Painscore ~ Visit * Arm, family = gaussian(link = "identity"), id = ID,  data=dat, corstr = "unstructured")})

> pooled_coef <- pool_estimates_for_qdrg(k=5,
                                        coefficients = lapply(trend_fit, function(analysis) analysis$coefficients),
                                        varcov       = lapply(trend_fit, function(analysis) vcov(analysis, type = "robust")))

# Here I explicitly told qdrg() what to use:
> emm_grid <- with(pooled_coef,
                  qdrg(formula = Painscore ~ Visit * Arm, data=imp_list[[1]], 
                  coef = pooled_coef, 
                  vcov = pooled_VC, df = Inf)) 

> emm2 <- emmeans(emm_grid, specs = ~ Visit | Arm,  adjust="none", df=Inf)
> contrast(emm2, "poly", max.degree=2, by = "Arm")

Arm = A:
 contrast  estimate SE  df z.ratio p.value
 linear           0  0 Inf     NaN     NaN
 quadratic        0  0 Inf     NaN     NaN

Arm = B:
 contrast  estimate SE  df z.ratio p.value
 linear           0  0 Inf     NaN     NaN
 quadratic        0  0 Inf     NaN     NaN

Degrees-of-freedom method: user-specified 

> emm2@linfct
     (Intercept) VisitV2 VisitV3 ArmB VisitV2:ArmB VisitV3:ArmB
[1,]           1       0       0    0            0            0
[2,]           1       1       0    0            0            0
[3,]           1       0       1    0            0            0
[4,]           1       0       0    1            0            0
[5,]           1       1       0    1            1            0
[6,]           1       0       1    1            0            1

It failed because qdrg() treated Visit as an unordered factor, rather than ordered one. The "L", "Q" and "C" columns were missing, despite the fact it existed in my pooled vector:

> pooled_coef$pooled_coef
 (Intercept)      Visit.L      Visit.Q         ArmB Visit.L:ArmB Visit.Q:ArmB 
   4.1714286    0.2828427    0.4199125    0.1047619    0.7071068   -1.1314310 

I followed your advice and it worked well (I publish the code in a separate issue to not make this excessively long: https://github.com/rvlenth/emmeans/issues/502):

> trend_emms<- lapply(imp_list, 
                    function(dat) {
                        m <- glmgee(Painscore ~ Visit * Arm, family = gaussian(link = "identity"),  id = ID,  data=dat,  corstr = "unstructured")
                        emmeans(m, specs = ~ Visit | Arm, adjust="none",  df=Inf, vcov =  vcov(m, type = "robust"))
                      })

> pooled_ems <- pool_emmeans(trend_emms)
> contrast(pooled_ems, "poly", max.degree=2, by = "Arm")

Arm = A:
 contrast  estimate    SE  df z.ratio p.value
 linear        0.40 0.678 Inf   0.590  0.5555
 quadratic     1.03 1.047 Inf   0.982  0.3261

Arm = B:
 contrast  estimate    SE  df z.ratio p.value
 linear        1.40 0.685 Inf   2.044  0.0409
 quadratic    -1.74 1.744 Inf  -1.000  0.3175

Degrees-of-freedom method: user-specified 

By using your approach I was able get rid of qdrg() this time only because emmeans knows the glmtoolbox package. If it did not, I would have to use qdrg() anyway.

My question is: Is there a way to tell the qdrg() to "make use" of ordered predictors?** so it can create appropriate lincft contrasts?

rvlenth commented 1 month ago

The qdrg() function does provide a contrasts argument, and that can be used to ensure that the right contrasts are generated:

``` r
library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'

foo = ubds
foo$B = ordered(foo$B)
attr(foo$C, "contrasts") = "contr.helmert"

mod = lm(y ~ A + B*C, data = foo)

rg = qdrg(formula = ~ A + B*C, data = foo, coef = coef(mod), 
          vcov = vcov(mod), df = df.residual(mod),
          contrasts = attr(model.matrix(mod), "contrasts"))

head(rg@linfct)
##   (Intercept) A2 A3           B.L        B.Q C1 C2       B.L:C1     B.Q:C1
## 1           1  0  0 -7.071068e-01  0.4082483 -1 -1 7.071068e-01 -0.4082483
## 2           1  1  0 -7.071068e-01  0.4082483 -1 -1 7.071068e-01 -0.4082483
## 3           1  0  1 -7.071068e-01  0.4082483 -1 -1 7.071068e-01 -0.4082483
## 4           1  0  0 -7.850462e-17 -0.8164966 -1 -1 7.850462e-17  0.8164966
## 5           1  1  0 -7.850462e-17 -0.8164966 -1 -1 7.850462e-17  0.8164966
## 6           1  0  1 -7.850462e-17 -0.8164966 -1 -1 7.850462e-17  0.8164966
##         B.L:C2     B.Q:C2
## 1 7.071068e-01 -0.4082483
## 2 7.071068e-01 -0.4082483
## 3 7.071068e-01 -0.4082483
## 4 7.850462e-17  0.8164966
## 5 7.850462e-17  0.8164966
## 6 7.850462e-17  0.8164966

Created on 2024-07-10 with reprex v2.1.0

That said, without the contrasts argument, it defaults to "contr.treatment" for all three. It does seem that it could be doing a better job with defaults, and I will look at it.

rvlenth commented 1 month ago

OK, I fixed this so the right contrasts are used by default:

library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
options(width = 110)

foo = ubds
foo$B = ordered(foo$B)
attr(foo$C, "contrasts") = "contr.helmert"

mod = lm(y ~ A + B*C, data = foo)

rg = qdrg(formula = ~ A + B*C, data = foo, coef = coef(mod), 
          vcov = vcov(mod), df = df.residual(mod))

head(rg@linfct)
##   (Intercept) A2 A3           B.L        B.Q C1 C2       B.L:C1     B.Q:C1       B.L:C2     B.Q:C2
## 1           1  0  0 -7.071068e-01  0.4082483 -1 -1 7.071068e-01 -0.4082483 7.071068e-01 -0.4082483
## 2           1  1  0 -7.071068e-01  0.4082483 -1 -1 7.071068e-01 -0.4082483 7.071068e-01 -0.4082483
## 3           1  0  1 -7.071068e-01  0.4082483 -1 -1 7.071068e-01 -0.4082483 7.071068e-01 -0.4082483
## 4           1  0  0 -7.850462e-17 -0.8164966 -1 -1 7.850462e-17  0.8164966 7.850462e-17  0.8164966
## 5           1  1  0 -7.850462e-17 -0.8164966 -1 -1 7.850462e-17  0.8164966 7.850462e-17  0.8164966
## 6           1  0  1 -7.850462e-17 -0.8164966 -1 -1 7.850462e-17  0.8164966 7.850462e-17  0.8164966

Created on 2024-07-11 with reprex v2.1.0

I think that resolves this issue. Thanks for calling it to my attention.

Generalized commented 1 month ago

Thank you very, very much for this quick fix!