Considerations on adding a more complete support for GEE thrugh the glmtoolbox::glmgee()

Generalized commented 8 months ago

Dear @rvlenth ,

I remember your opinion about adding support for additional models: authors of respective packages should follow the instructions and provide their own support for emmeans.

But despite that fact, I would like to kindly ask you for an exception, as it's about something that is already supported by emmeans - only through a different package. I mean the support for GEE, currently provided through the geepack::geeglm().

Well, I've been using it for a long time and - although it worked in most cases - I started looking for better options, for the following reasons:

1) "zombie-state". This package is kind of "frozen", a number of issues remained unresolved since 2021... One of its authors assured me that the package is alive, only they lack time to support it now. obraz

I hoped that the successor of geepack, geeasy, will be more "alive", but the same situation happens (silent since 2022).

2) The engine crashes entire R session when unstructured working covariance is requested, especially with the "waves" argument specified. It's really annoying, when one forgets it and the entire session is lost. The crashing is related to some numerical issue, but anyway, it should display a warning, like other packages do.

3) The ordgee function in this package is totally "alien" compared to all other R and commercial implementations.

Several other packages for GEE, dedicated to concrete tasks but mutually incompatible, have appeared over time (some are similarly "frozen"): wgeesel, geesmv, geeM, drgee, CRTgeeDR, not to mention several packages for GEE-based multinomial and ordinal logistic regression and other specialized scenarios.

And one day I found another one, actively developed: glmtoolbox

1) It doesn't crash (it gives informative warning) under unstructured and waves (useful when analysing longitudinal data with missing visits) 2) It supports more covariance structures 3) provides leverage, local.influence, DFBETA and Cook's distance out of the box 4) offers model-comparison based anova through Wald's and generalized score tests. 5) supports IPW (inverse-probability) weighting for dropouts out of the box (one has to specify the model for MAR dropouts), no need to play separately with the ipw package. Works on the observation and cluster level. 6) offers a module for nonlinear GEE 7) predict() is available 8) QIC and RJC (Rotnitzky-Jewell's) criterion out of the box 9) Mahalanobis', Pearson's and deviance residuals 10) forward/backward stepwise variable selection 11) various var.cov estimators are available: robust (Sandwich), df-adjusted, model (naive), bias-corrected and jackknife. By default the naive one is employed.

It also well integrates with qdrg():

emmeans(qdrg(data = shifts,formula = Diff ~ Timepoint,coef = coef(m1), vcov = m1$R), specs = ~Timepoint)

 Timepoint emmean    SE  df asymp.LCL asymp.UCL
 Week 1    -0.331 0.149 Inf    -0.624   -0.0376
 Week 2    -0.665 0.136 Inf    -0.932   -0.3982
 Week 4    -0.625 0.178 Inf    -0.973   -0.2761
 Week 8    -0.916 0.165 Inf    -1.240   -0.5925
 Week 16   -1.201 0.163 Inf    -1.520   -0.8812
 Week 56   -1.447 0.373 Inf    -2.178   -0.7154

Confidence level used: 0.95

Looking at all the above facts, I believe that adding glmtoolbox::glmgee to emmeans for a more complete support for GEE-estimated models would be truly beneficial.

rvlenth commented 7 months ago

I'm sorry it has taken a few days to answer.

Just because this package does the same type of analysis as some other supported packages doesn't change the fact that you are asking me to support a new package/object class. It remains the best choice for these packages themselves to incorporate emmeans support. I will consider what it would take, and may approach the other package developer if it seems appropriate.

Generalized commented 7 months ago

Thank you very much for the answer. Yes, I apologize if I sounded as if took things too lightly...

Yes, you're right, it's about a new class. I (naively) thought, that if it easily integrates via qdrg(), then maybe it won't be very problematic and taking very much time to add a support for it. I motivated it with the fact that geepack slowly becomes (or already has become) a zombie package with questionable behaviour. It's a pity that geepack is so much widespread despite lack of updates, fixed issues and new features, while glmtoolbox, offering so many features, is almost totally unknown :( It won't replace geepack, but having it onboard would be a step towards making things much better.

I will email Professor Vanegas, maybe he will be interested in integrating with emmeans. https://cran.r-project.org/web/packages/glmtoolbox/index.html

In the meantime, let me close this thread for now.

rvlenth commented 7 months ago

FWIW, It seems to get us a very long way to just pretend the model is from glm and provide a mode argument to choose which type of vcov matrix we want:

recover_data.glmgee = function(object, ...) {
    class(object) = c("glm", "lm")
    recover_data(object, ...)
}

# Note: the 'mode' argument is passed to vcov(object, type = mode)
# We can't use 'type' because it is already reserved for other uses
emm_basis.glmgee = function(object, trms, xlev, grid, mode = "robust", ...) {
    vcov. = vcov(object, type = mode, ...)
    class(object) = c("glm", "lm")
    emm_basis(object, trms = trms, xlev = xlev, grid = grid, vcov. = vcov., ...)
}

Illustration:

> example(glmgee)
# (output is suppressed)

> # Using default vcov matrix (robust)
> emmeans(fit8, ~ treatment|time4, type = "response")
time4 = 0:
 treatment rate    SE  df asymp.LCL asymp.UCL
 Placebo   9.12 0.990 Inf      7.37     11.28
 Progabide 7.37 0.916 Inf      5.78      9.40

time4 = 1:
 treatment rate    SE  df asymp.LCL asymp.UCL
 Placebo   7.80 0.694 Inf      6.55      9.28
 Progabide 6.30 0.801 Inf      4.91      8.09

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> # Using jackknife vcov matrix
> emmeans(fit8, ~ treatment|time4, type = "response", mode = "jack")
time4 = 0:
 treatment rate    SE  df asymp.LCL asymp.UCL
 Placebo   9.12 1.070 Inf      7.24     11.48
 Progabide 7.37 0.980 Inf      5.68      9.57

time4 = 1:
 treatment rate    SE  df asymp.LCL asymp.UCL
 Placebo   7.80 0.755 Inf      6.45      9.43
 Progabide 6.30 0.846 Inf      4.85      8.20

Confidence level used: 0.95 
Intervals are back-transformed from the log scale

The glmtoolbox package isn't on GitHub, else I would consider sending a pull request. Since this is so simple, maybe I'll just add it to emmeans. But in any case, if you load the above simple code into your workspacce, it should work.

rvlenth commented 7 months ago

Note: I changed mode to vcov.method to be consistent with other GEE support.

rvlenth / emmeans

Considerations on adding a more complete support for GEE thrugh the glmtoolbox::glmgee() #454