rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
341 stars 30 forks source link

emmeans with multgee - discussion of the result #444

Closed Generalized closed 10 months ago

Generalized commented 10 months ago

Dear @rvlenth ,

My goal is to assess whether some categorical variable affects ordinal responses in some questionnaire. As the observations are repeated, I use the GEE estimation

While emmeans has the support for geepack, the ordgee method exposed by this package gives strange results, inconsistent with both other R packages and commercial software. https://vdocuments.mx/gee-for-longitudinal-ordinal-data-comparing-r-geepack-r-multgee-r-repolr.html?page=1 (or the manuscript: https://lirias.kuleuven.be/retrieve/364781 )

There is, however, a package giving much more compliant results: multgee.

As it's not supported out of the box by emmeans, I used qdrg - and it worked:

> library(multgee)
> library(emmeans)
> data(arthritis)
> 
> arthritis$time <- factor(arthritis$time)
> arthritis$trt <- factor(arthritis$trt)
> arthritis$baseline <- factor(arthritis$baseline)
> arthritis$y <- factor(arthritis$y, ordered = TRUE)
> 
> fitmod <- ordLORgee(formula = y ~ time + trt ,
+                     data = arthritis, id = id, repeated = time, LORstr = "uniform")
> coef(summary(fitmod))
       Estimate san.se    san.z Pr(>|san.z|)
beta10 -2.94787  0.227 -12.9795      0.00000
beta20 -0.94696  0.151  -6.2770      0.00000
beta30  0.79679  0.150   5.3249      0.00000
beta40  2.75319  0.191  14.4061      0.00000
time3   0.00389  0.109   0.0358      0.97147
time5  -0.31952  0.100  -3.1858      0.00144
trt2   -0.49195  0.176  -2.7933      0.00522
> 
> lord_grid <- qdrg(formula = y ~ time + trt, data=arthritis, coef = fitmod$coefficients, vcov = fitmod$robust.variance, df = Inf, ordinal.dim = 5, link = fitmod$link)
> emmeans(lord_grid, specs = ~time + trt , data=arthritis)
 time trt emmean    SE  df asymp.LCL asymp.UCL
 1    1    0.086 0.151 Inf    -0.209     0.381
 3    1    0.090 0.218 Inf    -0.337     0.518
 5    1   -0.233 0.204 Inf    -0.634     0.167
 1    2   -0.406 0.301 Inf    -0.997     0.185
 3    2   -0.402 0.347 Inf    -1.083     0.279
 5    2   -0.725 0.337 Inf    -1.385    -0.066

Results are given on the Cumulative logit (not the response) scale. 
Confidence level used: 0.95 

I'm now wondering, what should be provided to go into the (cumulative) probability scale (like regrid="response" when using the logistic regression)? I found this excellent vignette https://cran.r-project.org/web/packages/emmeans/vignettes/sophisticated.html#ordinal but it relates to a fully handles package "ordinal". It would suffice, if I had just single timepoint, but I need to do the analysis for several ones and the within-subject correlation isn't negligible in my case.

PS: I know there is also the clmm, but I am required to perform population-average (marginal) model rather than conditional, thus GEE.

PS: I'm totally fine with the current output (odds ratios and probabilities are mutually related, so it's fine to have either) as I'm interested more in "trends" and confidence intervals of the differences, however it would be nice to have consistent-scale outputs :)

rvlenth commented 10 months ago

I was kind of surprised you closed this, as I was getting set to expand qdrg's capabilities to fully support ordinal models. I decided to do it anyway... For illustration,

> qdrg(formula = y ~ time + trt, data=arthritis, coef = fitmod$coefficients, vcov = fitmod$robust.variance, df = Inf, 
+     ordinal = list(dim = 5, mode = "prob"), link = "logit") -> rgp

> emmeans(rgp, ~ response | trt)
trt = 1:
 response   prob     SE  df asymp.LCL asymp.UCL
 1        0.0556 0.0132 Inf    0.0297    0.0815
 2        0.2465 0.0295 Inf    0.1887    0.3043
 3        0.4082 0.0196 Inf    0.3699    0.4466
 4        0.2349 0.0296 Inf    0.1769    0.2929
 5        0.0548 0.0112 Inf    0.0328    0.0767

trt = 2:
 response   prob     SE  df asymp.LCL asymp.UCL
 1        0.0878 0.0298 Inf    0.0293    0.1462
 2        0.3261 0.0517 Inf    0.2247    0.4275
 3        0.3863 0.0317 Inf    0.3242    0.4483
 4        0.1657 0.0413 Inf    0.0848    0.2465
 5        0.0342 0.0113 Inf    0.0122    0.0563

Results are averaged over the levels of: time 
Confidence level used: 0.95 

I need to do some checking to make sure these results are right, etc. before I push this to GitHub.

Note the ordinal.dim argument is supplanted by ordinal, which is a list; but secretly, ordinal.dim is still supported and does the right thing.