rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
340 stars 30 forks source link

Degrees of freedom with Gaussian model fit via geepack and glmtoolbox #496

Closed Generalized closed 3 days ago

Generalized commented 5 days ago

Hello,

I noticed a discrepancy between how geepack and glmtoolbox GEE engines are supported by emmeans regarding the degrees of freedom.

glmtoolbox::glmgee

When I look into the emm_basis.glmgee I can notice does a trick - assigns the lm and glm classes to the result. This enables the emm_basis.lm (or glm). Now, if the class is lm (or glm but conditional distribution is Gaussian (or gamma) like in my case), the df.residuals is returned, otherwise - Inf.

> emmeans:::emm_basis.glmgee
function (object, trms, xlev, grid, vcov.method = "robust", ...) 
{
....
    class(object) = c("glm", "lm")

-----------------

> emmeans:::emm_basis.lm
function (object, trms, xlev, grid, ...) 
{
......
  if (inherits(object, "glm")) {
        misc = .std.link.labels(object$family, misc)
        dffun = function(k, dfargs) dfargs$df
        dfargs = list(df = ifelse(object$family$family %in% c("gaussian",  "Gamma"), object$df.residual, Inf))  # <-----

geepack:geeglm

This returns Inf - and this is hardcoded.

I think it would be good to unify the way GEE is handled. It's even simpler, because the geeglm already offers the df.residual component. And the geeglm function assigns also the glm and lm classes: class(out) <- c("geeglm", "gee", "glm", "lm")

Either way (both return Inf or both return residual df) will be fine, but I think they should be consistent...

rvlenth commented 5 days ago

Give me reproducible example(s) please.

Generalized commented 5 days ago

I'm sorry, I just updated my question, because the original part was irrelevant and I figured it out.

But the second part of my question holds - would it be possible to unify them both to return either Inf or residual.df? geepack (for which Inf is returned) has built in residual.df function, so this would save some work.

This requires no reproducible example. Just information that geepack uses Inf because it's hardcoded in your procedure, and glmtoolbox uses residual.df() from the lm(). So my question is if it's possible that either both returned the same for consistency or both returned residual.df() (geepack offers it natively).

rvlenth commented 4 days ago

No, I need an example so I can see what it does.

Generalized commented 3 days ago
d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"), 
    Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
    "V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
    "V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L, 
    5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L, 
    4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L, 
    4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")

geepack::geeglm. df=Inf because it's hardcoded emmeans:::emm_basis.geeglm --> dffun = function(k, dfargs) Inf

> library(geepack)
> emmeans( m1 <- geeglm(Painscore ~ Visit * Arm,
                family = gaussian(link = "identity"),
                id = ID, 
                corstr = "unstructured",
                data=d), specs = ~Visit*Arm)

 Visit Arm emmean    SE  df asymp.LCL asymp.UCL
 V1    A     4.11 0.562 Inf      3.01      5.22
 V2    A     3.88 0.748 Inf      2.42      5.35
 V3    A     4.80 0.416 Inf      3.99      5.62
 V1    B     3.22 0.735 Inf      1.78      4.66
 V2    B     4.69 0.440 Inf      3.83      5.55
 V3    B     4.90 0.575 Inf      3.77      6.02

Covariance estimate used: vbeta 
Confidence level used: 0.95 

> m$df.residual  # geepack natively provides df.residual() out of the box.
[1] 30

glmtoolbox::glmgee df=30 because it uses lm df.residual()

> library(glmtoolbox)

> emmeans( m2 <- glmgee(Painscore ~ Visit * Arm,
               family = gaussian(link = "identity"),
               id = ID, 
               corstr = "unstructured",
               data=d), specs = ~Visit*Arm)

 Visit Arm emmean    SE df lower.CL upper.CL
 V1    A     4.09 0.570 30     2.93     5.26
 V2    A     3.86 0.744 30     2.34     5.38
 V3    A     4.82 0.437 30     3.93     5.71
 V1    B     3.25 0.742 30     1.73     4.77
 V2    B     4.61 0.433 30     3.73     5.50
 V3    B     4.97 0.650 30     3.64     6.29

Covariance estimate used: robust 
Confidence level used: 0.95 
Warning message:
Estimate of correlation matrix is not positive definite 

> df.residual(m2)
[30]

Could handling both be unified in that they return same df? Either both Inf or both df.residual()

It's not needed now, it can wait when you return in July. Just leaving it to not forget about it.

rvlenth commented 3 days ago

OK, I made both use df.residual(), which I hope always works.

> ref_grid(m1) |> summary()
 Visit Arm prediction    SE df
 V1    A         4.11 0.562 30
 V2    A         3.88 0.748 30
 V3    A         4.80 0.416 30
 V1    B         3.22 0.735 30
 V2    B         4.69 0.440 30
 V3    B         4.90 0.575 30

Covariance estimate used: vbeta 

> ref_grid(m2) |> summary()
 Visit Arm prediction    SE df
 V1    A         4.09 0.570 30
 V2    A         3.86 0.744 30
 V3    A         4.82 0.437 30
 V1    B         3.25 0.742 30
 V2    B         4.61 0.433 30
 V3    B         4.97 0.650 30

Covariance estimate used: robust