tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org
Other
1.43k stars 302 forks source link

Support for glmtoolbox (GEE) #1197

Closed adrianolszewski closed 2 months ago

adrianolszewski commented 2 months ago

Hello, Currently broom can tidy the geepack::geeglm() output.

There is, however, a new, much more advanced package for fitting GEE, which is actively maintained (unlike geepack; last serious update 2 years ago) and seems to be the new workhorse.

Would you consider adding tidiers to it? I quickly made the adjustments. I don't fully understand what "glance()" should do especially in terms of the var-cov matrix...

Unfortunately, I don't know how to disable printing the summary every time I want it. Wrapping it with "invisible()" didn't help.

An exemplary fit:

data(cholecystectomy)
fit <- glmtoolbox::glmgee(pain2 ~ treatment + age + time, family=binomial(), id=id, corstr="exchangeable", data=cholecystectomy)

tidy() / please note I added broom::: to access the internal functions working outside the package /

tidy.glmgee <- function(x, conf.int = FALSE, conf.level = .95,
                        exponentiate = FALSE, varest="robust", ...) {
  co <- stats::coef(summary(x, varest=varest))

  ret <- broom:::as_tidy_tibble(
    co,
    c("estimate", "std.error", "statistic", "p.value")[1:ncol(co)]
  )

  ret <- head(ret, -2) # remove the last 2 rows: empty + Dispersion

  if (conf.int) {
    ci <- broom:::broom_confint_terms(x, level = conf.level)
    ret <- dplyr::left_join(ret, ci, by = "term")
  }

  if (exponentiate) {
    if (is.null(x$family) ||
        (x$family$link != "logit" && x$family$link != "log")) {
      warning(paste(
        "Exponentiating coefficients, but model did not use",
        "a log or logit link function"
      ))
    }

    ret <- broom:::exponentiate(ret)
  }

  ret
}

glance() (what to do here? Could you support, please?)

glance.glmgee <- function(x, varest="robust", ...) {
  s <- summary(x, varest = varest)
  broom:::as_glance_tibble(
    df.residual = x$df.residual,
    n.clusters = x$clusters[1],
    max.cluster.size = x$clusters[2],
    alpha = list(fit1$corr), # what should I provide here? A vector?
    gamma = x$phi,
    na_types = "iiirr"
  )
}

No need to implement the confint(), it's already available.

Let's test:

> tidy(fit)

Sample size
   Number of observations:  246
       Number of clusters:  41 
             Cluster size:  6 
*************************************************************
Model
        Variance function:  binomial
            Link function:  logit
    Correlation structure:  Exchangeable
*************************************************************
Coefficients
            Estimate Std.Error z-value Pr(>|z|)
(Intercept)   -2.733     0.737  -3.709 0.000208
treatmentA     2.332     0.548   4.257 2.08e-05
age            0.033     0.014   2.290 0.022036
time           0.224     0.096   2.331 0.019750

Dispersion     0.947                           
*************************************************************
Working correlation
     [1]   [2]   [3]   [4]   [5]   [6] 
[1] 1.000 0.269 0.269 0.269 0.269 0.269
[2] 0.269 1.000 0.269 0.269 0.269 0.269
[3] 0.269 0.269 1.000 0.269 0.269 0.269
[4] 0.269 0.269 0.269 1.000 0.269 0.269
[5] 0.269 0.269 0.269 0.269 1.000 0.269
[6] 0.269 0.269 0.269 0.269 0.269 1.000
# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   -2.733     0.737    -3.709   0    
2 treatmentA     2.332     0.548     4.257   0    
3 age            0.033     0.014     2.29    0.022
4 time           0.224     0.096     2.331   0.02 
> tidy(fit, varest="robust")

Sample size
   Number of observations:  246
       Number of clusters:  41 
             Cluster size:  6 
*************************************************************
Model
        Variance function:  binomial
            Link function:  logit
    Correlation structure:  Exchangeable
*************************************************************
Coefficients
            Estimate Std.Error z-value Pr(>|z|)
(Intercept)   -2.733     0.737  -3.709 0.000208
treatmentA     2.332     0.548   4.257 2.08e-05
age            0.033     0.014   2.290 0.022036
time           0.224     0.096   2.331 0.019750

Dispersion     0.947                           
*************************************************************
Working correlation
     [1]   [2]   [3]   [4]   [5]   [6] 
[1] 1.000 0.269 0.269 0.269 0.269 0.269
[2] 0.269 1.000 0.269 0.269 0.269 0.269
[3] 0.269 0.269 1.000 0.269 0.269 0.269
[4] 0.269 0.269 0.269 1.000 0.269 0.269
[5] 0.269 0.269 0.269 0.269 1.000 0.269
[6] 0.269 0.269 0.269 0.269 0.269 1.000
# A tibble: 4 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)   -2.733     0.737    -3.709   0    
2 treatmentA     2.332     0.548     4.257   0    
3 age            0.033     0.014     2.29    0.022
4 time           0.224     0.096     2.331   0.02 
> 
> confint(fit)

 Approximate 95 percent confidence intervals based on the Wald test 

            Lower limit Upper limit
(Intercept)      -4.177      -1.289
treatmentA        1.258       3.405
age               0.005       0.061
time              0.036       0.412
> confint(fit, varest="robust")

 Approximate 95 percent confidence intervals based on the Wald test 

            Lower limit Upper limit
(Intercept)      -4.177      -1.289
treatmentA        1.258       3.405
age               0.005       0.061
time              0.036       0.412
> 
> glance(fit)

Sample size
   Number of observations:  246
       Number of clusters:  41 
             Cluster size:  6 
*************************************************************
Model
        Variance function:  binomial
            Link function:  logit
    Correlation structure:  Exchangeable
*************************************************************
Coefficients
            Estimate Std.Error z-value Pr(>|z|)
(Intercept)   -2.733     0.737  -3.709 0.000208
treatmentA     2.332     0.548   4.257 2.08e-05
age            0.033     0.014   2.290 0.022036
time           0.224     0.096   2.331 0.019750

Dispersion     0.947                           
*************************************************************
Working correlation
     [1]   [2]   [3]   [4]   [5]   [6] 
[1] 1.000 0.269 0.269 0.269 0.269 0.269
[2] 0.269 1.000 0.269 0.269 0.269 0.269
[3] 0.269 0.269 1.000 0.269 0.269 0.269
[4] 0.269 0.269 0.269 1.000 0.269 0.269
[5] 0.269 0.269 0.269 0.269 1.000 0.269
[6] 0.269 0.269 0.269 0.269 0.269 1.000
# A tibble: 1 × 5
  df.residual n.clusters max.cluster.size alpha          gamma
        <int>      <dbl>            <dbl> <list>         <dbl>
1         242         41                6 <dbl [6 × 6]> 0.9470
> glance(fit, varest = "robust")

Sample size
   Number of observations:  246
       Number of clusters:  41 
             Cluster size:  6 
*************************************************************
Model
        Variance function:  binomial
            Link function:  logit
    Correlation structure:  Exchangeable
*************************************************************
Coefficients
            Estimate Std.Error z-value Pr(>|z|)
(Intercept)   -2.733     0.737  -3.709 0.000208
treatmentA     2.332     0.548   4.257 2.08e-05
age            0.033     0.014   2.290 0.022036
time           0.224     0.096   2.331 0.019750

Dispersion     0.947                           
*************************************************************
Working correlation
     [1]   [2]   [3]   [4]   [5]   [6] 
[1] 1.000 0.269 0.269 0.269 0.269 0.269
[2] 0.269 1.000 0.269 0.269 0.269 0.269
[3] 0.269 0.269 1.000 0.269 0.269 0.269
[4] 0.269 0.269 0.269 1.000 0.269 0.269
[5] 0.269 0.269 0.269 0.269 1.000 0.269
[6] 0.269 0.269 0.269 0.269 0.269 1.000
# A tibble: 1 × 5
  df.residual n.clusters max.cluster.size alpha          gamma
        <int>      <dbl>            <dbl> <list>         <dbl>
1         242         41                6 <dbl [6 × 6]> 0.9470
simonpcouch commented 2 months ago

Thanks for the issue! broom indeed doesn't support these model objects.

Re: your issue with summary() printing out unwanted information, the capture.output() function may be helpful for you!

As of the 1.0.0 release of the package, we're not accepting new tidiers to broom, instead asking that they are implemented/maintained in the model-supplying package (i.e. glmtoolbox::).

This vignette points to some resources we've put together to help with implementing new tidiers in model-supplying packages.

github-actions[bot] commented 2 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.