tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org

Added adj.r.squared and npar as outputs of glance.gam for mgcv::gam #1172

Closed tripartio closed 1 year ago

tripartio commented 1 year ago

When using glance.gam for mgcv::gam, I was unpleasantly surprised to not find any results for adjusted R squared. This is one of the most important model evaluation statistics for statistical inference. Not only my colleagues and I, but also probably most other users of glance.gam would need this.

On examining the code, it seems that the only outputs provided by the current glance method are those directly available from the gam model object. But summary.gam provides some additional valuable outputs. So, I have modified the code to compute summary(x) on the model object (x) and then add the other useful outputs (at least those that return a scalar numeric value).
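A minimal sketch of the approach described above, for illustration only. The helper name `glance_gam_sketch` is hypothetical, and it returns a plain data.frame rather than going through broom's internal tibble helpers, so it should not be read as the code in this PR. It assumes that `summary.gam` exposes the adjusted R squared as `r.sq` and the parameter count as `np`.

```r
library(mgcv)

# Hypothetical helper (not broom's glance.gam): combine statistics taken
# directly from the fitted model with scalars taken from summary(x).
glance_gam_sketch <- function(x) {
  s <- summary(x)  # the one additional call needed for the new columns
  data.frame(
    df            = sum(x$edf),
    logLik        = as.numeric(stats::logLik(x)),
    AIC           = stats::AIC(x),
    BIC           = stats::BIC(x),
    deviance      = stats::deviance(x),
    df.residual   = stats::df.residual(x),
    nobs          = stats::nobs(x),
    adj.r.squared = s$r.sq,  # adjusted R squared from summary.gam
    npar          = s$np     # number of parameters from summary.gam
  )
}

set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

g <- glance_gam_sketch(b)
print(g)
```

The other scalar outputs of `summary(x)` could be added as columns in the same way once they have matching entries in modeltests::column_glossary.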

That said, when examining modeltests::column_glossary, I could not find most of the additional outputs. So, I have added only the two outputs that are listed in modeltests::column_glossary (adj.r.squared and npar). For the other outputs, I have listed them in the code but commented them out. That way, if modeltests::column_glossary supports them in the future, the code can be updated more easily.

I have tested the updates with the unit tests in tests/testthat/test-mgcv.R; my modifications pass the tests. I hope you can accept this pull request.

simonpcouch commented 1 year ago

Thanks for the PR! This looks solid.

Wanted to make sure we wouldn't be increasing the time-to-tidy too drastically by introducing the additional summary() call, but looks like that's not an issue:

library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.9-0. For overview type 'help("mgcv-package")'.
library(broom)

set.seed(2) ## simulate some data... 
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
#> Gu & Wahba 4 term additive model

b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

bench::mark(
  gam = gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat),
  tidy = tidy(b),
  summary = summary(b),
  check = FALSE,
  relative = TRUE
)
#> # A tibble: 3 × 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 gam        36.1   36.1       1         34.9     1   
#> 2 tidy        4.62   4.57      7.78      32.8     2.78
#> 3 summary     1      1        35.8        1       2.67

Created on 2023-09-06 with reprex v2.0.2

Pushing some small changes before merging.
