tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org
Other
1.45k stars 304 forks source link

Unexpected behaviour in augment() for person residuals. #1147

Closed egosv closed 1 year ago

egosv commented 1 year ago

The problem

I was trying the augment(fit) function using type.residuals = "pearson" and found the output of column .std.resid to be different to that of rstandard(fit, type = "pearson").

After some digging, it seems the issue is introduced by function add_hat_sigma_cols https://github.com/tidymodels/broom/blob/60da694c2f91d223096eb003f354105838f93a74/R/utilities.R#L374 which essentially overwrites the previously (correctly) calculated values.

This issue is inconsequential for family = gaussian but has some implications for any other family member and that's probably why simple testing may have passed.

So, in essence, column .std.resid always contains standardized deviance residuals irrespective of the argument type.residuals, which I believe is not the expected behaviour.

Reproducible example

## an example with offsets from Venables & Ripley (2002, p.189)
utils::data(anorexia, package = "MASS")

anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

rs1 = unlist(broom::augment(anorex.1, type.residuals = "pearson")[,c(".std.resid")])
rs2 = rstandard(anorex.1, type = "pearson")
d1 = rstandard(anorex.1, type = "deviance")

all.equal(rs1, rs2, check.attributes = FALSE)
#> [1] TRUE
all.equal(rs1, d1, check.attributes = FALSE)
#> [1] TRUE

## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

rs1 = unlist(broom::augment(glm.D93, type.residuals = "pearson")[,c(".std.resid")])
rs2 = rstandard(glm.D93, type = "pearson")
d1 = rstandard(glm.D93, type = "deviance")

all.equal(rs1, rs2, check.attributes = FALSE)
#> [1] "Mean relative difference: 0.03595586"
all.equal(rs1, d1, check.attributes = FALSE)
#> [1] TRUE

# A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
fitGamma <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

rs1 = unlist(broom::augment(fitGamma, type.residuals = "pearson")[,c(".std.resid")])
rs2 = rstandard(fitGamma, type = "pearson")
d1 = rstandard(fitGamma, type = "deviance")

all.equal(rs1, rs2, check.attributes = FALSE)
#> [1] "Mean relative difference: 0.01597699"
all.equal(rs1, d1, check.attributes = FALSE)
#> [1] TRUE

Created on 2023-03-07 with reprex v2.0.2

Session info ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.1.2 (2021-11-01) #> os macOS Monterey 12.6.3 #> system aarch64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Toronto #> date 2023-03-07 #> pandoc 2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0) #> backports 1.4.0 2021-11-23 [1] CRAN (R 4.1.2) #> broom 0.7.12 2022-01-28 [1] CRAN (R 4.1.1) #> cli 3.4.1 2022-09-23 [1] CRAN (R 4.1.1) #> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.1.1) #> digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.1) #> dplyr 1.0.10 2022-09-01 [1] CRAN (R 4.1.1) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0) #> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0) #> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.1) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.1.1) #> glue 1.6.2 2022-02-24 [1] CRAN (R 4.1.1) #> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0) #> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.1.1) #> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.1) #> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.1.1) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.1.1) #> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.1.1) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1) #> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.1.1) #> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.1.1) #> rmarkdown 2.17 2022-10-07 [1] CRAN (R 4.1.1) #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.1) #> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.1) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.1) #> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.1.1) #> tidyr 1.1.4 2021-09-27 [1] CRAN (R 4.1.1) #> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.1.1) #> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0) #> vctrs 0.5.0 2022-10-22 [1] CRAN (R 4.1.1) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.1.1) #> xfun 0.34 2022-10-18 [1] CRAN (R 4.1.1) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0) #> #> #> ────────────────────────────────────────────────────────────────────────────── ```
simonpcouch commented 1 year ago

Thanks for the detailed reprex! Agreed that this is an issue. I missed this in #965.

Rather than touching that helper, I may just overwrite its std.resid results in augment.glm()augment.rlm() is the other function that uses this helper and doesn't take a type.residuals argument:

https://github.com/tidymodels/broom/blob/dc14b1005ede396d50409071f70d16899fe64fd9/R/mass-rlm-tidiers.R#L90-L98

egosv commented 1 year ago

Thanks, yes, that will likely fix this.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.