tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org
Other
1.46k stars 304 forks source link

Fix coeftest bug #1227

Closed jsr-p closed 1 month ago

jsr-p commented 1 month ago

When using tidy with coeftest on a model with a constant terms nan are produced by the confint function. The reason is that the intercept rowname does not pass through. The error is inside the broom_confint_terms function. When hitting a breakpoint we can inspect the x input variable to the dataframe. As shown below names(coef(x)) returns NULL so no rowname will be set and when it is larger merged onto the other variables the nans are produced. This PR checks for a one-dimensional object of class coeftest and uses the rownames function on the x variable to get the correct intercept name onto the resulting ci object.


Browse[2]> x

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.1039     4.4520 -0.6972    0.486

Browse[2]> class(x)

[1] "coeftest"
Browse[2]> is.null(dim(ci))

[1] FALSE
Browse[2]> coef(x)

[1] -3.103912
Browse[2]> names(coef(x))

NULL
Browse[2]>

Output when using broom package

# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10      4.45    -0.697   0.486       NA        NA
2     2 (Intercept)    -6.34      4.62    -1.37    0.170       NA        NA
[1] "Second case"
# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic  p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10     0.584     -5.32 1.58e- 7    -4.25     -1.96
2     2 (Intercept)    -6.34     0.594    -10.7  3.97e-24    -7.51     -5.18
[1] "Third case"
# A tibble: 4 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic   p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -2.97    4.42      -0.671 5.02e-  1   -11.7       5.72
2     1 x               2.05    0.0822    25.0   6.88e- 90     1.89      2.21
3     2 (Intercept)    -5.93    4.39      -1.35  1.78e-  1   -14.6       2.70
4     2 x               2.14    0.0383    56.0   3.85e-217     2.07      2.22

SessionInfo

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Copenhagen
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sandwich_3.1-0 readr_2.1.5    purrr_1.0.2    tidyr_1.3.1    dplyr_1.1.4
[6] lmtest_0.9-40  zoo_1.8-12     broom_1.0.5

loaded via a namespace (and not attached):
 [1] backports_1.4.1  utf8_1.2.4       R6_2.5.1         tidyselect_1.2.1
 [5] lattice_0.22-6   tzdb_0.4.0       magrittr_2.0.3   glue_1.7.0
 [9] tibble_3.2.1     pkgconfig_2.0.3  generics_0.1.3   lifecycle_1.0.4
[13] cli_3.6.3        fansi_1.0.6      grid_4.4.1       vctrs_0.6.5
[17] compiler_4.4.1   hms_1.1.3        pillar_1.9.0     rlang_1.1.4

Output when using pull request

# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10      4.45    -0.697   0.486    -11.9      5.64
2     2 (Intercept)    -6.34      4.62    -1.37    0.170    -15.4      2.73
[1] "Second case"
# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic  p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10     0.584     -5.32 1.58e- 7    -4.25     -1.96
2     2 (Intercept)    -6.34     0.594    -10.7  3.97e-24    -7.51     -5.18
[1] "Third case"
# A tibble: 4 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic   p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -2.97    4.42      -0.671 5.02e-  1   -11.7       5.72
2     1 x               2.05    0.0822    25.0   6.88e- 90     1.89      2.21
3     2 (Intercept)    -5.93    4.39      -1.35  1.78e-  1   -14.6       2.70
4     2 x               2.14    0.0383    56.0   3.85e-217     2.07      2.22

Test script

# devtools::load_all("/home/jsr-p/gh-repos/forks/broom")
library(broom)
library(lmtest)
library(dplyr)
library(tidyr)
library(purrr)
library(readr)
library(sandwich)

sessionInfo()

set.seed(999)
df <- data.frame(
  id = rep(1:10, each = 100),
  cs = rep(1:10, times = 100),
  cat = rep(1:2, times = 500),
  x = 4 * rnorm(1000)
)
df <- df |>
  mutate(
    y = 1 + 2 * x + 2 * as.numeric(as.factor(id)) - 3 * as.numeric(as.factor(cs)) + rnorm(1000)
  )

robust_se <- function(x) {
  tidy(
    coeftest(
      x,
      vcovCL(x, cluster = ~ id + cs)
    ),
    conf.int = TRUE
  )
}

df |>
  group_by(cat) |>
  nest() |>
  mutate(model = map(data, ~ lm(y ~ 1, data = .))) |>
  mutate(
    tidy = map(model, robust_se)
  ) |>
  select(!c("data", "model")) |>
  unnest(tidy)

print("Second case")
df |>
  group_by(cat) |>
  nest() |>
  mutate(model = map(data, ~ lm(y ~ 1, data = .))) |>
  mutate(tidy = map(model, \(x) tidy(x, conf.int = TRUE))) |>
  select(!c("data", "model")) |>
  unnest(tidy)

print("Third case")
df |>
  group_by(cat) |>
  nest() |>
  mutate(model = map(data, ~ lm(y ~ 1 + x, data = .))) |>
  mutate(
    tidy = map(model, robust_se)
  ) |>
  select(!c("data", "model")) |>
  unnest(tidy)
jsr-p commented 1 month ago

Thank you!

Could you also add a test in tests/testthat/test-lmtest.R for this case and note the change in NEWS?

Thanks for the code review! I have added a test here, updated the news and moved the fix inside tidy.coeftest.

simonpcouch commented 1 month ago

Thank you for the revisions! I just pushed a few small edits—will merge once checks pass. I appreciate your work here!

jsr-p commented 1 month ago

@simonpcouch thank you too! 😄

github-actions[bot] commented 1 month ago

This pull request has been automatically locked. If you believe the issue addressed here persists, please file a new PR (with a reprex: https://reprex.tidyverse.org) and link to this one.