tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org

tidy.anova fails with long predictor names (two lines?): Logical subscript `idx` must be size 1 or 2, not 3. #1171

Closed: MatthieuStigler closed this issue 1 year ago.

MatthieuStigler commented 1 year ago

The problem

Running tidy() on the output of car::linearHypothesis() fails with the following error:

Error in ret[idx, , drop = FALSE]:
! Can't subset rows with idx.
✖ Logical subscript idx must be size 1 or 2, not 3.

Possible cause

I suspect this happens because the model formula in my anova output runs over two lines, whereas the code

idx <- idx != "restricted model"
ret <- ret[idx, , drop = FALSE]

assumes only one?
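To make the suspected mechanism concrete, here is a small base-R sketch. The heading string is hand-written to mimic the "Model 1 / Model 2" header printed for test_long below, and the prefix stripping is a hypothetical stand-in for broom's internals, not its actual code.

```r
# Hand-written text mimicking the header printed for test_long (assumption:
# the real attribute stored by car may be laid out differently).
heading <- paste0(
  "Model 1: restricted model\n",
  "Model 2: Fertility ~ Agriculture + Examination + Education + Catholic + \n",
  "    Infant.Mortality"
)

# Splitting on every newline yields one piece per *printed line*,
# so the wrapped formula produces an extra element.
mod_lines <- strsplit(heading, "\n")[[1]]
length(mod_lines)

# Hypothetical stand-in for the filtering step: strip "Model <k>: " and
# flag everything that is not the restricted model.
idx <- sub("^Model \\d+: ", "", mod_lines) != "restricted model"

# The ANOVA table has one row per model (two here), so the logical index
# is one element too long, which is what tibble complains about.
n_models <- 2L
length(idx) == n_models
```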

Reproducible example

library(car)
#> Loading required package: carData
library(broom)

packageVersion("car")
#> [1] '3.1.2'
packageVersion("broom")
#> [1] '1.0.5'

reg_short <- lm(Fertility ~ Agriculture + Examination + Education + Catholic, data = swiss)
reg_long <- lm(Fertility ~ Agriculture + Examination + Education + Catholic + Infant.Mortality, data = swiss)

test_short <- linearHypothesis(reg_short, hypothesis.matrix = c(0, 1, 0, 0, -1))
test_long <- linearHypothesis(reg_long, hypothesis.matrix = c(0, 1, 0, 0, 0, -1))

test_short
#> Linear hypothesis test
#> 
#> Hypothesis:
#> Agriculture - Catholic = 0
#> 
#> Model 1: restricted model
#> Model 2: Fertility ~ Agriculture + Examination + Education + Catholic
#> 
#>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
#> 1     43 3438.2                                  
#> 2     42 2513.8  1    924.41 15.445 0.0003113 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
test_long
#> Linear hypothesis test
#> 
#> Hypothesis:
#> Agriculture - Infant.Mortality = 0
#> 
#> Model 1: restricted model
#> Model 2: Fertility ~ Agriculture + Examination + Education + Catholic + 
#>     Infant.Mortality
#> 
#>   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
#> 1     42 2687.6                                
#> 2     41 2105.0  1    582.57 11.347 0.001655 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

broom::tidy(test_short)
#> # A tibble: 1 × 10
#>   term   null.value estimate std.error statistic p.value df.residual   rss    df
#>   <chr>       <dbl>    <dbl>     <dbl>     <dbl>   <dbl>       <dbl> <dbl> <dbl>
#> 1 Agric…          0   -0.345    0.0878      15.4 3.11e-4          42 2514.     1
#> # ℹ 1 more variable: sumsq <dbl>
broom::tidy(test_long)
#> Error in `ret[idx, , drop = FALSE]`:
#> ! Can't subset rows with `idx`.
#> ✖ Logical subscript `idx` must be size 1 or 2, not 3.
#> Backtrace:
#>      ▆
#>   1. ├─broom::tidy(test_long)
#>   2. ├─broom:::tidy.anova(test_long)
#>   3. │ ├─ret[idx, , drop = FALSE]
#>   4. │ └─tibble:::`[.tbl_df`(ret, idx, , drop = FALSE)
#>   5. │   └─tibble:::vectbl_as_row_index(i, x, i_arg)
#>   6. │     └─tibble:::vectbl_as_row_location(i, nr, i_arg, assign, call)
#>   7. │       ├─tibble:::subclass_row_index_errors(...)
#>   8. │       │ └─base::withCallingHandlers(...)
#>   9. │       └─vctrs::vec_as_location(...)
#>  10. └─vctrs (local) `<fn>`()
#>  11.   └─vctrs:::stop_indicator_size(...)
#>  12.     └─rlang::cnd_signal(...)

Created on 2023-08-23 with reprex v2.0.2

hfrick commented 1 year ago

Thanks for the report, @MatthieuStigler! Splitting on "\n" in the first line of the section linked below is not quite the right choice for your example. I'll leave figuring out the correct regex to the maintainer of broom, @simonpcouch, when he returns.

https://github.com/tidymodels/broom/blob/a579b0dcfc9f8feedb4e937bf336478c288852cc/R/stats-anova-tidiers.R#L116-L118
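One possible regex, sketched under the same assumptions (a hand-written heading string standing in for the real attribute, and a toy two-row data frame standing in for the ANOVA table), is to split only on a newline that starts a new "Model <k>:" entry. This is an illustration, not necessarily the fix adopted in broom.

```r
# Hand-written stand-in for the heading text of test_long (assumption).
heading <- paste0(
  "Model 1: restricted model\n",
  "Model 2: Fertility ~ Agriculture + Examination + Education + Catholic + \n",
  "    Infant.Mortality"
)

# Split only on a newline immediately followed by "Model <k>:", so a
# formula that deparse() wrapped over several lines stays in one piece.
# The lookahead (?=...) needs the PCRE engine, hence perl = TRUE.
mod_lines <- strsplit(heading, "\n(?=Model \\d+:)", perl = TRUE)[[1]]

# Drop the "Model <k>: " prefix and flag the restricted model.
labels <- sub("^Model \\d+: ", "", mod_lines)
keep <- labels != "restricted model"

# Toy two-row table (Res.Df and RSS copied from the test_long printout).
ret <- data.frame(res.df = c(42, 41), rss = c(2687.6, 2105.0))

# keep now has one element per model, so subsetting retains only Model 2.
ret[keep, , drop = FALSE]
```

Anchoring the split on the "Model <k>:" prefix rather than on every newline means that how the formula happens to wrap no longer changes the number of pieces.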

simonpcouch commented 1 year ago

Thanks for the bug report!

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.