larmarange / broom.helpers

A set of functions to facilitate manipulation of tibbles produced by broom
https://larmarange.github.io/broom.helpers/
GNU General Public License v3.0

show factor levels in `tidy_add_variable_labels`? #222

Closed aghaynes closed 1 year ago

aghaynes commented 1 year ago

When a model contains factors, you need to know which level of the factor each term refers to. Would it be possible to extract the factor level (perhaps from term) and put it in a level_label variable or something like that?

Something like...?

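library(broom.helpers)
library(dplyr)

# Setup not shown in the original post: f_cyl is cyl converted to a factor.
# The output below shows the label "N cylinders", so a variable label was
# presumably also set on f_cyl.
mtcars$f_cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ f_cyl, mtcars)

# For categorical terms, strip the variable name from the term to get the level
tidy_and_attach(m) |> 
  tidy_add_variable_labels() |> 
  mutate(level_label = if_else(var_type %in% c("categorical", "dichotomous"), 
                               stringr::str_replace(term, variable, ""), "")) |> 
  select(term:var_label, var_type, level_label)
#> # A tibble: 3 × 5
#>   term        variable    var_label   var_type    level_label
#>   <chr>       <chr>       <chr>       <chr>       <chr>      
#> 1 (Intercept) (Intercept) (Intercept) intercept   ""         
#> 2 f_cyl6      f_cyl       N cylinders categorical "6"        
#> 3 f_cyl8      f_cyl       N cylinders categorical "8"        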

This would certainly be useful in tidy_add_variable_labels. I'm not sure about other functions...
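
The str_replace() call in the snippet above treats variable as a regular expression, so a variable name containing characters such as `.` could match more than intended. Here is a minimal base R sketch of the same idea that strips the variable name as a literal prefix instead. extract_level() is a hypothetical helper written for illustration, not part of broom.helpers, and it assumes terms are built as the variable name followed by the level (as with default treatment contrasts); interaction terms are not handled.

```r
# Hypothetical helper (not part of broom.helpers): strip the variable name
# from the start of each term, treating it as literal text rather than a regex.
extract_level <- function(term, variable) {
  # Only terms that start with the variable name and carry a suffix have a level;
  # terms equal to their variable (intercept, continuous predictors) get "".
  has_prefix <- !is.na(variable) & startsWith(term, variable) & term != variable
  ifelse(has_prefix, substring(term, nchar(variable) + 1L), "")
}

# Illustrative inputs; "x.grp" is a made-up variable name containing a dot
term     <- c("(Intercept)", "f_cyl6", "f_cyl8", "x.grpa.b")
variable <- c("(Intercept)", "f_cyl", "f_cyl", "x.grp")
extract_level(term, variable)
```

The intercept yields an empty string because its term equals its variable, so no separate check on var_type is needed in this sketch.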

larmarange commented 1 year ago

The sole purpose of tidy_add_variable_labels() is to add the appropriate variable labels. If tidy_identify_variables() has not already been called earlier in the pipeline, tidy_add_variable_labels() calls it to identify the variable corresponding to each term.

Extracting the factor level is the purpose of tidy_add_term_labels().

You also have tidy_add_reference_rows() to add a row with the reference level.

In broom.helpers, each single operation is done by a different function, and all these functions can be combined in a pipeline.

Most users do not need these individual functions and would rather use tidy_plus_plus() which combines all features of broom.helpers in one function (see documentation for the full list of arguments).

See the examples below.

I'm closing the issue as the feature you refer to is already implemented. But feel free to reopen if you identify something missing.

library(broom.helpers)
mtcars$f_cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ f_cyl, mtcars)

tidy_and_attach(m) |> 
  tidy_add_variable_labels() 
#> # A tibble: 3 × 12
#>   term      variable var_label var_class var_type var_nlevels estimate std.error
#>   <chr>     <chr>    <chr>     <chr>     <chr>          <int>    <dbl>     <dbl>
#> 1 (Interce… (Interc… (Interce… <NA>      interce…          NA    26.7      0.972
#> 2 f_cyl6    f_cyl    f_cyl     factor    categor…           3    -6.92     1.56 
#> 3 f_cyl8    f_cyl    f_cyl     factor    categor…           3   -11.6      1.30 
#> # ℹ 4 more variables: statistic <dbl>, p.value <dbl>, conf.low <dbl>,
#> #   conf.high <dbl>

tidy_and_attach(m) |> 
  tidy_add_term_labels()
#> # A tibble: 3 × 15
#>   term        variable    var_label   var_class var_type   var_nlevels contrasts
#>   <chr>       <chr>       <chr>       <chr>     <chr>            <int> <chr>    
#> 1 (Intercept) (Intercept) (Intercept) <NA>      intercept           NA <NA>     
#> 2 f_cyl6      f_cyl       f_cyl       factor    categoric…           3 contr.tr…
#> 3 f_cyl8      f_cyl       f_cyl       factor    categoric…           3 contr.tr…
#> # ℹ 8 more variables: contrasts_type <chr>, label <chr>, estimate <dbl>,
#> #   std.error <dbl>, statistic <dbl>, p.value <dbl>, conf.low <dbl>,
#> #   conf.high <dbl>

tidy_and_attach(m) |> 
  tidy_add_reference_rows() |> 
  tidy_add_term_labels()
#> # A tibble: 4 × 16
#>   term        variable    var_label   var_class var_type   var_nlevels contrasts
#>   <chr>       <chr>       <chr>       <chr>     <chr>            <int> <chr>    
#> 1 (Intercept) (Intercept) (Intercept) <NA>      intercept           NA <NA>     
#> 2 f_cyl4      f_cyl       f_cyl       factor    categoric…           3 contr.tr…
#> 3 f_cyl6      f_cyl       f_cyl       factor    categoric…           3 contr.tr…
#> 4 f_cyl8      f_cyl       f_cyl       factor    categoric…           3 contr.tr…
#> # ℹ 9 more variables: contrasts_type <chr>, reference_row <lgl>, label <chr>,
#> #   estimate <dbl>, std.error <dbl>, statistic <dbl>, p.value <dbl>,
#> #   conf.low <dbl>, conf.high <dbl>

tidy_plus_plus(m)
#> # A tibble: 3 × 17
#>   term   variable var_label var_class var_type    var_nlevels contrasts      
#>   <chr>  <chr>    <chr>     <chr>     <chr>             <int> <chr>          
#> 1 f_cyl4 f_cyl    f_cyl     factor    categorical           3 contr.treatment
#> 2 f_cyl6 f_cyl    f_cyl     factor    categorical           3 contr.treatment
#> 3 f_cyl8 f_cyl    f_cyl     factor    categorical           3 contr.treatment
#> # ℹ 10 more variables: contrasts_type <chr>, reference_row <lgl>, label <chr>,
#> #   n_obs <dbl>, estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>, conf.low <dbl>, conf.high <dbl>

Created on 2023-03-31 with reprex v2.0.2

larmarange commented 1 year ago

Same examples but with full tables visible. The levels are visible in the label column.

library(broom.helpers)
mtcars$f_cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ f_cyl, mtcars)

tidy_and_attach(m) |> 
  tidy_add_variable_labels() |> 
  knitr::kable()
| term | variable | var_label | var_class | var_type | var_nlevels | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:--|:--|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| (Intercept) | (Intercept) | (Intercept) | NA | intercept | NA | 26.663636 | 0.9718008 | 27.437347 | 0.0000000 | 24.67608 | 28.651192 |
| f_cyl6 | f_cyl | f_cyl | factor | categorical | 3 | -6.920779 | 1.5583482 | -4.441099 | 0.0001195 | -10.10796 | -3.733599 |
| f_cyl8 | f_cyl | f_cyl | factor | categorical | 3 | -11.563636 | 1.2986235 | -8.904533 | 0.0000000 | -14.21962 | -8.907653 |

tidy_and_attach(m) |> 
  tidy_add_term_labels() |> 
  knitr::kable()
| term | variable | var_label | var_class | var_type | var_nlevels | contrasts | contrasts_type | label | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:--|:--|:--|:--|:--|--:|:--|:--|:--|--:|--:|--:|--:|--:|--:|
| (Intercept) | (Intercept) | (Intercept) | NA | intercept | NA | NA | NA | (Intercept) | 26.663636 | 0.9718008 | 27.437347 | 0.0000000 | 24.67608 | 28.651192 |
| f_cyl6 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | 6 | -6.920779 | 1.5583482 | -4.441099 | 0.0001195 | -10.10796 | -3.733599 |
| f_cyl8 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | 8 | -11.563636 | 1.2986235 | -8.904533 | 0.0000000 | -14.21962 | -8.907653 |

tidy_and_attach(m) |> 
  tidy_add_reference_rows() |> 
  tidy_add_term_labels() |> 
  knitr::kable()
| term | variable | var_label | var_class | var_type | var_nlevels | contrasts | contrasts_type | reference_row | label | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:--|:--|:--|:--|:--|--:|:--|:--|:--|:--|--:|--:|--:|--:|--:|--:|
| (Intercept) | (Intercept) | (Intercept) | NA | intercept | NA | NA | NA | NA | (Intercept) | 26.663636 | 0.9718008 | 27.437347 | 0.0000000 | 24.67608 | 28.651192 |
| f_cyl4 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | TRUE | 4 | NA | NA | NA | NA | NA | NA |
| f_cyl6 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | FALSE | 6 | -6.920779 | 1.5583482 | -4.441099 | 0.0001195 | -10.10796 | -3.733599 |
| f_cyl8 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | FALSE | 8 | -11.563636 | 1.2986235 | -8.904533 | 0.0000000 | -14.21962 | -8.907653 |

tidy_plus_plus(m) |> 
  knitr::kable()
| term | variable | var_label | var_class | var_type | var_nlevels | contrasts | contrasts_type | reference_row | label | n_obs | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:--|:--|:--|:--|:--|--:|:--|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| f_cyl4 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | TRUE | 4 | 11 | 0.000000 | NA | NA | NA | NA | NA |
| f_cyl6 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | FALSE | 6 | 7 | -6.920779 | 1.558348 | -4.441099 | 0.0001195 | -10.10796 | -3.733599 |
| f_cyl8 | f_cyl | f_cyl | factor | categorical | 3 | contr.treatment | treatment | FALSE | 8 | 14 | -11.563636 | 1.298623 | -8.904533 | 0.0000000 | -14.21962 | -8.907653 |

Created on 2023-03-31 with reprex v2.0.2
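
If only the level labels are of interest, the wide output can be narrowed to a few columns. This is a minimal sketch, assuming the variable_labels argument of tidy_plus_plus() is used to supply the "N cylinders" label from the original question (the thread does not show how that label was actually set). The column selection is just one illustrative choice.

```r
library(broom.helpers)

mtcars$f_cyl <- factor(mtcars$cyl)
m <- lm(mpg ~ f_cyl, mtcars)

# Assumption: the variable label is passed explicitly here rather than being
# stored as an attribute on the data, since the original post does not show it.
res <- tidy_plus_plus(m, variable_labels = c(f_cyl = "N cylinders"))

# Keep the columns that identify each term and its factor level
res[, c("term", "variable", "var_label", "label", "reference_row", "estimate")]
```

Here the label column plays the role of the proposed level_label, and reference_row marks the level used as the baseline.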

aghaynes commented 1 year ago

perfect, thanks!