tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org

Problem with a formula with spaces in the name of a factor and indicators = "none" #217

Closed erikvona closed 1 year ago

erikvona commented 1 year ago

The problem

I've had a problem when fitting a model using cuminc from the tidycmprsk package, and tracked it down to an issue in hardhat.

If you mold a formula with `blueprint = hardhat::default_formula_blueprint(indicators = "none")` and the formula refers to a factor column whose name contains a space, `mold()` throws an error, even though no function is applied to that column.

Reproducible example

``` r
dataset <- data.frame(length = 1:3, food = factor(c("apple", "banana", "chocolate")))

# Works:
hardhat::mold(length ~ food, dataset, blueprint = hardhat::default_formula_blueprint(indicators = "none"))
#> $predictors
#> # A tibble: 3 x 1
#>   food     
#>   <fct>    
#> 1 apple    
#> 2 banana   
#> 3 chocolate
#> 
#> $outcomes
#> # A tibble: 3 x 1
#>   length
#>    <int>
#> 1      1
#> 2      2
#> 3      3
#> 
#> $blueprint
#> Formula blueprint: 
#>  
#> # Predictors: 1 
#>   # Outcomes: 1 
#>    Intercept: FALSE 
#> Novel Levels: FALSE 
#>  Composition: tibble 
#>   Indicators: none 
#> 
#> $extras
#> $extras$offset
#> NULL

# But not with a space
dataset$`Favorite food` <- dataset$food

hardhat::mold(length ~ `Favorite food`, dataset, blueprint = hardhat::default_formula_blueprint(indicators = "none"))
#> Error in `validate_no_factorish_in_functions()`:
#> ! Functions involving factors or characters have been detected on the RHS of `formula`. These are not allowed when `indicators = "none"`. Functions involving factors were detected for the following columns: 'Favorite food'.

#> Backtrace:
#>     x
#>  1. +-hardhat::mold(length ~ `Favorite food`, dataset, blueprint = hardhat::default_formula_blueprint(indicators = "none"))
#>  2. \-hardhat:::mold.formula(length ~ `Favorite food`, dataset, blueprint = hardhat::default_formula_blueprint(indicators = "none"))
#>  3.   +-hardhat::run_mold(blueprint, data = data)
#>  4.   \-hardhat:::run_mold.default_formula_blueprint(blueprint, data = data)
#>  5.     \-hardhat:::mold_formula_default_process(...)
#>  6.       \-hardhat:::mold_formula_default_process_predictors(...)
#>  7.         \-hardhat:::validate_no_factorish_in_functions(formula, factorish_names)
#>  8.           \-hardhat:::glubort(...)
#>  9.             \-rlang::abort(glue(..., .sep = .sep, .envir = .envir), call = .call)
```

Created on 2022-11-02 with reprex v2.0.2
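
As a stopgap, one possible workaround is to give the column a syntactic name before molding, so the formula needs no backticks. This is only a sketch under that assumption, not an official recommendation: `make.names()` is base R, and the `requireNamespace()` guard is there only so the snippet also runs where hardhat is not installed.

``` r
dataset <- data.frame(length = 1:3, food = factor(c("apple", "banana", "chocolate")))
dataset$`Favorite food` <- dataset$food

# Replace non-syntactic column names ("Favorite food" becomes "Favorite.food")
names(dataset) <- make.names(names(dataset), unique = TRUE)
names(dataset)

# With a syntactic name, the formula can refer to the column without backticks
if (requireNamespace("hardhat", quietly = TRUE)) {
  molded <- hardhat::mold(
    length ~ Favorite.food,
    dataset,
    blueprint = hardhat::default_formula_blueprint(indicators = "none")
  )
  molded$predictors
}
```

The drawback is that the renamed column then appears under its new name in any downstream output, so labels may need to be restored afterwards.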

Session info

``` r
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Netherlands.1252
#>  ctype    Dutch_Netherlands.1252
#>  tz       Europe/Berlin
#>  date     2022-11-02
#>  pandoc   2.19.2 @ //vf-d3-home/d3home$/egjvonasmuth/Programs/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#>  package     * version date (UTC) lib source
#>  cli           3.4.1   2022-09-23 [1] CRAN (R 4.1.3)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.2)
#>  evaluate      0.17    2022-10-07 [1] CRAN (R 4.1.3)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.3)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.3)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.3)
#>  hardhat       1.2.0   2022-06-30 [1] CRAN (R 4.1.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.1.3)
#>  knitr         1.40    2022-08-24 [1] CRAN (R 4.1.3)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.1.3)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.3)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.1.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.3)
#>  purrr         0.3.5   2022-10-06 [1] CRAN (R 4.1.3)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.1.3)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.1.3)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.1.3)
#>  R.utils       2.12.0  2022-06-28 [1] CRAN (R 4.1.3)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.1.3)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.1.3)
#>  rmarkdown     2.17    2022-10-07 [1] CRAN (R 4.1.3)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.1.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.1.2)
#>  stringr       1.4.1   2022-08-20 [1] CRAN (R 4.1.3)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.1.3)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.1.3)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.4.2   2022-09-29 [1] CRAN (R 4.1.3)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.3)
#>  xfun          0.33    2022-09-12 [1] CRAN (R 4.1.3)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.3)
#>
#>  [1] C:/r-checkpoint/no-checkpoint
#>  [2] \\vf-d3-home/d3home$/egjvonasmuth/Programs/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
```

DavisVaughan commented 1 year ago

I think this is really a bug in base R, but I'll see if we can work around it:

``` r
terms <- terms(~ `Favorite food`)
terms <- attr(terms, "factors")
terms
#>                 `Favorite food`
#> `Favorite food`               1

# the actual name shouldn't be backticked here
colnames(terms)
#> [1] "`Favorite food`"
```

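
The mismatch shows up when those column names are compared with what `all.vars()` reports for the same formula. The following is only an illustrative base R sketch: `strip_backticks()` is a hypothetical helper, not a hardhat function, and not necessarily how a workaround would be implemented.

``` r
f <- ~ `Favorite food`

# Column names of the "factors" matrix keep the backticks
factor_names <- colnames(attr(terms(f), "factors"))

# all.vars() returns the bare variable name, which matches the data frame column
var_names <- all.vars(f)

# Hypothetical helper: drop one leading and one trailing backtick
strip_backticks <- function(x) gsub("^`|`$", "", x)

# The raw names differ; after stripping the backticks they agree
identical(factor_names, var_names)
identical(strip_backticks(factor_names), var_names)
```

A comparison like this would need more care for column names that themselves contain backticks, which the simple regular expression above does not handle.
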
github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.