tidymodels / recipes

Pipeable steps for feature engineering and data preprocessing to prepare for modeling
https://recipes.tidymodels.org

Long formulas cause `recipe()` to error due to C stack usage #1279

Closed mikemahoney218 closed 4 months ago

mikemahoney218 commented 8 months ago

The problem

Long formulas (with many terms; I don't believe the length of column names matters) cause recipe() to error due to C stack usage.

Apologies if this is actually an issue with hardhat or similar -- I'm not super familiar with the internals of recipes, so I didn't dig much further once I ran into this issue.
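
For context on why the number of terms, rather than the length of the names, would matter: R parses `a + b + c` as nested calls, `` `+`(`+`(a, b), c) ``, so each extra term adds one level of nesting. The base-R sketch below measures that depth. The helper `plus_depth()` is hypothetical and written only for this illustration. Whether recipes or hardhat actually walk the formula recursively is an assumption here, not something confirmed in this thread.

```r
# Hypothetical helper: count how many nested `+` calls sit on the
# right-hand side of a two-sided formula.
plus_depth <- function(f) {
  node <- f[[3]]  # right-hand side of `lhs ~ rhs`
  depth <- 0L
  # Follow the left operand in a loop, so the helper itself never recurses.
  while (is.call(node) && identical(node[[1]], as.name("+"))) {
    depth <- depth + 1L
    node <- node[[2]]
  }
  depth
}

small <- y ~ x1 + x2 + x3
plus_depth(small)
#> [1] 2

# Same shape as the formula in the report: one outcome, 130 predictors.
wide <- reformulate(paste0("x", 1:130), response = "y")
plus_depth(wide)
#> [1] 129
```

Any code that descends such a tree one recursive call per level needs stack space proportional to the number of terms, which would be consistent with the error appearing only for wide data.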

Reproducible example

df <- matrix(rnorm(100 * 131), ncol = 131)
df <- as.data.frame(df)
names(df) <- c("y", paste0("x", 1:130))
try(
  recipes::recipe(DF2formula(df), df)
)
#> Error : C stack usage  7970404 is too close to the limit

Created on 2024-01-17 with reprex v2.0.2
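
A possible workaround, sketched under the assumption that the error comes from processing the long formula itself: skip the formula, pass the data frame to `recipe()` directly, and assign roles with `update_role()`. Both functions are part of recipes. This has not been run against the exact session above, so treat it as a sketch rather than a confirmed fix.

```r
library(recipes)

# Same data shape as the reprex: one outcome and 130 predictors.
df <- as.data.frame(matrix(rnorm(100 * 131), ncol = 131))
names(df) <- c("y", paste0("x", 1:130))

# No formula is built, so there is no long chain of `+` terms to process.
rec <- recipe(df) |>
  update_role(y, new_role = "outcome") |>
  update_role(starts_with("x"), new_role = "predictor")

# Count how many variables ended up in each role.
table(summary(rec)$role)
```

The design choice is to describe roles by selection (`starts_with("x")`) instead of enumerating every column in a formula, so the amount of code stays constant as the number of columns grows.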

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       Ubuntu 22.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-01-17
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version    date (UTC) lib source
#>  class          7.3-22     2023-05-03 [1] CRAN (R 4.3.1)
#>  cli            3.6.2      2023-12-11 [1] CRAN (R 4.3.2)
#>  codetools      0.2-19     2023-02-01 [1] CRAN (R 4.3.0)
#>  data.table     1.14.8     2023-02-17 [1] CRAN (R 4.3.2)
#>  digest         0.6.33     2023-07-07 [1] CRAN (R 4.3.2)
#>  dplyr          1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
#>  evaluate       0.23       2023-11-01 [1] CRAN (R 4.3.2)
#>  fansi          1.0.6      2023-12-08 [1] CRAN (R 4.3.2)
#>  fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.3.2)
#>  fs             1.6.3      2023-07-20 [1] CRAN (R 4.3.2)
#>  future         1.33.1     2023-12-22 [1] CRAN (R 4.3.2)
#>  future.apply   1.11.1     2023-12-21 [1] CRAN (R 4.3.2)
#>  generics       0.1.3      2022-07-05 [1] CRAN (R 4.3.2)
#>  globals        0.16.2     2022-11-21 [1] CRAN (R 4.3.2)
#>  glue           1.6.2      2022-02-24 [1] CRAN (R 4.3.2)
#>  gower          1.0.1      2022-12-22 [1] CRAN (R 4.3.2)
#>  hardhat        1.3.0      2023-03-30 [1] CRAN (R 4.3.2)
#>  htmltools      0.5.7      2023-11-03 [1] CRAN (R 4.3.2)
#>  ipred          0.9-14     2023-03-09 [1] CRAN (R 4.3.2)
#>  knitr          1.45       2023-10-30 [1] CRAN (R 4.3.2)
#>  lattice        0.21-9     2023-10-01 [1] CRAN (R 4.3.1)
#>  lava           1.7.3      2023-11-04 [1] CRAN (R 4.3.2)
#>  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
#>  listenv        0.9.0      2022-12-16 [1] CRAN (R 4.3.2)
#>  lubridate      1.9.3      2023-09-27 [1] CRAN (R 4.3.2)
#>  magrittr       2.0.3      2022-03-30 [1] CRAN (R 4.3.2)
#>  MASS           7.3-60     2023-05-04 [1] CRAN (R 4.3.1)
#>  Matrix         1.6-1.1    2023-09-18 [1] CRAN (R 4.3.1)
#>  nnet           7.3-19     2023-05-03 [1] CRAN (R 4.3.1)
#>  parallelly     1.36.0     2023-05-26 [1] CRAN (R 4.3.2)
#>  pillar         1.9.0      2023-03-22 [1] CRAN (R 4.3.2)
#>  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.2)
#>  prodlim        2023.08.28 2023-08-28 [1] CRAN (R 4.3.2)
#>  purrr          1.0.2      2023-08-10 [1] CRAN (R 4.3.2)
#>  R.cache        0.16.0     2022-07-21 [1] CRAN (R 4.3.2)
#>  R.methodsS3    1.8.2      2022-06-13 [1] CRAN (R 4.3.2)
#>  R.oo           1.25.0     2022-06-12 [1] CRAN (R 4.3.2)
#>  R.utils        2.12.3     2023-11-18 [1] CRAN (R 4.3.2)
#>  R6             2.5.1      2021-08-19 [1] CRAN (R 4.3.2)
#>  Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.3.2)
#>  recipes        1.0.8      2023-08-25 [1] CRAN (R 4.3.2)
#>  reprex         2.0.2      2022-08-17 [1] CRAN (R 4.3.2)
#>  rlang          1.1.2      2023-11-04 [1] CRAN (R 4.3.2)
#>  rmarkdown      2.25       2023-09-18 [1] CRAN (R 4.3.2)
#>  rpart          4.1.19     2022-10-21 [1] CRAN (R 4.3.0)
#>  rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.3.2)
#>  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.2)
#>  styler         1.10.2     2023-08-29 [1] CRAN (R 4.3.2)
#>  survival       3.5-5      2023-03-12 [1] CRAN (R 4.3.0)
#>  tibble         3.2.1      2023-03-20 [1] CRAN (R 4.3.2)
#>  tidyselect     1.2.0      2022-10-10 [1] CRAN (R 4.3.2)
#>  timechange     0.2.0      2023-01-11 [1] CRAN (R 4.3.2)
#>  timeDate       4022.108   2023-01-07 [1] CRAN (R 4.3.2)
#>  utf8           1.2.4      2023-10-22 [1] CRAN (R 4.3.2)
#>  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
#>  withr          2.5.2      2023-10-30 [1] CRAN (R 4.3.2)
#>  xfun           0.41       2023-11-01 [1] CRAN (R 4.3.2)
#>  yaml           2.3.7      2023-01-23 [1] CRAN (R 4.3.2)
#>
#>  [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

EmilHvitfeldt commented 8 months ago

Hello!

This appears to be a duplicate of https://github.com/tidymodels/recipes/issues/548.

I'm going to keep this open, as we could probably signal what went wrong a little better.

mikemahoney218 commented 8 months ago

Whoops, sorry! I searched "long formula" and didn't find that issue; figures that searching "C stack" would have been more useful.

github-actions[bot] commented 3 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.