tidymodels / themis

Extra recipes steps for dealing with unbalanced data
https://themis.tidymodels.org/

ROSE throws error for categorical variables #73

Closed emilyriederer closed 2 years ago

emilyriederer commented 2 years ago

The problem

Hi - thanks for the great package! Following the example in #27, I am trying to use step_rose() with categorical data with the current release version of themis. However, I get an error rerunning the reprex provided there, and the error message implies that step_rose() can only be used on numeric data.

Reproducible example

library(themis)
#> Warning: package 'themis' was built under R version 4.0.5
#> Loading required package: recipes
#> Warning: package 'recipes' was built under R version 4.0.5
#> Loading required package: dplyr
#> Warning: package 'dplyr' was built under R version 4.0.5
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
#> Registered S3 methods overwritten by 'themis':
#>   method                  from   
#>   bake.step_downsample    recipes
#>   bake.step_upsample      recipes
#>   prep.step_downsample    recipes
#>   prep.step_upsample      recipes
#>   tidy.step_downsample    recipes
#>   tidy.step_upsample      recipes
#>   tunable.step_downsample recipes
#>   tunable.step_upsample   recipes
#> 
#> Attaching package: 'themis'
#> The following objects are masked from 'package:recipes':
#> 
#>     step_downsample, step_upsample
library(palmerpenguins)

pen1 <- penguins %>%
  mutate(island = factor(island == "Torgersen"))

recipe(island ~ ., data = pen1) %>%
  step_naomit(all_predictors()) %>%
  step_rose(island) %>%
  prep() %>%
  bake(new_data = NULL)
#> Error: All columns selected for the step should be numeric

Created on 2021-12-18 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> Error in get(genname, envir = envir) : object 'testthat_print' not found
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/Chicago
#>  date     2021-12-18
#>
#> - Packages -------------------------------------------------------------------
#>  package        * version    date       lib source
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.0.2)
#>  backports        1.1.7      2020-05-13 [1] CRAN (R 4.0.0)
#>  BBmisc           1.11       2017-03-10 [1] CRAN (R 4.0.5)
#>  callr            3.4.3      2020-03-28 [1] CRAN (R 4.0.2)
#>  checkmate        2.0.0      2020-02-06 [1] CRAN (R 4.0.2)
#>  class            7.3-17     2020-04-26 [2] CRAN (R 4.0.2)
#>  cli              3.1.0      2021-10-27 [1] CRAN (R 4.0.5)
#>  codetools        0.2-16     2018-12-24 [2] CRAN (R 4.0.2)
#>  colorspace       1.4-1      2019-03-18 [1] CRAN (R 4.0.2)
#>  crayon           1.3.4      2017-09-16 [1] CRAN (R 4.0.2)
#>  data.table       1.13.0     2020-07-24 [1] CRAN (R 4.0.2)
#>  DBI              1.1.0      2019-12-15 [1] CRAN (R 4.0.2)
#>  desc             1.2.0      2018-05-01 [1] CRAN (R 4.0.3)
#>  devtools         2.3.1      2020-07-21 [1] CRAN (R 4.0.2)
#>  digest           0.6.27     2020-10-24 [1] CRAN (R 4.0.3)
#>  doParallel       1.0.16     2020-10-16 [1] CRAN (R 4.0.5)
#>  dplyr          * 1.0.7      2021-06-18 [1] CRAN (R 4.0.5)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.0.5)
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi            0.4.1      2020-01-08 [1] CRAN (R 4.0.2)
#>  fastmatch        1.1-0      2017-01-28 [1] CRAN (R 4.0.3)
#>  FNN              1.1.3      2019-02-15 [1] CRAN (R 4.0.3)
#>  foreach          1.5.1      2020-10-15 [1] CRAN (R 4.0.3)
#>  fs               1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  generics         0.1.0      2020-10-31 [1] CRAN (R 4.0.3)
#>  ggplot2          3.3.5      2021-06-25 [1] CRAN (R 4.0.5)
#>  glue             1.4.2      2020-08-27 [1] CRAN (R 4.0.5)
#>  gower            0.2.2      2020-06-23 [1] CRAN (R 4.0.3)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.0.2)
#>  highr            0.8        2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools        0.5.1.1    2021-01-22 [1] CRAN (R 4.0.5)
#>  ipred            0.9-12     2021-09-15 [1] CRAN (R 4.0.5)
#>  iterators        1.0.13     2020-10-15 [1] CRAN (R 4.0.3)
#>  knitr            1.33.8     2021-08-08 [1] Github (yihui/knitr@55a2df9)
#>  lattice          0.20-41    2020-04-02 [2] CRAN (R 4.0.2)
#>  lava             1.6.8.1    2020-11-04 [1] CRAN (R 4.0.3)
#>  lifecycle        1.0.0      2021-02-15 [1] CRAN (R 4.0.5)
#>  lubridate        1.7.9      2020-06-08 [1] CRAN (R 4.0.2)
#>  magrittr         2.0.1      2020-11-17 [1] CRAN (R 4.0.3)
#>  MASS             7.3-51.6   2020-04-26 [2] CRAN (R 4.0.2)
#>  Matrix           1.2-18     2019-11-27 [2] CRAN (R 4.0.2)
#>  memoise          1.1.0      2017-04-21 [1] CRAN (R 4.0.2)
#>  mlr              2.19.0     2021-02-22 [1] CRAN (R 4.0.5)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.0.2)
#>  nnet             7.3-14     2020-04-26 [2] CRAN (R 4.0.2)
#>  palmerpenguins * 0.1.0      2020-07-23 [1] CRAN (R 4.0.2)
#>  parallelMap      1.5.0      2020-03-26 [1] CRAN (R 4.0.5)
#>  ParamHelpers     1.14       2020-03-24 [1] CRAN (R 4.0.5)
#>  pillar           1.6.2      2021-07-29 [1] CRAN (R 4.0.5)
#>  pkgbuild         1.1.0      2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.0.2)
#>  pkgload          1.1.0      2020-05-29 [1] CRAN (R 4.0.2)
#>  prettyunits      1.1.1      2020-01-24 [1] CRAN (R 4.0.2)
#>  processx         3.4.3      2020-07-05 [1] CRAN (R 4.0.2)
#>  prodlim          2019.11.13 2019-11-17 [1] CRAN (R 4.0.3)
#>  ps               1.3.4      2020-08-11 [1] CRAN (R 4.0.2)
#>  purrr            0.3.4      2020-04-17 [1] CRAN (R 4.0.2)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.0.5)
#>  RANN             2.6.1      2019-01-08 [1] CRAN (R 4.0.5)
#>  Rcpp             1.0.7      2021-07-07 [1] CRAN (R 4.0.5)
#>  recipes        * 0.1.17     2021-09-27 [1] CRAN (R 4.0.5)
#>  remotes          2.2.0      2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang            0.4.11     2021-04-30 [1] CRAN (R 4.0.5)
#>  rmarkdown        2.11       2021-09-14 [1] CRAN (R 4.0.5)
#>  ROSE             0.0-4      2021-06-14 [1] CRAN (R 4.0.5)
#>  rpart            4.1-15     2019-04-12 [2] CRAN (R 4.0.2)
#>  rprojroot        1.3-2      2018-01-03 [1] CRAN (R 4.0.2)
#>  rstudioapi       0.13       2020-11-12 [1] CRAN (R 4.0.3)
#>  scales           1.1.1      2020-05-11 [1] CRAN (R 4.0.2)
#>  sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi          1.4.6      2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr          1.4.0      2019-02-10 [1] CRAN (R 4.0.2)
#>  survival         3.1-12     2020-04-10 [2] CRAN (R 4.0.2)
#>  testthat         2.3.2      2020-03-02 [1] CRAN (R 4.0.2)
#>  themis         * 0.1.4      2021-06-12 [1] CRAN (R 4.0.5)
#>  tibble           3.1.6      2021-11-07 [1] CRAN (R 4.0.5)
#>  tidyr            1.1.4      2021-09-27 [1] CRAN (R 4.0.5)
#>  tidyselect       1.1.1      2021-04-30 [1] CRAN (R 4.0.5)
#>  timeDate         3043.102   2018-02-21 [1] CRAN (R 4.0.3)
#>  unbalanced       2.0        2015-06-26 [1] CRAN (R 4.0.5)
#>  usethis          2.0.1      2021-02-10 [1] CRAN (R 4.0.5)
#>  utf8             1.1.4      2018-05-24 [1] CRAN (R 4.0.2)
#>  vctrs            0.3.8      2021-04-29 [1] CRAN (R 4.0.5)
#>  withr            2.4.2      2021-04-18 [1] CRAN (R 4.0.5)
#>  xfun             0.23       2021-05-15 [1] CRAN (R 4.0.5)
#>  yaml             2.2.1      2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] C:/Users/emily/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.2/library
```

EmilHvitfeldt commented 2 years ago

Hello @emilyriederer, this has been fixed in the development version of {themis} 😄

library(themis)
library(palmerpenguins)

pen1 <- penguins %>%
  mutate(island = factor(island == "Torgersen"))

recipe(island ~ ., data = pen1) %>%
  step_naomit(all_predictors()) %>%
  step_rose(island) %>%
  prep() %>%
  bake(new_data = NULL)
#> # A tibble: 572 × 8
#>    species bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex    year
#>    <fct>            <dbl>         <dbl>            <dbl>       <dbl> <fct> <dbl>
#>  1 Gentoo            50.3          14.8             218.       6740. male  2007.
#>  2 Chinst…           46.0          16.7             193.       3067. fema… 2008.
#>  3 Gentoo            45.2          15.1             208.       5336. fema… 2008.
#>  4 Adelie            41.7          17.5             185.       3537. male  2007.
#>  5 Gentoo            42.7          13.6             217.       4094. fema… 2007.
#>  6 Adelie            32.3          19.1             172.       4324. male  2008.
#>  7 Chinst…           45.2          19.0             194.       3898. fema… 2009.
#>  8 Adelie            41.1          20.0             181.       4811. male  2009.
#>  9 Chinst…           46.1          17.6             196.       4016. fema… 2008.
#> 10 Adelie            26.2          18.2             177.       3121. fema… 2008.
#> # … with 562 more rows, and 1 more variable: island <fct>

Created on 2021-12-18 by the reprex package (v2.0.1)
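
For anyone hitting the same error on the CRAN release, here is a minimal sketch of checking whether the installed themis is recent enough. It assumes the fix is present in any build numbered after 0.1.4 (the version shown in the session info above), which the thread does not state explicitly. `needs_dev_themis()` is a hypothetical helper written for this illustration and is not part of themis.

```r
# Hypothetical helper (not part of themis): TRUE when the given version is
# at or below the CRAN release that produced the error in this thread.
# Assumption: every build numbered after 0.1.4 contains the fix.
needs_dev_themis <- function(installed, last_affected = "0.1.4") {
  package_version(installed) <= package_version(last_affected)
}

needs_dev_themis("0.1.4")       # the CRAN release from the session info
needs_dev_themis("0.1.4.9000")  # a development build numbered after 0.1.4

# Check the local installation, if there is one, and suggest the GitHub
# install. Nothing is installed automatically.
if (requireNamespace("themis", quietly = TRUE) &&
    needs_dev_themis(as.character(utils::packageVersion("themis")))) {
  message('Consider: remotes::install_github("tidymodels/themis")')
}
```

Once a CRAN release newer than 0.1.4 is available, a plain `install.packages("themis")` should make the check unnecessary.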

emilyriederer commented 2 years ago

😳 My mistake! I think I misread the timing on the last issue. Sorry about that, and thank you so much for the fast reply in spite of my error!

EmilHvitfeldt commented 2 years ago

No worries, {themis} is overdue for a CRAN update. I'll take some of the blame!

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.