andreranza opened this issue 1 year ago (status: open)
Hello @andreranza :wave: Thanks for the wonderful reprex!
As noted in the documentation for `step_impute_knn()`, you need to wrap selector functions such as `has_role()` in `imp_vars()` when passing them to `impute_with`. I'd like `has_role()` to work directly in cases like this, but that isn't implemented yet.
``` r
library(recipes)
df <- tibble::tibble(
  country_code = c("AGO", "BGD", "BRA", "CHN", "PRK"),
  GDP = c(6930.7687, 35263.802, 8159000.64, 8485748, 9868.7669),
  D = c(32353588, 165516222, 211782878, 1407745000, 25755441),
  A = c(167, 1136, 2463, 2951, 367),
  B = c(3, NA, 5, NA, 7),
  C = c(13, NA, 5, NA, 4)
)
recipe(GDP ~ ., data = df) |>
  add_role(D, new_role = "impute") |>
  add_role(A, new_role = "impute") |>
  step_impute_knn(
    c("B", "C"),
    neighbors = 2,
    impute_with = imp_vars(has_role("impute"))
  ) |>
  prep() |>
  juice()
#> # A tibble: 5 × 6
#>   country_code          D     A     B     C      GDP
#>   <fct>             <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1 AGO            32353588   167     3  13      6931.
#> 2 BGD           165516222  1136     5   8.5   35264.
#> 3 BRA           211782878  2463     5   5   8159001.
#> 4 CHN          1407745000  2951     6   4.5 8485748 
#> 5 PRK            25755441   367     7   4      9869.
```
Created on 2023-09-07 with reprex v2.0.2
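If the custom roles exist only to feed `impute_with`, one possible alternative (a sketch, not something suggested in the thread) is to name the predictors directly inside `imp_vars()`. This should select the same two columns (`D` and `A`) as the role-based version above. The object names `rec`, `prepped`, and `imputed` are illustrative, and calling `tidy()` on the trained step is just one way to check which predictors were actually used for each imputed column.

```r
library(recipes)

# Same toy data as the reprex above
df <- tibble::tibble(
  country_code = c("AGO", "BGD", "BRA", "CHN", "PRK"),
  GDP = c(6930.7687, 35263.802, 8159000.64, 8485748, 9868.7669),
  D = c(32353588, 165516222, 211782878, 1407745000, 25755441),
  A = c(167, 1136, 2463, 2951, 367),
  B = c(3, NA, 5, NA, 7),
  C = c(13, NA, 5, NA, 4)
)

# Name the predictors directly instead of tagging them with a custom role;
# imp_vars() captures this selection the same way it captures has_role().
rec <- recipe(GDP ~ ., data = df) |>
  step_impute_knn(B, C, neighbors = 2, impute_with = imp_vars(D, A))

prepped <- prep(rec)

# bake(new_data = NULL) returns the processed training set, like juice()
imputed <- bake(prepped, new_data = NULL)
print(imputed)

# Inspect the trained step to see which predictors it used
print(tidy(prepped, number = 1))
```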
Wow, I definitely saw `imp_vars()`; I'm not sure why I didn't try it 😅 I guess using `has_role()` without it felt so natural that I expected it to work, despite what the documentation said. Sorry, and thanks a lot for pointing me in the right direction!
The problem
I'm having trouble selecting the columns to impute with in `step_impute_knn()` using `has_role()`. Thanks!
Reproducible example
Created on 2023-09-07 with reprex v2.0.2
Session info
``` r
sessionInfo()
#> R version 4.2.3 (2023-03-15)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] recipes_1.0.8 dplyr_1.1.0
#>
#> loaded via a namespace (and not attached):
#>  [1] styler_1.7.0        tidyselect_1.2.0    xfun_0.39
#>  [4] purrr_1.0.1         listenv_0.9.0       splines_4.2.3
#>  [7] lattice_0.20-45     vctrs_0.6.3         generics_0.1.3
#> [10] htmltools_0.5.4     yaml_2.3.7          utf8_1.2.3
#> [13] survival_3.5-3      prodlim_2023.08.28  rlang_1.1.1
#> [16] R.oo_1.25.0         pillar_1.9.0        glue_1.6.2
#> [19] withr_2.5.0         R.utils_2.12.0      R.cache_0.16.0
#> [22] lifecycle_1.0.3     lava_1.7.2.1        timeDate_4022.108
#> [25] R.methodsS3_1.8.2   future_1.33.0       codetools_0.2-19
#> [28] evaluate_0.21       knitr_1.43          fastmap_1.1.1
#> [31] parallel_4.2.3      class_7.3-21        fansi_1.0.4
#> [34] Rcpp_1.0.10         ipred_0.9-14        parallelly_1.36.0
#> [37] fs_1.6.2            digest_0.6.33       grid_4.2.3
#> [40] hardhat_1.3.0       cli_3.6.1           tools_4.2.3
#> [43] magrittr_2.0.3      tibble_3.2.1        future.apply_1.11.0
#> [46] pkgconfig_2.0.3     ellipsis_0.3.2      MASS_7.3-58.2
#> [49] Matrix_1.5-3        data.table_1.14.8   timechange_0.2.0
#> [52] lubridate_1.9.2     reprex_2.0.2        gower_1.0.1
#> [55] rmarkdown_2.23      rstudioapi_0.15.0   R6_2.5.1
#> [58] globals_0.16.2      rpart_4.1.19        nnet_7.3-18
#> [61] compiler_4.2.3
```