Closed: shinhongwu closed this 6 years ago
I'm not sure. I tried it with the current recipes and things worked:
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> library(recipes)
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: broom
Attaching package: ‘recipes’
The following object is masked from ‘package:stats’:
step
> data(cox2)
> cox2 <- cox2Descr
> cox2$potency <- cox2IC50
>
> cox2_recipe <- recipe(potency ~ ., data = cox2) %>%
+ ## Log the outcome
+ step_log(potency, base = 10) %>%
+ ## Remove sparse and unbalanced predictors
+ step_nzv(all_predictors()) %>%
+ ## Surface area predictors are highly correlated so
+ ## conduct PCA just on these.
+ step_pca(contains("VSA"), prefix = "surf_area_",
+ threshold = .95) %>%
+ ## Remove other highly correlated predictors
+ step_corr(all_predictors(), -starts_with("surf_area_"),
+ threshold = .90) %>%
+ ## Center and scale all of the non-PCA predictors
+ step_center(all_predictors(), -starts_with("surf_area_")) %>%
+ step_scale(all_predictors(), -starts_with("surf_area_"))
>
> cox2_lm <- train(cox2_recipe,
+ data = cox2,
+ method = "lm",
+ trControl = trainControl(method = "cv"))
> cox2_lm
Linear Regression
462 samples
255 predictors
Recipe steps: log, nzv, pca, corr, center, scale
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 417, 416, 416, 415, 414, 416, ...
Resampling results:
  RMSE      Rsquared   MAE
  1.161915  0.3889096  0.8710393
Tuning parameter 'intercept' was held constant at a value of TRUE
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] recipes_0.1.2 broom_0.4.3 dplyr_0.7.4 caret_6.0-78 ggplot2_2.2.1 lattice_0.20-35
loaded via a namespace (and not attached):
[1] tidyselect_0.2.3 purrr_0.2.4 reshape2_1.4.3 kernlab_0.9-25 splines_3.4.3
[6] colorspace_1.3-2 stats4_3.4.3 yaml_2.1.16 survival_2.41-3 prodlim_1.6.1
[11] rlang_0.1.6.9003 ModelMetrics_1.1.0 pillar_1.1.0 withr_2.1.1 foreign_0.8-69
[16] glue_1.2.0 bindrcpp_0.2 foreach_1.4.4 bindr_0.1 plyr_1.8.4
[21] dimRed_0.1.0 lava_1.6 robustbase_0.92-8 stringr_1.2.0 timeDate_3042.101
[26] munsell_0.4.3 gtable_0.2.0 codetools_0.2-15 psych_1.7.8 parallel_3.4.3
[31] class_7.3-14 DEoptimR_1.0-8 Rcpp_0.12.15 scales_0.5.0 ipred_0.9-6
[36] CVST_0.2-1 mnormt_1.5-5 stringi_1.1.6 RcppRoll_0.2.2 ddalpha_1.3.1
[41] grid_3.4.3 tools_3.4.3 magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2
[46] tidyr_0.7.2 DRR_0.0.3 pkgconfig_2.0.1 MASS_7.3-47 Matrix_1.2-12
[51] lubridate_1.7.1 gower_0.1.2 assertthat_0.2.0 iterators_1.0.9 R6_2.2.2
[56] rpart_4.1-12 sfsmisc_1.1-1 nnet_7.3-12 nlme_3.1-131 compiler_3.4.3
Dear Max,
I am very happy to learn that you and Hadley have developed the new recipes package to deal with the hideous pre-processing and feature engineering needed for modeling. I noticed in the recipes GitHub repository that you announced better integration between recipes and caret as a next step. Shortly afterwards, I found that caret 6.0-73 had been released with an example that uses a recipe object as the input to train(). I tried it this morning and got the following error:
Error in train.default(cox2_recipe, data = cox2, method = "lm", trControl = trainControl(method = "cv")) :
  argument "y" is missing, with no default
I don't know whether the problem is with my computer or whether it really is a bug. My code and session info follow.
maurice