topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Some configurations of `train()` require the caret namespace to be attached or they will fail #1242

Status: Open. mjskay opened this issue 3 years ago.

mjskay commented 3 years ago

Thanks for the wonderful package! We've been using caret to implement an algorithm for the rstar() function in the posterior package. This function internally fits a model using caret::train() in order to calculate an MCMC diagnostic.

We've run into an issue: we would prefer that our users not have to load/attach the {caret} namespace just to use the rstar() function. However, we have found that some configurations of caret::train() that we would like to use (particularly method = "knn") appear to require the {caret} namespace to be attached, or they fail. We were wondering whether this is a bug, or whether there is something more fundamental about the setup of {caret} such that it is only intended to work when the user has attached it.

I provide an example and some more explanation below.

Minimal, reproducible example:

Minimal dataset:

Just a simple dataset with one continuous predictor (x) and a response variable with two classes (y):

```r
df = data.frame(
  x = rnorm(100, c(0,1)),
  y = factor(1:2)
)
```

Minimal, runnable code:

NOTE: To see the problem, this must be run without {caret} attached (i.e., not on the search path). In a fresh session, only R's default packages are attached:

```r
print(.packages())
## [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"  
```
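`.packages()` lists only *attached* packages. A `pkg::fun()` call such as `caret::train()` loads the package's namespace but does not attach it, which is the distinction this issue hinges on. A minimal sketch of the difference, using the always-installed base package tools as a stand-in for caret (an assumption made so the snippet runs without caret installed):

```r
# Load the namespace of a base package without attaching it; this mirrors
# what a `pkg::fun()` call such as `caret::train()` does.
invisible(loadNamespace("tools"))

loaded   <- isNamespaceLoaded("tools")   # the namespace is in memory...
attached <- "tools" %in% .packages()     # ...but it is not on the search path
print(c(loaded = loaded, attached = attached))
```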

Here's a minimal example of the problem. I provide some other investigation into the issue afterwards in case it is helpful.

Assuming caret is not loaded, run the following code:

```r
df = data.frame(x = rnorm(100, c(0,1)), y = factor(1:2))
caret::train(
  y ~ x, data = df, method = "knn",
  trControl = caret::trainControl(method = "none")
)
#> Error in knn3(as.matrix(x), y, k = param$k, ...): could not find function "knn3"
#> Timing stopped at: 0 0 0
```

This problem does not happen if we use the default value of trControl, which works (with a caveat, see below):

```r
df = data.frame(x = rnorm(100, c(0,1)), y = factor(1:2))
caret::train(
  y ~ x, data = df, method = "knn"
)
#> Loading required package: lattice
#> Loading required package: ggplot2
#> k-Nearest Neighbors 
#> 
#> 100 samples
#>   1 predictor
#>   2 classes: '1', '2' 
#> 
#> No pre-processing
#> Resampling: Bootstrapped (25 reps) 
#> Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 
#> Resampling results across tuning parameters:
#> 
#>   k  Accuracy   Kappa    
#>   5  0.6316994  0.2640919
#>   7  0.6669488  0.3363986
#>   9  0.6823583  0.3652040
#> 
#> Accuracy was used to select the optimal model using the largest value.
#> The final value used for the model was k = 9.
```

The caveat is that this appears to have worked only because {caret} loaded and attached itself (as well as its two hard dependencies, ggplot2 and lattice), as we can see by checking the attached packages afterwards:

```r
print(.packages())
## [1] "caret"     "ggplot2"   "lattice"   "stats"     "graphics"  "grDevices" "utils"    
## [8] "datasets"  "methods"   "base" 
```

Currently, we are working around this by throwing a warning and asking users to run library(caret) manually (see https://github.com/stan-dev/posterior/issues/164). Ideally, however, users would not have to attach caret, ggplot2, and lattice to their search path just to call the rstar() function.
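A minimal sketch of the kind of check described above. The function name and message are hypothetical and this is not the code used in posterior; it only tests whether a package appears in `.packages()` and warns if it does not:

```r
# Hypothetical helper: warn (rather than fail later) when `pkg` is not attached.
# Returns TRUE/FALSE invisibly so the caller can decide how to proceed.
warn_if_not_attached <- function(pkg = "caret") {
  attached <- pkg %in% .packages()
  if (!attached) {
    warning(
      sprintf("Please run library(%s) before calling this function.", pkg),
      call. = FALSE
    )
  }
  invisible(attached)
}
```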

Any help would be much appreciated. Thanks!

Session Info:

```r
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-88    ggplot2_3.3.5   lattice_0.20-44

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6           lubridate_1.7.10     ps_1.6.0             class_7.3-19        
 [5] assertthat_0.2.1     digest_0.6.27        ipred_0.9-11         foreach_1.5.1       
 [9] utf8_1.2.1           R6_2.5.0             plyr_1.8.6           reprex_2.0.0        
[13] stats4_4.1.0         evaluate_0.14        e1071_1.7-7          highr_0.9           
[17] pillar_1.6.1         rlang_0.4.11         rstudioapi_0.13      data.table_1.14.0   
[21] callr_3.7.0          rpart_4.1-15         Matrix_1.3-4         rmarkdown_2.9       
[25] splines_4.1.0        gower_0.2.2          stringr_1.4.0        munsell_0.5.0       
[29] proxy_0.4-26         compiler_4.1.0       xfun_0.24            pkgconfig_2.0.3     
[33] clipr_0.7.1          htmltools_0.5.1.1    nnet_7.3-16          tidyselect_1.1.1    
[37] tibble_3.1.2         prodlim_2019.11.13   codetools_0.2-18     fansi_0.5.0         
[41] crayon_1.4.1         dplyr_1.0.7          withr_2.4.2          MASS_7.3-54         
[45] recipes_0.1.16       ModelMetrics_1.2.2.2 grid_4.1.0           nlme_3.1-152        
[49] gtable_0.3.0         lifecycle_1.0.0      DBI_1.1.1            magrittr_2.0.1      
[53] pROC_1.17.0.1        scales_1.1.1         cli_2.5.0            stringi_1.6.2       
[57] reshape2_1.4.4       fs_1.5.0             timeDate_3043.102    ellipsis_0.3.2      
[61] generics_0.1.0       vctrs_0.3.8          lava_1.6.9           iterators_1.0.13    
[65] tools_4.1.0          glue_1.4.2           purrr_0.3.4          processx_3.5.2      
[69] survival_3.2-11      yaml_2.2.1           colorspace_2.0-2     knitr_1.33          
```
JianGuoZhou3 commented 3 years ago

Thx

mjskay commented 3 years ago

I just wanted to swing back around here to say: we were able to work around this by manually attaching and detaching the {caret} namespace and those of its hard dependencies (ggplot2, lattice). If anyone else encounters this issue and is curious, our workaround is here: https://github.com/stan-dev/posterior/pull/181.

So although this may technically still be an issue, we no longer need a solution for our purposes; please feel free to close this if you want :). Thanks!
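For readers curious about the general shape of the attach/detach workaround mentioned above, here is a minimal sketch. It is a hypothetical helper written for illustration, not the code from the linked posterior PR. It attaches any of `pkgs` that are not already attached, evaluates the expression, and then detaches only what it attached. The default `pkgs` includes lattice and ggplot2 because `library(caret)` attaches them as well; if only `"caret"` were listed, those two would be left behind on the search path.

```r
# Hypothetical helper (not the code from the linked PR): evaluate `expr` with
# `pkgs` attached, then detach only the packages this call attached itself.
with_attached <- function(expr, pkgs = c("lattice", "ggplot2", "caret")) {
  # Packages that are not already on the search path
  to_attach <- pkgs[!(pkgs %in% .packages())]

  # Register cleanup first so it also runs if attaching or `expr` fails;
  # detach in reverse order so caret goes before its dependencies.
  on.exit({
    for (p in rev(to_attach)) {
      entry <- paste0("package:", p)
      if (entry %in% search()) detach(entry, character.only = TRUE)
    }
  }, add = TRUE)

  for (p in to_attach) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }

  # `expr` is a lazily evaluated promise, so it is only forced here,
  # after the packages are on the search path.
  expr
}
```

Usage would look like `with_attached(caret::train(y ~ x, data = df, method = "knn", trControl = caret::trainControl(method = "none")))`. `detach()` without `unload = TRUE` leaves the namespaces loaded and only restores the user's search path. Also note that calling `library()` from package code may be flagged by `R CMD check`.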