topepo / caret

caret (Classification And Regression Training) is an R package that contains miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

train.default's predictions are not consistent with the returned model's predictions #943

Closed: glenrs closed this issue 6 years ago

glenrs commented 6 years ago

train.default loops many times to find which parameter combinations are the most effective, but then discards all of the fitted models. I am assuming the models are discarded to save memory. This is an issue for models that have additional sources of randomness beyond their hyperparameters, such as random forests. Even with the same initialization we can see a drastic change in performance.
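For illustration (this sketch is not from the original report; it uses ranger directly on the built-in iris data rather than the reprex data): two forests fit with identical hyperparameters but different RNG states generally do not produce identical out-of-bag predictions.

library(ranger)

set.seed(1)
fit1 <- ranger(Species ~ ., data = iris, mtry = 2, min.node.size = 1)
set.seed(2)
fit2 <- ranger(Species ~ ., data = iris, mtry = 2, min.node.size = 1)

# Same hyperparameters, different random state: the two forests (and
# their out-of-bag predictions) typically differ.
identical(fit1$predictions, fit2$predictions)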

Below, the initial model was 53% correct, but another model with the same parameter combination was 100% correct. In this case we see a dramatic increase in performance, but models could also potentially perform much worse.

Minimal dataset:

The minimal dataset I am using is pima_diabetes, which is included in the healthcareai package. You can install that package from CRAN.

install.packages("healthcareai")

Minimal, runnable code:

library(healthcareai)
#> healthcareai version 2.2.0
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com
library(tidyverse)

d <- prep_data(pima_diabetes, outcome = diabetes)
#> Training new data prep recipe...
d_features <- select(d, -diabetes)
d_outcomes <- pull(d, diabetes)

train_control <- caret::trainControl(method = "cv",
                                     number = 5,
                                     search = "grid",
                                     savePredictions = "final")
train_control$summaryFunction <- caret::twoClassSummary
train_control$classProbs <- TRUE
tune_grid <- data.frame(
  mtry = 3,
  splitrule = "extratrees",
  min.node.size = 1
)

train_list <- caret::train(x = d_features, y = d_outcomes, method = "ranger", 
                           metric = "ROC", trControl = train_control, tuneGrid = tune_grid,
                           importance = "impurity")
#> Loading required package: lattice
#> 
#> Attaching package: 'caret'
#> The following object is masked from 'package:purrr':
#> 
#>     lift
#> Warning: Setting row names on a tibble is deprecated.

trained_predictions <- train_list$pred$pred
mean(trained_predictions == d_outcomes)
#> [1] 0.5390625

predict_output <- caret::predict.train(train_list, d_features, type = "prob")
predict_predictions <- predict_output$Y
outcome <- ifelse(predict_predictions > .45, "Y", "N")
mean(d_outcomes == outcome)
#> [1] 1

Created on 2018-09-27 by the reprex package (v0.2.0).

Session Info:


sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#> 
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
#>  [5] tools_3.5.1     htmltools_0.3.6 yaml_2.2.0      Rcpp_0.12.18   
#>  [9] stringi_1.2.4   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
#> [13] digest_0.6.16   evaluate_0.11


topepo commented 6 years ago

I am assuming the models are discarded to save memory.

Yes

Below, the initial model was 53% correct, but another model with the same parameter combination was 100% correct. In this case we see a dramatic increase in performance, but models could also potentially perform much worse.

You can't pick and choose which resampled model to use; you are using resampling to estimate performance of the random forest model and that uses all of the resamples.
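As a rough illustration with the train object from the reprex above (slot names per caret's conventions): the resampling estimate that train() reports is an aggregate over every held-out fold, not the fit from any single fold.

train_list$resample   # one row of ROC / Sens / Spec per CV fold
train_list$results    # their mean and SD across the five folds; this is what train() tunes on and reports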

The range in performance that you see is driven by a lot of different things. It's not that one resampled model fit is better than the other; they are random realizations of that model on different data sets (and not an increase in performance). There is often a resample-to-resample effect, meaning that some resamples show good performance across many models (or submodels). This is most likely what you are seeing.
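A hedged sketch of looking at that fold-to-fold spread directly, assuming dplyr and the held-out predictions saved by savePredictions = "final" in the reprex above:

library(dplyr)

# Per-fold held-out accuracy; the spread across the Resample labels is
# the resample-to-resample effect described above.
train_list$pred %>%
  group_by(Resample) %>%
  summarise(accuracy = mean(pred == obs))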