topepo / caret

caret (Classification And Regression Training) is an R package containing miscellaneous functions for training and plotting classification and regression models.
http://topepo.github.io/caret/index.html

mlpWeightDecayML: problem with random search and adaptive resampling #1354

Open · Rek27 opened this issue 9 months ago

Rek27 commented 9 months ago

I am trying to do hyperparameter tuning using the random search algorithm. When trying out the mlpWeightDecayML model, there seems to be an underlying problem with how the search algorithm is implemented for this model.

Note: I have noticed this only with this model; every other model I tried worked fine. I have also been playing with adaptive resampling, and the same problem shows up there (a sketch of that setup is below).
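For completeness, here is a minimal sketch of the adaptive-resampling variant I mean. The values in the adaptive list are caret's documented defaults, not settings tied to this bug; everything else mirrors the reproduction below.

library(caret)
data(iris)

# Adaptive resampling with random search; adaptive settings are caret's defaults
ctrl <- caret::trainControl(
    method = "adaptive_cv",
    number = 10,
    search = "random",
    adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = TRUE),
    verboseIter = TRUE
)

caretModel <- caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneLength = 5,
    trControl = ctrl
)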

If I set the seed to 5 and take a look at the output, everything looks normal:

data(iris)
set.seed(5)
caretModel = caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneLength = 5,
    trControl = caret::trainControl(method="cv", search="random", number=2, verboseIter = TRUE)
)
# + Fold1: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05 
# - Fold1: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05 
# + Fold1: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05
# - Fold1: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05 
# + Fold1: layer1=10, layer2=15, layer3=10, decay=8.376e-03
# - Fold1: layer1=10, layer2=15, layer3=10, decay=8.376e-03 
# + Fold1: layer1= 8, layer2=12, layer3=19, decay=3.723e-02
# - Fold1: layer1= 8, layer2=12, layer3=19, decay=3.723e-02 
# + Fold1: layer1=20, layer2=16, layer3=17, decay=3.865e-02
# - Fold1: layer1=20, layer2=16, layer3=17, decay=3.865e-02 
# + Fold2: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05
# - Fold2: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05 
# + Fold2: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05
# - Fold2: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05 
# + Fold2: layer1=10, layer2=15, layer3=10, decay=8.376e-03
# - Fold2: layer1=10, layer2=15, layer3=10, decay=8.376e-03 
# + Fold2: layer1= 8, layer2=12, layer3=19, decay=3.723e-02
# - Fold2: layer1= 8, layer2=12, layer3=19, decay=3.723e-02 
# + Fold2: layer1=20, layer2=16, layer3=17, decay=3.865e-02
# - Fold2: layer1=20, layer2=16, layer3=17, decay=3.865e-02 
# Aggregating results
# Selecting tuning parameters
# Fitting layer1 = 12, layer2 = 6, layer3 = 9, decay = 1.22e-05 on full training set

If I try a different seed, for example 6, it doesn't seem to work:

data(iris)
set.seed(6)
caretModel = caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneLength = 5,
    trControl = caret::trainControl(method="cv", search="random", number=2, verboseIter = TRUE)
)
# + Fold1: layer1=14, layer2= 2, layer3= 0, decay=3.063e+00 
# + Fold1: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# - Fold1: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# + Fold1: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03
# - Fold1: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03 
# + Fold1: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02
# - Fold1: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02 
# + Fold1: layer1= 3, layer2=17, layer3=18, decay=5.715e-05 
# - Fold1: layer1= 3, layer2=17, layer3=18, decay=5.715e-05 
# + Fold2: layer1=14, layer2= 2, layer3= 0, decay=3.063e+00
# + Fold2: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# - Fold2: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# + Fold2: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03
# - Fold2: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03
# + Fold2: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02
# - Fold2: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02
# + Fold2: layer1= 3, layer2=17, layer3=18, decay=5.715e-05
# - Fold2: layer1= 3, layer2=17, layer3=18, decay=5.715e-05
# Error in { :
#   task 1 failed - "arguments imply differing number of rows: 0, 75"
# In addition: Warning messages:
# 1: At least one layer had zero units and were removed. The new structure is 14->2   
# 2: At least one layer had zero units and were removed. The new structure is 14->2 

After inspecting the output, I noticed something strange. In the first example, the tuning iterations look normal: each one starts and then finishes ('+' and then '-' at the beginning of the line). In the second example, however, the log is not chronological. The first two lines are both iteration starts (the first one never finished), and the same thing happens at the beginning of Fold2. I am not sure what the underlying problem is. I have tried different datasets and different trainControl parameters, but whenever this error occurs, the iteration output is messy in the same way.
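To see what random search actually sampled, caret exposes each model's grid generator through its model info. Here is a sketch of that inspection; my assumption is that a zero-unit layer in the sampled grid is the trigger, and note the draws here need not match the ones inside train() exactly, since train() consumes random numbers for other steps too.

set.seed(6)
# Pull the model info for mlpWeightDecayML and call its random-grid generator
info <- caret::getModelInfo("mlpWeightDecayML", regex = FALSE)[[1]]
info$grid(x = iris[, -ncol(iris)], y = iris[, ncol(iris)], len = 5, search = "random")
# If any layer1/layer2/layer3 comes out as 0 here, that would match the
# "zero units ... removed" warning from the failing run.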

Session Info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] caret_6.0-94    lattice_0.20-45 ggplot2_3.4.4   reprex_2.1.0

loaded via a namespace (and not attached):
 [1] httr_1.4.7           jsonlite_1.8.8       splines_4.2.2
 [4] foreach_1.5.2        R.utils_2.12.3       prodlim_2023.08.28
 [7] stats4_4.2.2         yaml_2.3.8           globals_0.16.2      
[10] ipred_0.9-14         RSNNS_0.4-17         pillar_1.9.0
[13] glue_1.6.2           pROC_1.18.5          digest_0.6.33
[16] hardhat_1.3.0        colorspace_2.1-0     recipes_1.0.9
[19] htmltools_0.5.7      Matrix_1.5-1         R.oo_1.25.0
[22] plyr_1.8.9           timeDate_4032.109    clipr_0.8.0
[25] pkgconfig_2.0.3      listenv_0.9.0        purrr_1.0.2
[28] scales_1.3.0         processx_3.8.2       gower_1.0.1
[31] lava_1.7.3           proxy_0.4-27         timechange_0.2.0    
[34] tibble_3.2.1         styler_1.10.2        generics_0.1.3
[37] withr_2.5.2          nnet_7.3-18          cli_3.6.1
[40] survival_3.4-0       magrittr_2.0.3       evaluate_0.23
[43] ps_1.7.5             R.methodsS3_1.8.2    fs_1.6.3
[46] fansi_1.0.5          future_1.33.1        parallelly_1.36.0
[49] R.cache_0.16.0       nlme_3.1-160         MASS_7.3-58.1
[52] class_7.3-20         tools_4.2.2          data.table_1.14.8   
[55] lifecycle_1.0.4      stringr_1.5.1        munsell_0.5.0
[58] callr_3.7.3          e1071_1.7-13         compiler_4.2.2
[61] rlang_1.1.2          grid_4.2.2           iterators_1.0.14
[64] rstudioapi_0.15.0    rmarkdown_2.25       gtable_0.3.4
[67] ModelMetrics_1.2.2.2 codetools_0.2-18     reshape2_1.4.4
[70] R6_2.5.1             lubridate_1.9.3      knitr_1.45
[73] dplyr_1.1.4          fastmap_1.1.1        future.apply_1.11.1
[76] utf8_1.2.4           stringi_1.8.2        parallel_4.2.2
[79] Rcpp_1.0.11          vctrs_0.6.5          rpart_4.1.19
[82] tidyselect_1.2.0     xfun_0.41
Rek27 commented 9 months ago

In addition, it seems that the problem happens when the decay is very large. That could be the cause of the issue.
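A sketch of how this hypothesis could be tested (the grid values below are made up for the test, not taken from caret): pin the failing architecture but force layer3 >= 1, and vary only decay. If only the large-decay fit fails, decay is the trigger; if everything passes, the zero-unit layer is the more likely cause.

# Hypothetical test grid: same first two layers as the failing draw,
# layer3 bumped from 0 to 1, decay spanning small to "very big"
testGrid <- expand.grid(
    layer1 = 14, layer2 = 2, layer3 = 1,
    decay = c(1e-4, 1e-2, 3)
)
set.seed(6)
caretModel <- caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneGrid = testGrid,
    trControl = caret::trainControl(method = "cv", number = 2, verboseIter = TRUE)
)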