topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

xgboost model warning : `ntree_limit` is deprecated, use `iteration_range` instead #1270

Open bappa10085 opened 2 years ago

bappa10085 commented 2 years ago

Running an xgboost model through the caret package gives the following warning:

WARNING: amalgamation/../src/c_api/c_api.cc:718: ntree_limit is deprecated, use iteration_range instead.

Minimal, reproducible example:

library(caret)

#eXtreme Gradient Boosting
set.seed(123)
modelFit <- train(Species~., data=iris, 
                preProcess=c("center", "scale"), 
                method="xgbTree")

I have tried setting warning = FALSE and message = FALSE in the chunk options, but the warning still appears in the knitted document. How can I remove it?

Session Info:

>sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
[5] LC_TIME=English_India.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-90    lattice_0.20-45 ggplot2_3.3.5  

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1     purrr_0.3.4          reshape2_1.4.4      
 [4] listenv_0.8.0        splines_4.1.2        colorspace_2.0-2    
 [7] vctrs_0.3.8          generics_0.1.1       stats4_4.1.2        
[10] utf8_1.2.2           survival_3.2-13      prodlim_2019.11.13  
[13] rlang_0.4.11         e1071_1.7-9          ModelMetrics_1.2.2.2
[16] pillar_1.6.4         glue_1.6.0           withr_2.4.3         
[19] DBI_1.1.2            xgboost_1.5.0.2      foreach_1.5.1       
[22] lifecycle_1.0.1      plyr_1.8.6           lava_1.6.10         
[25] stringr_1.4.0        timeDate_3043.102    munsell_0.5.0       
[28] gtable_0.3.0         future_1.23.0        recipes_0.1.17      
[31] codetools_0.2-18     parallel_4.1.2       class_7.3-19        
[34] fansi_0.5.0          Rcpp_1.0.7           scales_1.1.1        
[37] ipred_0.9-12         jsonlite_1.7.2       parallelly_1.30.0   
[40] digest_0.6.29        stringi_1.7.5        dplyr_1.0.7         
[43] grid_4.1.2           tools_4.1.2          magrittr_2.0.1      
[46] proxy_0.4-26         tibble_3.1.6         crayon_1.4.2        
[49] future.apply_1.8.1   pkgconfig_2.0.3      ellipsis_0.3.2      
[52] MASS_7.3-54          Matrix_1.3-4         data.table_1.14.2   
[55] pROC_1.18.0          lubridate_1.8.0      gower_0.2.2         
[58] assertthat_0.2.1     iterators_1.0.13     R6_2.5.1            
[61] globals_0.14.0       rpart_4.1-15         nnet_7.3-16         
[64] nlme_3.1-153         compiler_4.1.2

Jack-make commented 2 years ago

Hello, I also encountered this problem. How did you solve it?

bappa10085 commented 2 years ago

You can follow this. Just add verbosity = 0 within the train() call.
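A minimal sketch of that workaround, applied to the example from the original post. It assumes an xgboost 1.x release such as the 1.5.0.2 listed in the session info, where extra arguments given to train() are forwarded to xgboost; other versions may behave differently. The trControl and tuneLength settings are not part of the suggestion and are only there to keep the run short.

```r
library(caret)

set.seed(123)
model_fit <- train(
  Species ~ .,
  data = iris,
  preProcess = c("center", "scale"),
  method = "xgbTree",
  # Assumption: 3-fold CV and a small grid, chosen only to keep this quick.
  trControl = trainControl(method = "cv", number = 3),
  tuneLength = 1,
  # Forwarded through train()'s `...` to xgboost; 0 requests silent logging.
  verbosity = 0
)
```

This only hides the message. caret still passes the deprecated argument to xgboost underneath.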

bappa10085 commented 2 years ago

As suggested by missuse: "The current warning means xgboost is changing the name of an argument, but caret is still supplying the old name. Currently it works, but with new xgboost versions the argument will be completely replaced; if caret's function code is not updated by then, the warning will be replaced by an error." So it would be better if caret's function code were updated.

ifellows commented 1 year ago

Agreed. I am teaching a class using caret and I think these warnings are confusing for students.

Jon77Ruler commented 6 months ago

@topepo Any plans to change this please? Seems a simple one-liner?

serkor1 commented 2 months ago

Source of warning

The warning is triggered when predict() is called with the ntreelimit argument. See the code at the link below:

https://github.com/topepo/caret/blob/5f4bd2069bf486ae92240979f9d65b5c138ca8d4/models/files/xgbDART.R#L164
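To check this against a locally installed caret rather than the linked source file, the model definition can be inspected with getModelInfo(). This is only a sketch: it looks at "xgbTree" (the method used in the original post) instead of xgbDART, and the names xgb_info and predict_src are illustrative. Whether any lines match depends on the caret version installed.

```r
library(caret)

# Look up caret's bundled definition of the "xgbTree" model.
# regex = FALSE requests an exact match on the model name.
xgb_info <- getModelInfo("xgbTree", regex = FALSE)[["xgbTree"]]

# Source text of the prediction wrapper that caret uses for this model.
predict_src <- deparse(xgb_info$predict)

# Lines of the wrapper that mention the old argument name, if any.
grep("ntreelimit", predict_src, value = TRUE)
```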

So yes, @Jon77Ruler, this is an easy fix: replace ntreelimit with iteration_range. I have posted a reprex below that demonstrates the issue using {xgboost}.

However, I am not sure what the repository rules are for these kinds of "simple" bug fixes. @topepo wrote that {caret} is on the "backburner" (see issue https://github.com/topepo/caret/issues/1365), so it might be a while before we get a fix. It's not a breaking issue yet, but it might become one in the future.

Demonstration of problem and solution

library(xgboost)
data(
  agaricus.train, 
  package = 'xgboost'
)

data(
  agaricus.test, 
  package = 'xgboost'
)

# + estimate model
simple_model <- xgboost(
  data = agaricus.train$data,
  label = agaricus.train$label,
  nrounds = 2
)
#> [1]  train-rmse:0.350593 
#> [2]  train-rmse:0.246082

# + in caret
first <- predict(
  simple_model,
  agaricus.test$data,
  # in caret
  ntreelimit = 2
)
#> [10:44:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
second <- predict(
  simple_model, 
  agaricus.test$data, 
  # in xgboost
  iteration_range = 2
)
setequal(
  first,
  second
)
#> [1] TRUE

Created on 2024-07-19 with reprex v2.1.0