ja-thomas / autoxgboost

autoxgboost - Automatic tuning and fitting of xgboost

Can I define my own objective function for mbo? #55

Open giuseppec opened 5 years ago

giuseppec commented 5 years ago

As far as I can see, the autoxgboost function internally uses holdout for the objective function within the mbo tuning (it is hard-coded).

1. Wouldn't it be cool if users could also specify their own objective here? For example, I want to use 3-fold CV (or stratified CV) instead of the hard-coded holdout (see the sketch below).
2. Currently, mbo seems to use the same test set in each iteration, as the resample instance (i.e. the test splits) is computed outside of the objective function. This way I am not able to use different test splits in each iteration, right? Isn't mbo somehow starting to overfit to those holdout test splits at some point?
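A minimal sketch of what I mean, in mlr terms; note that the `resampling` argument shown at the end is hypothetical, not an existing autoxgboost parameter:

```r
library(mlr)

# Roughly what the hard-coded behaviour corresponds to: one holdout split.
holdout.desc = makeResampleDesc("Holdout", split = 2/3)

# What I would like to be able to pass in instead, e.g. stratified 3-fold CV.
cv.desc = makeResampleDesc("CV", iters = 3, stratify = TRUE)

# Hypothetical API -- `resampling` is NOT an existing autoxgboost argument:
# autoxgboost(task, measure = auc, resampling = cv.desc)
```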

giuseppec commented 5 years ago

I would be happy with replacing the line at https://github.com/ja-thomas/autoxgboost/blob/b64048e603751bcba9b6e212c775baff8ababccb/R/autoxgboost.R#L171 with `crossval(lrn, task.train, measures = measure)$aggr`.

And yes, I would ignore the task.test data here completely (on which the early stopping is based). But maybe it is better to let the user decide whether they really want to do this or not. Or do you see any other problem here?
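A minimal sketch of that replacement, assuming mlr's `crossval()` signature; `lrn`, `task.train` and `measure` are the objects already in scope at that point in autoxgboost.R:

```r
library(mlr)

# Cross-validated performance estimate on the training task instead of the
# fixed holdout. `measures` has to be passed by name here, because
# crossval()'s third positional argument is `iters`, not the measure.
perf = crossval(lrn, task.train, iters = 3L, measures = measure)$aggr
```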

ja-thomas commented 5 years ago
  1. The main idea is that no resampling should be necessary, so that xgboost can utilize the full parallelism of the system. But I see your point that there are cases in which this would be totally useful.

  2. This is usually how it is done; otherwise a lot of noise is added. I experimented on some datasets to see how bad the overfitting is, but I couldn't directly find (or artificially create) any "overtuning" on the holdout data. In general this is something I'm quite interested in improving, but I first need to find cases where this is actually a problem.
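For reference, the distinction in point 2 maps to mlr's ResampleInstance vs. ResampleDesc: an instance pins the splits once, a description redraws them on every evaluation. A minimal sketch, using a stand-in learner and mlr's built-in `sonar.task` purely for illustration:

```r
library(mlr)

lrn = makeLearner("classif.rpart")   # stand-in learner for illustration
desc = makeResampleDesc("Holdout")

# Fixed splits: the instance freezes the train/test indices once, so every
# tuning iteration is scored on the same holdout set -- less noise between
# iterations, but the tuner could in principle overfit that one split.
rin = makeResampleInstance(desc, sonar.task)
resample(lrn, sonar.task, rin)$aggr

# Fresh splits: passing the description redraws the split on every call,
# removing the fixed-split bias at the cost of noisier comparisons.
resample(lrn, sonar.task, desc)$aggr
```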