Open PhilippPro opened 6 years ago
Hi,
1) I don't really have a solution for this. It is rather a problem of mlrMBO than autoxgboost (hence, the same problem is happening in tuneRanger). The only solution I can see is to calculate the initial design explicitly and check time limits after every evaluation.
BUT: if the runtime is not even sufficient to evaluate the initial design, then the time limit is way too small anyway and the performance will most likely be garbage, since it's only doing a few iterations of random search.
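The idea in 1) could be sketched roughly like this (a hypothetical helper, not autoxgboost's or mlrMBO's actual code): evaluate the initial design point by point and check the elapsed time after every evaluation, instead of handing the whole design to the optimizer at once.

```r
# Sketch: evaluate an initial design row by row, stopping once the
# time budget (in seconds) is exhausted. `design` is a data.frame of
# parameter settings, `objective` evaluates one row of it.
evaluate_design_with_budget <- function(design, objective, time.budget) {
  start <- Sys.time()
  results <- list()
  for (i in seq_len(nrow(design))) {
    results[[i]] <- objective(design[i, , drop = FALSE])
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed > time.budget) {
      warning("Time budget exhausted after ", i, " of ",
              nrow(design), " initial design points")
      break
    }
  }
  results
}
```

The remaining MBO iterations would then only be run with whatever part of the design was actually evaluated.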
2) I would guess the bad performance of autoxgboost might be because of too small runtime. But that's hard to say without looking at data. Feel free to send me the benchmark results if you want, then I can maybe say some more about that.
3) In general the parameter set in autoxgboost should be good, but I have no guarantee that it's optimal (that is hard to show).
1.: I was thinking the same tonight.
2.: Runtime was not really an issue, as the datasets are not very big; for most of them the runtime was around 500 seconds for autoxgboost. I will have to rerun the benchmark, as I forgot to set seeds etc., and I will send it when I have redone it.
3.: I quickly looked over the parameters. How did you set them? You probably know more about them than I do, but are you sure about the eta parameter? In the tunability paper the quantiles of the best eta values were bigger. Maybe colsample_bylevel could be set a bit smaller/broader, but that is probably less important...
The number of iterations is probably a better indicator than runtime :) Cool, looking forward to the results
There is definitely room for improvement. For eta I think it's common practice to keep it rather low, since more iterations should compensate for this and it should be slightly better than a large learning rate with few iterations (I need to look up the reference, I think it's one of the "classic" gradient boosting papers). I should definitely check your results and see what can be improved. My parameters are mostly educated guesses.
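The eta/nrounds trade-off mentioned above can be illustrated with a toy example (assuming the xgboost R package; this is not a claim about autoxgboost's defaults):

```r
library(xgboost)

# Toy regression data
set.seed(1)
x <- matrix(rnorm(500 * 5), ncol = 5)
y <- x[, 1] + 0.5 * x[, 2]^2 + rnorm(500, sd = 0.1)
dtrain <- xgb.DMatrix(x, label = y)

# Large learning rate, few boosting iterations
fast <- xgb.train(params = list(eta = 0.3, objective = "reg:squarederror"),
                  data = dtrain, nrounds = 50, verbose = 0)

# Small learning rate, many iterations: each tree contributes less,
# so more rounds are needed, but the ensemble is usually smoother
slow <- xgb.train(params = list(eta = 0.03, objective = "reg:squarederror"),
                  data = dtrain, nrounds = 500, verbose = 0)

rmse <- function(model) sqrt(mean((predict(model, dtrain) - y)^2))
c(fast = rmse(fast), slow = rmse(slow))
```

On held-out data the low-eta/many-rounds variant tends to generalize at least as well, which is the usual argument for fixing eta low and tuning nrounds (e.g. via early stopping) instead.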
Hi Janek,
I now have some reproducible results.
The code can be found here: https://github.com/PhilippPro/tuneRanger/blob/master/benchmark/benchmark_regression.R
Results in figures: https://github.com/PhilippPro/tuneRanger/tree/master/benchmark/figure
Conclusions for autoxgboost: most of the time it can compete with the other two implementations, but it is never the best of the three. Moreover, in some cases it is much worse. I just took the latest autoxgboost GitHub version. Maybe there are some implementation problems in these cases?
Regarding runtime (see the figure): on none of the datasets does autoxgboost reach the time.budget, so that is not the issue. Its runtime seems fairly constant across datasets, while for tuneRanger it starts to grow... liquidSVM is the clear winner here.
The result dataset is also on GitHub, so you can play with it to find the datasets etc. using the R code above.
PS: Two datasets are still missing but they need a very long time (even with time.budget = 3600)
Awesome, thanks for that @PhilippPro
It's indeed interesting to see that autoxgboost fails on some datasets, I will investigate this further
I made a benchmark comparing it with ranger and tuneRanger in the default mode on some regression datasets (surprisingly, tuneRanger was quite good, maybe the datasets are too small?) and noticed that the runtime restriction does not work in the initial design. Any ideas for this? In tuneRanger I have the same problem...
PS: Should I run autoxgboost with other parameters than the default to get better results? I just used the latest GitHub version...