Open PhilippPro opened 6 years ago
Hi,
1) I don't really have a solution for this. It is rather a problem of mlrMBO than autoxgboost (hence, the same problem is happening in tuneRanger). The only solution I can see is to calculate the initial design explicitly and check time limits after every evaluation.
BUT: if the runtime is not even sufficient to evaluate the initial design, then the time limit is way too small anyway and the performance will most likely be garbage, since it's only doing a few iterations of random search.
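The idea in 1) could be sketched roughly like this (a hypothetical helper, not autoxgboost's or mlrMBO's actual code): evaluate the initial design point by point and check the elapsed time after every evaluation, instead of handing the whole design to the optimizer at once.

```r
# Sketch: evaluate an initial design row by row, stopping once the
# time budget (in seconds) is exhausted. `design` is a data.frame of
# parameter settings, `objective` evaluates one row of it.
evaluate_design_with_budget <- function(design, objective, time.budget) {
  start <- Sys.time()
  results <- list()
  for (i in seq_len(nrow(design))) {
    results[[i]] <- objective(design[i, , drop = FALSE])
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed > time.budget) {
      warning("Time budget exhausted after ", i, " of ",
              nrow(design), " initial design points")
      break
    }
  }
  results
}
```

The remaining MBO iterations would then only be run with whatever part of the design was actually evaluated.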
2) I would guess the bad performance of autoxgboost might be because of too small runtime. But that's hard to say without looking at data. Feel free to send me the benchmark results if you want, then I can maybe say some more about that.
3) In general the parameter set in autoxgboost should be good, but I have no guarantee that it's optimal (that is hard to show).
1.: I was thinking the same tonight.
2.: Runtime was not really an issue, as the datasets are not very big; for most of them the runtime was around 500 seconds for autoxgboost. I will have to rerun the benchmark, as I forgot to set seeds etc., and I will send it when I have redone it.
3.: I quickly looked over the parameters. How did you set them? You probably know more about them than I do, but are you sure about the eta parameter? In the tunability paper the quantiles of the best eta values were bigger. Maybe colsample_bylevel could be set a bit smaller/broader, but that is probably less important...
The number of iterations is probably a better indicator than runtime :) Cool, looking forward to the results
There is definitely room for improvement. For eta I think it's common practice to keep it rather low, since more iterations should compensate for this and it should be slightly better than a large learning rate with few iterations (I need to look up the reference, I think it's one of the "classic" gradient boosting papers). I should definitely check your results and see what can be improved. My parameters are mostly educated guesses.
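The eta/nrounds trade-off mentioned above can be illustrated with a toy example (assuming the xgboost R package; this is not a claim about autoxgboost's defaults):

```r
library(xgboost)

# Toy regression data
set.seed(1)
x <- matrix(rnorm(500 * 5), ncol = 5)
y <- x[, 1] + 0.5 * x[, 2]^2 + rnorm(500, sd = 0.1)
dtrain <- xgb.DMatrix(x, label = y)

# Large learning rate, few boosting iterations
fast <- xgb.train(params = list(eta = 0.3, objective = "reg:squarederror"),
                  data = dtrain, nrounds = 50, verbose = 0)

# Small learning rate, many iterations: each tree contributes less,
# so more rounds are needed, but the ensemble is usually smoother
slow <- xgb.train(params = list(eta = 0.03, objective = "reg:squarederror"),
                  data = dtrain, nrounds = 500, verbose = 0)

rmse <- function(model) sqrt(mean((predict(model, dtrain) - y)^2))
c(fast = rmse(fast), slow = rmse(slow))
```

On held-out data the low-eta/many-rounds variant tends to generalize at least as well, which is the usual argument for fixing eta low and tuning nrounds (e.g. via early stopping) instead.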
Hi Janek,
I now have some reproducible results.
The code can be found here: https://github.com/PhilippPro/tuneRanger/blob/master/benchmark/benchmark_regression.R
Results in figures: https://github.com/PhilippPro/tuneRanger/tree/master/benchmark/figure
Conclusions for autoxgboost: most of the time it can compete with the other two implementations, but it is never the best of the three. Moreover, in some cases it is much worse. I just took the latest autoxgboost GitHub version. Maybe there are some implementation problems in these cases?
Regarding runtime (see the figure): on none of the datasets does autoxgboost reach the time.budget, so that is not the issue. Its runtime seems fairly constant across datasets, while for tuneRanger it starts to grow... liquidSVM is the clear winner here.
The result dataset is also on GitHub, so you can play with it to find the datasets etc. using the R code above.
PS: Two datasets are still missing but they need a very long time (even with time.budget = 3600)
Awesome, thanks for that @PhilippPro
It's indeed interesting to see that autoxgboost fails on some datasets, I will investigate this further
I made a benchmark comparing it with ranger and tuneRanger in the default mode on some regression datasets (surprisingly, tuneRanger was quite good, maybe the datasets are too small?) and noticed that the runtime restriction does not work in the initial design. Any ideas for this? In tuneRanger I have the same problem...
PS: Should I run autoxgboost with other parameters than the default to get better results? I just used the latest GitHub version...