I'm familiar with the topic and happy to continue developing this.
Hi @xadrianzetx. Thank you for your continued contributions to Optuna. Sorry, but we adopted this task as an item for a development sprint being held today in Japan, so it may be implemented by a participant interested in it. If so, would you mind lending your deep insight into this topic to review it?
@shu65 will work on this task.
Ah, sorry, didn't notice this task on sprint board. Sure, happy to tag along for review if needed.
I found that some values required for the kurobako report are not output by bayesmark. What is the best way to deal with this problem? The options I can think of are:
1. Fill dummy values into the bayesmark results converted to the kurobako results format. kurobako could then output a plot from the results; however, `kurobako report` would also include the dummy values in its output.
2. Output the data needed for the kurobako results format in https://github.com/optuna/optuna/blob/master/benchmarks/bayesmark/optuna_optimizer.py (a rough sketch of what this could look like follows this list).
3. Add new report and plot functions that support both the kurobako and bayesmark results.
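A minimal sketch of what option 2 might involve, assuming we wrap the optimizer used in the bayesmark benchmark and record per-trial values and timings ourselves. The class name, method names, and output file below (`RecordingOptimizer`, `trials.json`) are hypothetical, and the actual kurobako results schema would still need to be matched when converting:

```python
import json
import time


class RecordingOptimizer:
    """Hypothetical wrapper: while bayesmark drives suggest()/observe(),
    record per-trial values and wall-clock times so they can later be
    converted into kurobako's results format."""

    def __init__(self, optimizer, out_path="trials.json"):
        self._optimizer = optimizer  # e.g. the OptunaOptimizer instance
        self._out_path = out_path
        self._records = []
        self._started = None

    def suggest(self, n_suggestions=1):
        self._started = time.time()
        return self._optimizer.suggest(n_suggestions)

    def observe(self, params, values):
        elapsed = time.time() - self._started
        for p, v in zip(params, values):
            self._records.append({"params": p, "value": v, "elapsed": elapsed})
        self._optimizer.observe(params, values)

    def dump(self):
        # The real kurobako study record schema would still have to be matched here.
        with open(self._out_path, "w") as f:
            json.dump(self._records, f)
```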
You mean the metrics and ranking from this paper? These would need to be re-implemented using the data bayesmark stores in the `eval` and `time` directories (cross-validation or validation-set scores and execution time) of each run. So I'd say option 3 seems correct.
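For reference, a rough sketch (not bayesmark's own API) of how best-so-far curves and a simple ranking could be recomputed from per-trial scores of the kind bayesmark keeps under `eval` and `time`. The array shapes and solver names are assumptions for illustration:

```python
import numpy as np
import pandas as pd


def best_so_far(values: np.ndarray) -> np.ndarray:
    """Cumulative best (minimum) value after each trial, per run.

    `values` is assumed to have shape (n_runs, n_trials)."""
    return np.minimum.accumulate(values, axis=1)


def rank_solvers(scores: dict) -> pd.Series:
    """Rank solvers by their mean final best value (lower is better)."""
    final_best = {name: best_so_far(v)[:, -1].mean() for name, v in scores.items()}
    return pd.Series(final_best).rank().sort_values()


# Toy data standing in for scores loaded from bayesmark's output.
rng = np.random.default_rng(0)
scores = {
    "optuna-tpe": rng.random((10, 80)),
    "random-search": rng.random((10, 80)) + 0.1,
}
print(rank_solvers(scores))
```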
@xadrianzetx Thanks for the comment. I discussed the options privately with @shu65, and it might be good to choose option 2, since we are most interested in the kurobako-based evaluation method and think that option 2, for which rewriting the bayesmark benchmark output would be sufficient, is the easiest to implement. The other evaluation method for building the report and the plot, as in the paper you suggested, is future work.
Unfortunately, @shu65 will not have enough time to implement option 2. If you don't mind, I would like to ask for your help, since you are familiar with the bayesmark benchmark, to implement option 2 based on @shu65's findings.
@HideakiImamura What would be the approach in that case? I wasn't really considering that option, as there is not much we can do with the `OptunaOptimizer` instance while the benchmark is running. In any case, I can implement it next week or so once I know what the plan is.
@xadrianzetx I am really sorry for the delay in replying. I had completely lost track of this issue.
I have reconsidered how to proceed with this task: since the hyperparameter optimization problem we are actually running in bayesmark uses sklearn, why not use it directly to run the optimization based on kurobako? As you pointed out before, we cannot use bayesmark's own performance metrics, but we want to make a fair comparison with other kurobako benchmarks, so a kurobako-like performance evaluation would be sufficient.
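For illustration only, a rough sketch of what exposing such an sklearn task directly as a kurobako problem could look like, following the `ProblemFactory`/`Problem`/`Evaluator` interface from the kurobako-py examples. The problem name, search space, and dataset choice below are assumptions, not a settled design:

```python
from kurobako import problem
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


class RandomForestProblemFactory(problem.ProblemFactory):
    def specification(self):
        # Hypothetical search space; the real benchmark would mirror
        # the spaces used in the bayesmark benchmark.
        params = [
            problem.Var("max_depth", problem.ContinuousRange(2, 32)),
            problem.Var("min_samples_split", problem.ContinuousRange(0.1, 1.0)),
        ]
        return problem.ProblemSpec(
            name="rf-wine",
            params=params,
            values=[problem.Var("1 - accuracy")],  # minimize misclassification rate
        )

    def create_problem(self, seed):
        return RandomForestProblem()


class RandomForestProblem(problem.Problem):
    def create_evaluator(self, params):
        return RandomForestEvaluator(params)


class RandomForestEvaluator(problem.Evaluator):
    def __init__(self, params):
        self._max_depth, self._min_samples_split = params
        self._current_step = 0

    def current_step(self):
        return self._current_step

    def evaluate(self, next_step):
        self._current_step = 1
        X, y = load_wine(return_X_y=True)
        clf = RandomForestClassifier(
            max_depth=int(self._max_depth),
            min_samples_split=self._min_samples_split,
        )
        accuracy = cross_val_score(clf, X, y, cv=3).mean()
        return [1.0 - accuracy]


if __name__ == "__main__":
    runner = problem.ProblemRunner(RandomForestProblemFactory())
    runner.run()
```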
Then PTAL at the dev repo where I've implemented the metrics from the paper that kurobako's performance evaluation is based on. This replaces bayesmark's metrics and should provide a fair comparison. The report and plots were improved as well, to match kurobako.
Let me know if this would fit the approach and I'll work on upstreaming it.
Amazing. Let me check your solutions.
Thank you for carefully working through kurobako's reporting logic and creating such a complete report builder. Frankly, I am amazed. Thank you so much. If you don't mind, could you submit the changes in https://github.com/xadrianzetx/dev-optuna-bayesmark as a PR to Optuna?
On the other hand, I am a little concerned about the large amount of code that will be added to the bayesmark benchmark in Optuna. I am concerned about what would happen if we gave kurobako the sklearn hyperparameter optimization problem I mentioned yesterday directly as a problem.
After your PR has been submitted, may I examine the volume of changes and the increase in complexity, and then consider whether to merge them into master? I am very sorry for the trouble you are going through.
> On the other hand, I am a little concerned about the large amount of code that will be added to the bayesmark benchmark in Optuna.
Yeah, I see your point. We are effectively re-implementing the reporting parts of kurobako in Python. However, it seems kurobako is planned to be rewritten in Python as part of GSoC 2022 anyway, so maybe the parts we are about to introduce could be re-used. I've tried to be reasonable about the design. The benchmark could then be refactored to drop bayesmark and just run sklearn problems with kurobako-py.
> I am concerned about what would happen if we gave kurobako the sklearn hyperparameter optimization problem I mentioned yesterday directly as a problem.
This is a valid option; however, it would require me to get familiar with the current implementation of kurobako (or more specifically, how to add new problems to it). I'm happy to do it if needed, but lately I've been a bit constrained on time, so I cannot guarantee a reasonable time frame.
> After your PR has been submitted, may I examine the volume of changes and the increase in complexity, and then consider whether to merge them into master?
I think that's the best way to approach it now. Let me polish the implementation a bit and I'll open a PR in the next few days or weeks (hopefully days).
> I am very sorry for the trouble you are going through.
No trouble at all!
> I think that's the best way to approach it now. Let me polish the implementation a bit and I'll open a PR in the next few days or weeks (hopefully days).
Thank you for your positive consideration. I look forward to getting a PR!
> However, it seems kurobako is planned to be rewritten in Python as part of GSoC 2022 anyway, so maybe the parts we are about to introduce could be re-used. I've tried to be reasonable about the design. The benchmark could then be refactored to drop bayesmark and just run sklearn problems with kurobako-py.
Yes, we have a plan to rewrite kurobako in Python. You are right, this implementation will be very useful for that future project.
### Motivation
Part of #3245.
Recently, the algorithm benchmark environment with `bayesmark` has been introduced in #3354. We can easily evaluate the performance of black-box optimization algorithms on CI (GitHub Actions) for the hyperparameter optimization task of machine learning models with `sklearn`. The benchmark result is generated by the features of `bayesmark`, so it is slightly different from that of the benchmark environment with `kurobako`. To compare the results fairly, it would be great to make the benchmark result of `bayesmark` consistent with that of `kurobako`. We would like help to do this.

### Description
#### About the `bayesmark` benchmark

The benchmark result with `bayesmark` consists of plots (.png files) and a report (.md file). The plot looks like the following.

![optuna-wine-RF-sumamry](https://user-images.githubusercontent.com/38826298/162708304-47cf711a-8151-4687-abb9-8dc61d7e5865.png)

The report looks like this.
#### About the `kurobako` benchmark

The benchmark result with `kurobako` consists of plots (.png files) and a report (.md file). The plot looks like the following.

![hpo-bench-parkinson-1342958b65feb590e4c0284f73a7373aa1006595f85f3ce9a4645ea1bfa581d6](https://user-images.githubusercontent.com/38826298/162709832-207dba9c-f248-4745-8e31-1d6308f96f63.png)

The report looks like this.
### What should we do

Change the `bayesmark` benchmark so that we can generate the same plots and report as those of the `kurobako` benchmark:

- Make the plot of the `bayesmark` benchmark consistent with that of the `kurobako` benchmark.
- Make the report of the `bayesmark` benchmark consistent with that of the `kurobako` benchmark.

### Alternatives (optional)
No response
### Additional context (optional)
No response