Closed: RaphaelS1 closed this 3 years ago
some small comments for now:
:exclamation: No coverage uploaded for pull request base (`main@83d8022`). Click here to learn what that means. The diff coverage is `n/a`.
```diff
@@           Coverage Diff           @@
##             main    #227   +/-   ##
=======================================
  Coverage        ?  92.30%
=======================================
  Files           ?      20
  Lines           ?     390
  Branches        ?       0
=======================================
  Hits            ?     360
  Misses          ?      30
  Partials        ?       0
```
Continue to review full report at Codecov.
Legend - Click here to learn more
`Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
Powered by Codecov. Last update 83d8022...45ef598. Read the comment docs.
some small comments for now:

- i dislike that you are tagging every irace-control param with "irace" just to be able to add the meta-param "show.irace.output". can we somehow get around this?
- your coding style is in many places not mlr3-compliant (arrow, spaces, camel-case, etc). pls fix
- now you need to set `properties = "dependencies"` to tell the tuner that it can handle dependent params
Done, done, done.
please unit-test `paradox_to_irace` separately. mlr2 had a lot of unit tests for irace; copy most of them over if possible please
Done - need to check how dependencies are working though; I'm unsure if some tests are working as expected. Also added a few dependencies to Suggests that could be removed if needed by changing the tested learners.
Shall we try to finish, merge and close this PR? @mb706 @berndbischl. Open questions seem to be about correctly setting instances to match the mlr3tuning design to the irace design.
We should first merge #240 and then I will probably have to adapt this PR to the API changes.
> We should first merge #240 and then I will probably have to adapt this PR to the API changes.
okay great, I've converted to a draft for now
@jakob-r after the merge from #240 (and probably many others since then), what needs to change in this PR?
Have you merged master into your branch? Check that the tests are complete and running, mark this PR as ready for review, and we will merge it if everything is fine.
Okay so pinging everyone who replied above many months ago: @jakob-r @be-marc @mb706
I've merged in master and got it working. However, there seems to be a problem upstream causing warnings ('partial match'); you can see it on all the builds. For most of the tests I can handle this with `suppressWarnings`, but I can't do this on the primary autotest. The hacky fix would be to catch irace in the autotest and then wrap the public `optimize` method in `suppressWarnings`, but that's up to you. I'm 99% sure the problem isn't on our side because we don't use `length` as an argument in a package (grep), but let me know if I'm wrong.
About the tuner itself, some things I'd appreciate double checking before merging. Sorry, I don't know the internals of mlr3tuning or irace well, so a lot of my implementation is unfortunately guesswork:

- Is the use of instance correct and concordant with the use in the irace package? See conversations about this.
- Currently it's only set up to work with `TerminatorEvals` and `TerminatorRunTime`, as the parameters of these are passed to the `maxExperiments` and `maxTime` arguments in the irace scenario (see `make_scenario` in `irace_helpers.R`). i) Do you agree with this? ii) What does this mean for other terminators, i.e. will they just not work here or do we have to add them in some other way?
- Can someone confirm that they agree that minimizing time will work as expected? See lines 43-52 in `test_TunerIrace.R` for an example.
- Follow-up from the conversation I had with @mb706 above regarding `assign_result`: does this still need updating, or will the automatically returned result be the 'correct' one? Sorry I'm not very clear here, but I'm also not sure at all about the internals of mlr3tuning, so it's hard for me to know how the 'best' configuration is returned at the end of tuning (same with irace, not sure how this works internally).

Thanks for the help with this!
EDIT: For some reason the warnings didn't show up in the tests this time, so it's possibly stochastic? Which seems weird given the nature of the warnings in the previous tests, either way looks good now.
> I've merged in master and got it working. However, there seems to be a problem upstream causing warnings ('partial match'); you can see it on all the builds. For most of the tests I can handle this with `suppressWarnings`, but I can't do this on the primary autotest. The hacky fix would be to catch irace in the autotest and then wrap the public `optimize` method in `suppressWarnings`, but that's up to you. I'm 99% sure the problem isn't on our side because we don't use `length` as an argument in a package (grep), but let me know if I'm wrong.
These checks are enabled by our `setup.R`, i.e. https://github.com/mlr-org/mlr3tuning/blob/master/tests/testthat/setup.R#L4. You can temporarily disable them if required.
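For anyone unfamiliar with these 'partial match' diagnostics, here is a minimal base-R sketch (no mlr3 code; the function `f` is purely illustrative) of the kind of check such a setup file can enable and the warning class the builds were showing:

```r
# R can warn whenever a function argument name is only partially matched.
options(warnPartialMatchArgs = TRUE)

f <- function(verbose = FALSE) verbose

# Calling with the abbreviated argument name `verb` triggers a
# "partial argument match" warning; we capture its message here.
w <- tryCatch(f(verb = TRUE), warning = function(cond) conditionMessage(cond))
print(w)  # mentions a partial argument match of 'verb' to 'verbose'
```

This also illustrates why `suppressWarnings` around the affected calls silences the builds without touching the upstream code.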
@RaphaelS1 Hope you do not mind that I changed your code. irace does not fit into our bbotk / mlr3tuning framework like the other tuners.
> Is the use of instance correct and concordant with the use in the irace package? See conversations about this.
No, instances in irace are resampling instances. We pass instantiated resampling instances in `scenario$instances` now. The user sets the number of instances via the `n_instances` parameter, and internally the resampling is instantiated `n_instances` times. The `TuningInstanceSingleCrit` is now passed via `scenario$targetRunnerData`.
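To illustrate the instance idea, here is a minimal mlr3-free base-R sketch (`make_holdout_instance` is a made-up helper, not part of any package): each entry of `scenario$instances` corresponds to an independently instantiated train/test split of the same data, and irace races configurations across these instances.

```r
set.seed(42)
n_instances <- 3
n_obs <- 150  # e.g. the number of rows in the iris task

# Stand-in for instantiating a holdout resampling: draw one fixed split.
make_holdout_instance <- function(n, ratio = 2 / 3) {
  train <- sample(n, size = round(ratio * n))
  list(train = train, test = setdiff(seq_len(n), train))
}

# One instantiated split per irace instance.
instances <- lapply(seq_len(n_instances), function(i) make_holdout_instance(n_obs))

length(instances)        # 3: one split per instance
lengths(instances[[1]])  # train = 100, test = 50
```

The key point is that the splits are fixed up front, so every configuration evaluated on instance `i` sees exactly the same train/test partition.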
> Currently it's only set up to work with `TerminatorEvals` and `TerminatorRunTime`, as the parameters of these are passed to the `maxExperiments` and `maxTime` arguments in the irace scenario (see `make_scenario` in `irace_helpers.R`). i) Do you agree with this? ii) What does this mean for other terminators, i.e. will they just not work here or do we have to add them in some other way?
If the best configuration in irace is not based on the best performance in one resampling (Question 4), we should not terminate irace with our terminators. We should let irace terminate on its own so it can return the best configuration.
@berndbischl You already mentioned in #169 that we might have a design issue. `TunerIrace` needs to change the train / test splits of `ObjectiveTuning$resampling`. I solved this by making `ObjectiveTuning$resampling` an active binding.
```r
tuning_instance$objective$resampling = experiment$instance
cost = tuning_instance$eval_batch(config)
```
So `TunerIrace` changes the train / test split before executing `$eval_batch()`.
Another option would be that `$eval_batch(config, resampling)` can also accept a resampling instance. We would have to change the mlr3tuning code at multiple locations for this and copy/modify methods that live in bbotk.
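For readers unfamiliar with active bindings: `ObjectiveTuning` is an R6 class, but the mechanism can be sketched with base R's `makeActiveBinding` (this is an analogy only, not the actual mlr3tuning code; the `env` object and split names are made up). Reads and writes of the bound name go through a function, which is what lets the tuner swap the split behind an otherwise ordinary field.

```r
env <- new.env()
env$.resampling <- "holdout_split_1"  # backing field for the binding

# Bind `resampling` so that access is routed through this function:
# called with no argument on read, with one argument on write.
makeActiveBinding("resampling", function(value) {
  if (missing(value)) {
    env$.resampling           # read access
  } else {
    env$.resampling <- value  # write access: swap the split
  }
}, env)

env$resampling                       # "holdout_split_1"
env$resampling <- "holdout_split_2"  # TunerIrace-style swap before eval_batch()
env$resampling                       # "holdout_split_2"
```

In R6, the same effect is achieved by declaring `resampling` in the `active` list of the class definition.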
> @RaphaelS1 Hope you do not mind that I changed your code. irace does not fit into our bbotk / mlr3tuning framework like the other tuners.
Please change anything you want, I really don't know the internals of mlr3tuning too well and I know nothing about bbotk. This initial PR was more of a starting point that I hope provides some help!
> No, instances in irace are resampling instances. We pass instantiated resampling instances in `scenario$instances` now.
Yes I think this changed considerably after I opened the initial PR
> This initial PR was more of a starting point that I hope provides some help!
Yes your work helps a lot!
> Follow-up from the conversation I had with @mb706 above regarding `assign_result`: does this still need updating, or will the automatically returned result be the 'correct' one? Sorry I'm not very clear here, but I'm also not sure at all about the internals of mlr3tuning, so it's hard for me to know how the 'best' configuration is returned at the end of tuning (same with irace, not sure how this works internally).
`irace::irace` returns the elites of the final race as the result. We select the best performing one in `.assign_result`. The reported performance is the average performance over all resampling iterations. The log of `irace::irace` reports the same number.
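A toy sketch of that selection step (hypothetical data and names, not the actual `.assign_result` code): given the elites of the final race and their per-instance performance log, pick the elite with the best average performance, since we minimize `classif.ce`.

```r
# Per-(configuration, instance) performance log of the final elites.
perf <- data.frame(
  config   = rep(c("A", "B", "C"), each = 3),  # elite configurations
  instance = rep(1:3, times = 3),              # resampling instances
  ce       = c(0.10, 0.12, 0.08,               # classif.ce per instance
               0.07, 0.09, 0.08,
               0.11, 0.10, 0.13)
)

# Average each elite over all resampling iterations, then take the minimum.
avg  <- aggregate(ce ~ config, data = perf, FUN = mean)
best <- avg$config[which.min(avg$ce)]

best                        # "B": mean 0.08 beats A (0.10) and C (~0.113)
avg$ce[avg$config == best]  # 0.08, the reported (averaged) performance
```

This is also why the number reported by mlr3tuning matches the irace log: both are the mean over the same resampling instances.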
We have to remove `show.irace.output`. It also filters our logging messages, and we do not have a solution for this. `regr.km` is also spamming the console.
> Can someone confirm that they agree that minimizing time will work as expected? See lines 43-52 in `test_TunerIrace.R` for an example.
The unit tests that use `TerminatorRunTime` fail sometimes, i.e. `irace::irace` does not return a best configuration.
```r
ps = ParamSet$new(params = list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamDbl$new("minsplit", lower = 1, upper = 10)
))
ps$trafo = function(x, param_set) {
  x$minsplit = as.integer(round(x$minsplit))
  return(x)
}
inst = TuningInstanceSingleCrit$new(
  tsk("iris"), lrn("classif.rpart"), rsmp("holdout"),
  msr("classif.ce"), trm("run_time", secs = 24), ps)
tt = tnr("irace")
tt$optimize(inst)
expect_double(inst$archive$best()$cp)
```
```r
ps = ParamSet$new(params = list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1)
))
inst = TuningInstanceSingleCrit$new(
  tsk("iris"), lrn("classif.rpart"), rsmp("holdout"),
  msr("classif.ce"), trm("run_time", secs = 20), ps)
tt = tnr("irace", capping = 1, boundMax = 1, cappingType = "best", boundType = "instance")
tt$optimize(inst)
expect_double(inst$archive$best()$cp)
```
~~I added a new unit test with `TerminatorRunTime` that seems to run stable.~~ The PR is ready for a review.

The unit test with `TerminatorRunTime` is still not stable. It rarely fails and is difficult to reproduce. If `irace::irace` does not return a result, we throw the error message "irace::irace did not return a result. The evaluated configurations are still accessible through the archive." The unit test wraps `tuner$optimize(instance)` in a try statement to avoid failing tests. We just check if the archive has evaluated configurations. Does anyone have a better idea?
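The try-based pattern can be sketched like this in base R (all names are stand-ins, not the real test code): a failed optimizer run is tolerated, but the test still requires that some configurations were evaluated and logged.

```r
# Stand-in for tuner$optimize(instance); like irace::irace, it may
# occasionally fail without returning a best configuration.
run_flaky_optimizer <- function() {
  if (runif(1) < 0.2) stop("irace::irace did not return a result.")
  "best_config"
}

archive <- list()  # stand-in for instance$archive

set.seed(1)
result <- try({
  # Evaluations are logged to the archive before the optimizer can fail.
  archive <- c(archive, list("config_1", "config_2"))
  run_flaky_optimizer()
}, silent = TRUE)

# The test asserts only that evaluations exist, not that a result came back.
stopifnot(length(archive) > 0)
```

An alternative to discuss would be asserting on the error class explicitly (e.g. via `tryCatch`) so that unrelated failures still surface.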
> ~~I added a new unit test with `TerminatorRunTime` that seems to run stable.~~ The PR is ready for a review. The unit test with `TerminatorRunTime` is still not stable. It rarely fails and is difficult to reproduce. If `irace::irace` does not return a result, we throw the error message "irace::irace did not return a result. The evaluated configurations are still accessible through the archive." The unit test wraps `tuner$optimize(instance)` in a try statement to avoid failing tests. We just check if the archive has evaluated configurations. Does anyone have a better idea?
Does it fail bc. irace is taking a tiny bit longer than the terminator is allowing?
Our documentation stated:

> Uninstantiated resamplings are instantiated during construction so that all configurations are evaluated on the same data splits.

I added:

> If a new resampling is passed, it is instantiated with new data splits. Already instantiated resamplings are kept unchanged.
> Does it fail bc. irace is taking a tiny bit longer than the terminator is allowing?
It does not fail often. We subtract 5 seconds now; maybe that helps.
Closes #63
I think this can be merged. Any objections?
- `irace_helpers.R` for helper functions to make `TunerIrace` a bit neater
- `TunerIrace`

Also changed the template very slightly to include all three constructor methods. If you dislike this let me know and I'll revert back.