mlr-org / mlr3tuning

Hyperparameter optimization package of the mlr3 ecosystem
https://mlr3tuning.mlr-org.com/
GNU Lesser General Public License v3.0

add irace #227

Closed RaphaelS1 closed 3 years ago

RaphaelS1 commented 4 years ago

Also changed the template very slightly to include all three constructor methods. If you dislike this, let me know and I'll revert.

berndbischl commented 4 years ago

some small comments for now:

berndbischl commented 4 years ago
codecov-io commented 4 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (main@83d8022). The diff coverage is n/a.


@@           Coverage Diff           @@
##             main     #227   +/-   ##
=======================================
  Coverage        ?   92.30%           
=======================================
  Files           ?       20           
  Lines           ?      390           
  Branches        ?        0           
=======================================
  Hits            ?      360           
  Misses          ?       30           
  Partials        ?        0           


RaphaelS1 commented 4 years ago

i dislike that you are tagging every irace-control param with "irace" just to be able to add the meta-param "show.irace.output". can we somehow get around this?

your coding style is in many places not mlr3-compliant (arrow, spaces, camel-case, etc). pls fix

you need to set properties = "dependencies" to tell the tuner that it can handle dependent params

Done, done, done.
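For the last point, roughly what that declaration looks like (a minimal sketch; the param_set and param_classes here are placeholders, not the PR's actual values):

TunerIrace = R6::R6Class("TunerIrace",
  inherit = Tuner,
  public = list(
    initialize = function() {
      super$initialize(
        param_set = ParamSet$new(),
        param_classes = c("ParamDbl", "ParamInt", "ParamFct", "ParamLgl"),
        # "dependencies" tells mlr3tuning this tuner can handle dependent params
        properties = "dependencies",
        packages = "irace"
      )
    }
  )
)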

please unit-test paradox_to_irace separately. mlr2 had a lot of unit tests for irace; copy most of them over if possible please

Done - need to check how dependencies are working, though; I'm unsure if some tests are working as expected. I also added a few dependencies to Suggests that could be removed, if needed, by changing the tested learners.
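A sketch of the shape such a test could take (assuming paradox_to_irace() returns the parameter list built by irace::readParameters(), which the helper's name suggests; this is not the actual test file):

library(testthat)
library(checkmate)  # provides expect_set_equal
library(paradox)

test_that("paradox_to_irace translates a dependency", {
  ps = ParamSet$new(params = list(
    ParamFct$new("kernel", levels = c("linear", "radial")),
    ParamDbl$new("gamma", lower = 0, upper = 1)
  ))
  # gamma is only active when kernel == "radial"
  ps$add_dep("gamma", on = "kernel", cond = CondEqual$new("radial"))
  res = paradox_to_irace(ps)
  expect_set_equal(res$names, c("kernel", "gamma"))
})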

RaphaelS1 commented 4 years ago

Shall we try to finish, merge, and close this PR? @mb706 @berndbischl. The open questions seem to be about correctly setting instances to match the mlr3tuning design to the irace design.

jakob-r commented 4 years ago

We should first merge #240 and then I will probably have to adapt this PR to the API changes.

RaphaelS1 commented 4 years ago

We should first merge #240 and then I will probably have to adapt this PR to the API changes.

okay great, I've converted to a draft for now

RaphaelS1 commented 3 years ago

@jakob-r after the merge from #240 (and probably many others since then), what needs to change in this PR?

jakob-r commented 3 years ago

Merge master into your branch, check that the tests are complete and passing, and mark this PR as ready for review; we will merge it if everything is fine.

RaphaelS1 commented 3 years ago

Okay so pinging everyone who replied above many months ago: @jakob-r @be-marc @mb706

I've merged in master and got it working. However, there seems to be a problem upstream causing 'partial match' warnings; you can see it on all the builds. For most of the tests I can handle this with suppressWarnings, but I can't do this on the primary autotest. The hacky fix would be to catch irace in the autotest and then wrap the public optimize method in suppressWarnings, but that's up to you. I'm 99% sure the problem isn't on our side, because we don't use length as an argument anywhere in the package (I checked with grep), but let me know if I'm wrong.

About the tuner itself, there are some things I'd appreciate someone double-checking before merging. Sorry, I don't know the internals of mlr3tuning or irace well, so a lot of my implementation is unfortunately guesswork:

  1. Is the use of instance correct and concordant with its use in the irace package? See the conversations about this above.
  2. Currently it's only set up to work with TerminatorEvals and TerminatorRunTime, as the parameters of these are passed to the maxExperiments and maxTime arguments in the irace scenario (see make_scenario in irace_helpers.R; a rough sketch follows this list). i) Do you agree with this? ii) What does this mean for other terminators, i.e. will they just not work here, or do we have to add them in some other way?
  3. Can someone confirm that minimizing time will work as expected? See lines 43-52 in test_TunerIrace.R for an example.
  4. Follow-up from the conversation I had with @mb706 above regarding assign_result: does this still need updating, or will the automatically returned result be the 'correct' one? Sorry, I'm not very clear here, but I'm also not sure about the internals of mlr3tuning, so it's hard for me to know how the 'best' configuration is returned at the end of tuning (same with irace; I'm not sure how it works internally).
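To make point 2 concrete, roughly the mapping I mean (a sketch, not the literal make_scenario code):

make_scenario = function(terminator) {
  scenario = list()
  if (inherits(terminator, "TerminatorEvals")) {
    # budget expressed as a maximum number of configuration evaluations
    scenario$maxExperiments = terminator$param_set$values$n_evals
  } else if (inherits(terminator, "TerminatorRunTime")) {
    # budget expressed as a wall-clock limit in seconds
    scenario$maxTime = terminator$param_set$values$secs
  } else {
    stop("TunerIrace currently supports only TerminatorEvals and TerminatorRunTime")
  }
  scenario
}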

Thanks for the help with this

EDIT: For some reason the warnings didn't show up in the tests this time, so it's possibly stochastic? That seems weird given the nature of the warnings in the previous tests; either way, it looks good now.

mllg commented 3 years ago

I've merged in master and got it working. However, there seems to be a problem upstream causing 'partial match' warnings; you can see it on all the builds. For most of the tests I can handle this with suppressWarnings, but I can't do this on the primary autotest. The hacky fix would be to catch irace in the autotest and then wrap the public optimize method in suppressWarnings, but that's up to you. I'm 99% sure the problem isn't on our side, because we don't use length as an argument anywhere in the package (I checked with grep), but let me know if I'm wrong.

These checks are enabled by our setup.R, i.e. https://github.com/mlr-org/mlr3tuning/blob/master/tests/testthat/setup.R#L4.

You can temporarily disable them if required.
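That is, something along these lines (a sketch of the relevant base R options, not a verbatim copy of setup.R):

options(
  warnPartialMatchArgs = TRUE,    # warn on partially matched function arguments
  warnPartialMatchAttr = TRUE,    # warn on partial matches in attr()
  warnPartialMatchDollar = TRUE   # warn on partial matches with $
)

# temporarily disabling around a single call:
withr::with_options(
  list(warnPartialMatchArgs = FALSE),
  tuner$optimize(instance)
)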

be-marc commented 3 years ago

@RaphaelS1 Hope you do not mind that I changed your code. irace does not fit into our bbotk / mlr3tuning framework like the other tuners.

  1. Is the use of instance correct and concordant with its use in the irace package? See the conversations about this above.

No, instances in irace are resampling instances. We now pass instantiated resampling instances in scenario$instances. The user sets the number of instances via the n_instances parameter, and internally the resampling is instantiated n_instances times. The TuningInstanceSingleCrit is now passed via scenario$targetRunnerData.
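A condensed sketch of that wiring (variable names are illustrative; only the scenario field names come from the description above):

# inside the tuner: instantiate the resampling n_instances times,
# one instantiated resampling per irace instance
n_instances = self$param_set$values$n_instances
scenario$instances = lapply(seq_len(n_instances), function(i) {
  resampling = inst$objective$resampling$clone()
  resampling$instantiate(inst$objective$task)  # fresh train/test splits
  resampling
})
# the tuning instance travels to the target runner alongside the scenario
scenario$targetRunnerData = list(inst = inst)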

  2. Currently it's only set up to work with TerminatorEvals and TerminatorRunTime, as the parameters of these are passed to the maxExperiments and maxTime arguments in the irace scenario (see make_scenario in irace_helpers.R). i) Do you agree with this? ii) What does this mean for other terminators, i.e. will they just not work here, or do we have to add them in some other way?

If the best configuration in irace is not based on the best performance in one resampling (question 4), we should not terminate irace with our terminators. We should let irace terminate on its own so it can return the best configuration.

@berndbischl You already mentioned in #169 that we might have a design issue. TunerIrace needs to change the train / test splits of ObjectiveTuning$resampling.

I solved this by making ObjectiveTuning$resampling an active binding.

# point the objective at irace's resampling instance (its train/test splits)
tuning_instance$objective$resampling = experiment$instance
# evaluate the configuration on those splits
cost = tuning_instance$eval_batch(config)

So TunerIrace changes the train / test split before executing $eval_batch().
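A minimal sketch of the active-binding idea (the private field name is illustrative, not the literal implementation):

ObjectiveTuning = R6::R6Class("ObjectiveTuning",
  inherit = Objective,
  active = list(
    resampling = function(rhs) {
      if (missing(rhs)) {
        return(private$.resampling)
      }
      # TunerIrace assigns a pre-instantiated resampling here,
      # swapping the train/test splits before the next $eval_batch()
      private$.resampling = rhs
    }
  ),
  private = list(
    .resampling = NULL
  )
)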

Another option would be for $eval_batch(config, resampling) to also accept a resampling instance. We would have to change the mlr3tuning code in multiple locations for this and copy/modify methods that live in bbotk.

RaphaelS1 commented 3 years ago

@RaphaelS1 Hope you do not mind that I changed your code. irace does not fit into our bbotk / mlr3tuning framework like the other tuners.

Please change anything you want, I really don't know the internals of mlr3tuning too well and I know nothing about bbotk. This initial PR was more of a starting point that I hope provides some help!

No, instances in irace are resampling instances. We pass instantiated resampling instances in scenario$instances now.

Yes, I think this changed considerably after I opened the initial PR.

be-marc commented 3 years ago

This initial PR was more of a starting point that I hope provides some help!

Yes your work helps a lot!

Follow-up from the conversation I had with @mb706 above regarding assign_result: does this still need updating, or will the automatically returned result be the 'correct' one? Sorry, I'm not very clear here, but I'm also not sure about the internals of mlr3tuning, so it's hard for me to know how the 'best' configuration is returned at the end of tuning (same with irace; I'm not sure how it works internally).

irace::irace returns the elites of the final race as the result. We select the best-performing one in .assign_result. The reported performance is the average performance over all resampling iterations. The log of irace::irace reports the same number.
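In code, the selection amounts to something like this (simplified sketch; the real method restricts the choice to the final elites, whereas archive$best() looks at the whole archive):

# pick the configuration with the best mean performance
best = instance$archive$best()
instance$assign_result(
  xdt = best[, instance$search_space$ids(), with = FALSE],
  y = unlist(best[, instance$archive$codomain$ids(), with = FALSE])
)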

be-marc commented 3 years ago

We have to remove show.irace.output. It also filters out our own logging messages, and we do not have a solution for this. regr.km is also spamming the console.

be-marc commented 3 years ago

Can someone confirm that minimizing time will work as expected? See lines 43-52 in test_TunerIrace.R for an example.

The unit tests that use TerminatorRunTime sometimes fail, i.e. irace::irace does not return a best configuration.

# First test: two parameters with a run-time terminator.
ps = ParamSet$new(params = list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamDbl$new("minsplit", lower = 1, upper = 10)
))
ps$trafo = function(x, param_set) {
  x$minsplit = as.integer(round(x$minsplit))
  return(x)
}
inst = TuningInstanceSingleCrit$new(
  tsk("iris"), lrn("classif.rpart"), rsmp("holdout"),
  msr("classif.ce"), trm("run_time", secs = 24), ps)
tt = tnr("irace")
tt$optimize(inst)
expect_double(inst$archive$best()$cp)

# Second test: a single parameter with irace's capping enabled.
ps = ParamSet$new(params = list(
  ParamDbl$new("cp", lower = 0.001, upper = 0.1)
))
inst = TuningInstanceSingleCrit$new(
  tsk("iris"), lrn("classif.rpart"), rsmp("holdout"),
  msr("classif.ce"), trm("run_time", secs = 20), ps)
tt = tnr("irace", capping = 1, boundMax = 1, cappingType = "best", boundType = "instance")
tt$optimize(inst)
expect_double(inst$archive$best()$cp)

be-marc commented 3 years ago

~I added a new unit test with TerminatorRunTime that seems to run stable.~ The PR is ready for review. The unit test with TerminatorRunTime is still not stable: it rarely fails and is difficult to reproduce. If irace::irace does not return a result, we throw the error message "irace::irace did not return a result". The evaluated configurations are still accessible through the archive. The unit test wraps tuner$optimize(instance) in a try statement to avoid failing tests; we just check whether the archive has evaluated configurations. Does anyone have a better idea?

jakob-r commented 3 years ago

~I added a new unit test with TerminatorRunTime that seems to run stable.~ The PR is ready for review. The unit test with TerminatorRunTime is still not stable: it rarely fails and is difficult to reproduce. If irace::irace does not return a result, we throw the error message "irace::irace did not return a result". The evaluated configurations are still accessible through the archive. The unit test wraps tuner$optimize(instance) in a try statement to avoid failing tests; we just check whether the archive has evaluated configurations. Does anyone have a better idea?

Does it fail because irace is taking a tiny bit longer than the terminator is allowing?

be-marc commented 3 years ago

Our documentation stated:

Uninstantiated resamplings are instantiated during construction so that all configurations are evaluated on the same data splits.

I added:

If a new resampling is passed, it is instantiated with new data splits. Already instantiated resamplings are kept unchanged.
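For illustration, a sketch of the documented behaviour:

library(mlr3)
library(mlr3tuning)
library(paradox)

ps = ParamSet$new(params = list(ParamDbl$new("cp", lower = 0.001, upper = 0.1)))

# already instantiated: the fixed train/test split is kept unchanged
resampling = rsmp("holdout")
resampling$instantiate(tsk("iris"))

inst = TuningInstanceSingleCrit$new(
  tsk("iris"), lrn("classif.rpart"), resampling,
  msr("classif.ce"), trm("evals", n_evals = 10), ps)
# an uninstantiated rsmp("holdout") would instead be instantiated
# during construction, with new data splits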

be-marc commented 3 years ago

Does it fail bc. irace is taking a tiny bit longer than the terminator is allowing?

It does not fail often. We subtract 5 seconds now; maybe that helps.

be-marc commented 3 years ago

Closes #63

jakob-r commented 3 years ago

I think this can be merged. Any objections?