mlr-org / bbotk

Black-box optimization framework for R.
https://bbotk.mlr-org.com
GNU Lesser General Public License v3.0

There should be a way to find out how long a Terminator will be running at most #112

Closed mb706 closed 3 years ago

mb706 commented 4 years ago

Active binding:

Terminator$max_time = function() Inf

TerminatorRunTime$max_time = function() self$param_set$values$secs

TerminatorClockTime$max_time = function() as.numeric(difftime(self$param_set$values$stop_time, Sys.time(), units = "secs"))

TerminatorCombo$max_time = function() min(map_dbl(self$terminators, "max_time"))

This would make it possible to have tuners automatically set learner timeouts according to the time remaining.
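A minimal, self-contained sketch of the proposal above, assuming the R6 package is available. The `*Sketch` classes are simplified stand-ins written for illustration; they are not the bbotk classes and store their settings in plain fields rather than in a `param_set`.

```r
library(R6)

# Hypothetical stand-in for the base class: no upper bound is known by default.
TerminatorSketch = R6Class("TerminatorSketch",
  active = list(
    max_time = function() Inf
  )
)

# Run-time budget in seconds.
TerminatorRunTimeSketch = R6Class("TerminatorRunTimeSketch",
  inherit = TerminatorSketch,
  public = list(
    secs = NULL,
    initialize = function(secs) {
      self$secs = secs
    }
  ),
  active = list(
    max_time = function() self$secs
  )
)

# Fixed wall-clock stop time; max_time is the number of seconds left until then.
TerminatorClockTimeSketch = R6Class("TerminatorClockTimeSketch",
  inherit = TerminatorSketch,
  public = list(
    stop_time = NULL,
    initialize = function(stop_time) {
      self$stop_time = stop_time
    }
  ),
  active = list(
    max_time = function() {
      as.numeric(difftime(self$stop_time, Sys.time(), units = "secs"))
    }
  )
)

# Combination: the tightest bound of all contained terminators.
TerminatorComboSketch = R6Class("TerminatorComboSketch",
  inherit = TerminatorSketch,
  public = list(
    terminators = NULL,
    initialize = function(terminators) {
      self$terminators = terminators
    }
  ),
  active = list(
    max_time = function() {
      min(vapply(self$terminators, function(term) term$max_time, numeric(1)))
    }
  )
)

combo = TerminatorComboSketch$new(list(
  TerminatorSketch$new(),                    # not time-based: Inf
  TerminatorRunTimeSketch$new(secs = 60)     # 60 seconds
))
combo$max_time
```

Because the base class returns Inf, a terminator that is not time-based never tightens the minimum taken by the combination.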

berndbischl commented 4 years ago

we had this discussion already. you are aware that many terminators don't support this? and that you have to be careful if you build upon this feature?

what do you suggest in a case like "stagnation"? simply max_time = Inf? or a property that says the terminator can't really report its max_time?

mb706 commented 4 years ago

TerminatorStagnation$max_time would be Inf, since it inherits from Terminator and doesn't override it. The interpretation of max_time would be "what is the maximum amount of time that tuning with this Terminator will (or should) take".

berndbischl commented 4 years ago

@jakob-r wanted this before, i "blocked" this, because we never really robustly defined how it works. what you currently propose seems fine to me. but: this implies that any functionality we build on this needs to handle the case that max_time can be Inf. but that's doable.

berndbischl commented 4 years ago

that would then also enable the most important feature in tuning of all time: progress bars....

mb706 commented 4 years ago

progress bars are probably a bit harder because one would also want to get info about progress w/r/t number of evals. Maybe something like

Terminator$progress = function(archive) 0

TerminatorRunTime$progress = function(archive) {
  as.numeric(difftime(Sys.time(), archive$start_time), units = "secs") / self$param_set$values$secs
}

TerminatorEvals$progress = function(archive) archive$n_evals / self$param_set$values$n_evals

TerminatorCombo$progress = function(archive) max(map_dbl(self$terminators, function(x) x$progress(archive)))

etc.
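The progress idea can be sketched in the same way. This is only an illustration under stated assumptions: the `*Sketch` classes are hypothetical stand-ins (not the bbotk classes), R6 is assumed to be installed, and the archive is mocked as a plain list with `start_time` and `n_evals` elements.

```r
library(R6)

# Hypothetical stand-in base class: progress is unknown, so report 0.
TerminatorSketch = R6Class("TerminatorSketch",
  public = list(
    progress = function(archive) 0
  )
)

# Progress as the fraction of the evaluation budget already used.
TerminatorEvalsSketch = R6Class("TerminatorEvalsSketch",
  inherit = TerminatorSketch,
  public = list(
    n_evals = NULL,
    initialize = function(n_evals) {
      self$n_evals = n_evals
    },
    progress = function(archive) archive$n_evals / self$n_evals
  )
)

# Progress as the fraction of the run-time budget already used.
TerminatorRunTimeSketch = R6Class("TerminatorRunTimeSketch",
  inherit = TerminatorSketch,
  public = list(
    secs = NULL,
    initialize = function(secs) {
      self$secs = secs
    },
    progress = function(archive) {
      elapsed = as.numeric(difftime(Sys.time(), archive$start_time), units = "secs")
      elapsed / self$secs
    }
  )
)

# The run stops as soon as any terminator fires, so the furthest-advanced one counts.
TerminatorComboSketch = R6Class("TerminatorComboSketch",
  inherit = TerminatorSketch,
  public = list(
    terminators = NULL,
    initialize = function(terminators) {
      self$terminators = terminators
    },
    progress = function(archive) {
      max(vapply(self$terminators, function(term) term$progress(archive), numeric(1)))
    }
  )
)

# Mock archive: started 30 seconds ago, 25 evaluations done so far.
archive = list(start_time = Sys.time() - 30, n_evals = 25)
combo = TerminatorComboSketch$new(list(
  TerminatorEvalsSketch$new(n_evals = 100),
  TerminatorRunTimeSketch$new(secs = 60)
))
p = combo$progress(archive)
```

Here the run-time terminator is about half way through its budget and the evaluation terminator a quarter of the way, so the combination reports roughly one half. A progress bar would probably want to clamp the value to at most 1, which this sketch does not do.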

jakob-r commented 4 years ago

I would still like to see this. However, $max_time() is kind of a new idea. To make sure that we build correctly on it, I see two options: 1) Inheritance: Terminator <- TerminatorProgress <- TerminatorProgressTime, so we can assert on the terminator class. 2) Property tags, so we can write assertTerminator(term, "max_time").

Apart from that, I would suggest that $progress or $max_time throws a "not implemented" error by default.
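Option 2 could look roughly like the sketch below. Everything in it is hypothetical: `assert_terminator_sketch()` and the `properties` field are names made up for illustration, the classes are simplified stand-ins rather than bbotk classes, and R6 is assumed to be installed.

```r
library(R6)

# Hypothetical stand-in base class: no properties, no known upper bound.
TerminatorSketch = R6Class("TerminatorSketch",
  public = list(
    properties = character(0)
  ),
  active = list(
    max_time = function() Inf
  )
)

# A time-based terminator advertises that its max_time is meaningful.
TerminatorRunTimeSketch = R6Class("TerminatorRunTimeSketch",
  inherit = TerminatorSketch,
  public = list(
    properties = "max_time",
    secs = NULL,
    initialize = function(secs) {
      self$secs = secs
    }
  ),
  active = list(
    max_time = function() self$secs
  )
)

# Hypothetical assertion helper: fail early if a required property is missing.
assert_terminator_sketch = function(term, property) {
  if (!property %in% term$properties) {
    stop(sprintf("Terminator of class '%s' lacks property '%s'",
      class(term)[1], property))
  }
  invisible(term)
}

assert_terminator_sketch(TerminatorRunTimeSketch$new(secs = 60), "max_time")  # passes
res = tryCatch(
  assert_terminator_sketch(TerminatorSketch$new(), "max_time"),
  error = function(e) conditionMessage(e)
)
```

With tags, code that needs a finite bound can assert on the property up front, while the binding itself can still return a harmless default.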

mb706 commented 4 years ago

IMO max_time should return Inf when nothing else is given, as in "there is no upper bound that can be set on the runtime".

With progress I would return 0 by default. Since progress is more a UI thing than anything else, I think it is better here to return just some value instead of erroring.

mb706 commented 4 years ago

To elaborate on why I think max_time with Inf if not specified is the correct thing to do:

I want max_time to be used in combination with https://github.com/mlr-org/mlr3/issues/576 to abort tuning batches early. I expect it is a common use case that someone does tuning with a hard time limit, so a tuning run should return before a cluster job's allocated time runs out and everything is killed without saving. However, some tuners, like hyperband, may have quite large tuning batches, so setting a learner timeout alone doesn't help, and a tuning-batch timeout would need to be set. If the terminator contains the information that "the user wants this to be done in at most X seconds", then the tuner could set the tuning-batch timeout automatically from this information. If multiple terminators are present in a TerminatorCombo, then the minimum of all available terminator max_times can be given; if no terminator is time-based, then this tells us the user doesn't care about the runtime and the maximum available time is infinite.
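As a rough illustration of how a tuner might consume this value, the base R sketch below derives a batch timeout from a vector of max_time values. The function name `batch_timeout_sketch()` and the `safety_margin` argument are assumptions made for this example and are not part of any existing API.

```r
# Hypothetical helper: turn terminator max_time values (in seconds) into a
# timeout for the next tuning batch.
batch_timeout_sketch = function(max_times, safety_margin = 0) {
  # The tightest bound wins, as with a combination of terminators.
  bound = min(max_times)
  if (is.infinite(bound)) {
    # No time-based terminator: the user did not limit the runtime.
    return(Inf)
  }
  # Leave some room for saving results; never return a negative timeout.
  max(0, bound - safety_margin)
}

batch_timeout_sketch(c(Inf, Inf))                       # no limit at all
batch_timeout_sketch(c(Inf, 3600), safety_margin = 60)  # 3540 seconds
batch_timeout_sketch(c(30, 3600), safety_margin = 60)   # margin exceeds bound: 0
```

The Inf case has to be handled explicitly by the caller, for example by not setting any timeout at all.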

berndbischl commented 4 years ago

> To elaborate on why I think max_time with Inf if not specified is the correct thing to do:

i think i agree

be-marc commented 3 years ago

Closed by #63. Terminator$remaining_time returns remaining runtime.