mlpack / ensmallen

A header-only C++ library for numerical optimization --
http://ensmallen.org
Other
742 stars 120 forks source link

Custom early stop callback #377

Closed Kaltxi closed 11 months ago

Kaltxi commented 1 year ago

Issue description

I am trying to make a custom callback to be able to stop optimization asynchronously. Documentation says that the callback should return true to end the process, so I made the following callback:

class OptimizationEarlyStop {
public:
  template <typename OptimizerType, typename FunctionType, typename MatType>
  auto Evaluate(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const double /* objective */) -> bool {
    return shouldStop_;
  }

  void stop() {
    const std::lock_guard lock(stopMutex_);
    shouldStop_ = true;
  }

private:
  bool shouldStop_{false};
  std::mutex stopMutex_;
};

That said the callback doesn't seem to run. If I make the Evaluate return void the callback runs, but I can't return bool in this case.

Your environment

Steps to reproduce

Defining the Evaluate() callback with bool return type makes callback unable to be run.

Expected behavior

I expect to be able to define bool Evaluate() and be able to stop optimization with it.

rcurtin commented 1 year ago

Hey there @Kaltxi, can you also provide the code that you are using to run the optimization? If it's a complete example that we can compile and run, it's way easier to debug. Thanks!

Kaltxi commented 1 year ago

@rcurtin Hello! Here is a minimal example that minimizes a trivial function with a slow down in its Evaluate(). Optimization is launched via std::async and while it is running the callback is called. ensmallen-early-return.zip

zoq commented 1 year ago

It looks like you never pass the callback to the Optimize function:

const double minimum = optimizer.Optimize(f, coefs, ens::Report{});

should be something like:

OptimizationEarlyStop early_stop;
const double minimum = optimizer.Optimize(f, coefs, ens::Report{}, early_stop);
Kaltxi commented 1 year ago

@zoq Yeah, thats a typo, fixing it yields the same behavior. In any case the bool version of OptimizationEarlyStop::Evaluate is never called, when it's supposed to be called on every function evaluation.

mlpack-bot[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:

zoq commented 1 year ago

let's keep this open

SuvarshaChennareddy commented 1 year ago

Hello @Kaltxi, I believe this is expected behavior as the Evaluate() callback method does not affect the optimization loop (at least with the DE optimizer). One dirty trick that I thought of (haven't tested it) that you could try incorporating is setting the value of tolerance to infinity (or a very large number) via Tolerance() . The optimization would immediately end once it breaks out of the population loop (in the DE optimizer). If you want to break out immediately, then try setting the populationSize attribute to 0 via the PopulationSize() method. You could then reset the attribute in the EndOptimization() callback method.

Please do correct me if I overlooked something.

rcurtin commented 1 year ago

I spent a little time digging into the situation and I think there is some confusion here. In fact, one example demonstrates the whole situation pretty clearly: https://www.ensmallen.org/docs.html#early-stopping-at-minimum-loss

Here, we have a custom callback example that returns true in EndEpoch() if the optimization should terminate early, but the signature of the function is void.

Throughout the codebase, sometimes we call callbacks and check their return value and terminate if it is true. Other times (DE is one of these cases), we don't. With a quick, trivial, and somewhat wrong grep I see 103 uses where we don't use the return value and 83 where we do.

I believe our original intention was to always check the callback result; see Section 3.2 of the technical report: Each callback may terminate the optimization via its bool return value, where true indicates that the optimization should be stopped.

So, I think what's happened here is just that the code has gotten out of sync with our original intentions. I think the solution here is to do a couple of things:

I'm happy to do this, although I may skip the extra credit :smile: if someone else wants to jump in and do it feel free---it'll be a little while until I get to it.

@Kaltxi I think @SuvarshaChennareddy has provided a usable workaround, but if you want to get your hands directly dirty, you can modify the DE optimizer in ensmallen_bits/de/de_impl.hpp such that any time Callback::Evaluate() is called, its return value is checked. You can replace, e.g., Callback::Evaluate(...) with terminate |= Callback::Evaluate(...) throughout, I think. Thanks again for the report. :+1:

conradsnicta commented 1 year ago

clarify the documentation to indicate that every callback function has the option of returning a bool

@rcurtin Rather than having an option, I recommend this to be mandatory. Otherwise inside ensmallen we'd need to explicitly handle two cases (one with void returned, and one with bool returned). This unnecessarily complicates the codebase, which in turn leads to higher maintenance burden, increased risk of bugs, etc, all of which are time wasters.

The mandatory requirement is a good way to enforce not paying the opportunity costs in terms of time. For example, having it mandatory will prevent problems like this issue from occurring again (eg. when someone implements a new optimiser).

The mandatory bool return is not an onerous requirement for user code. If the user doesn't want to take advantage of the functionality, their callback function can simply return false in all cases.

zoq commented 1 year ago

I agree with @conradsnicta happy to open a PR with the changes.

DiscreteLogarithm commented 12 months ago

I've also encountered this issue (with another callback method). After looking through the code, I found out that the reason is that the return type of bool form of several callback methods is mistakenly specified as void (likely a copy-paste error). PR #382 should fix this.

When the code check if a bool form Evaluate method is defined in the callback, it checks the corresponding trait. When the signature is wrong, it is always assumed that the bool form method is absent, and therefore it's never called.

EvaluateConstraint, Gradient, and GradientConstraint are also affected by this.

rcurtin commented 11 months ago

With #382, #383, and #384 merged, the example here should work, so I'll go ahead and close the issue. @DiscreteLogarithm and @Kaltxi please feel free to reopen or open a new issue if anything is still wrong. Thanks for the report!