sub7erra opened this issue 5 years ago
Sorry for not responding. The loss you're reporting is different from what OWLQN is trying to minimize:
1) We're minimizing your objective plus an L1 term. 2) We're running a line search that seeks not just a steepest-descent step length, but one that enforces the strong Wolfe conditions, which put various constraints on the derivatives as well as on the objective values. I honestly don't remember that stuff well enough to know if that's what's happening here, and it's unfortunate we don't log more here.
If you can replicate this with LBFGS or with L1 = 0 I'll bite; otherwise, convince me why I'm wrong (I easily could be!)
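For reference, the strong Wolfe conditions mentioned above can be sketched in plain Scala. This is a minimal illustration, not Breeze's internal API; the function name and parameter defaults are my own, though `c1`/`c2` follow the conventional Armijo/curvature constants:

```scala
// phi(a) = f(x + a*d) along the search direction d; dPhi is its derivative in a.
// The strong Wolfe conditions accept a step only if it both decreases the
// objective enough (Armijo) and flattens the directional derivative (curvature).
def strongWolfe(
    phi0: Double,      // objective at alpha = 0
    dPhi0: Double,     // directional derivative at alpha = 0 (negative for descent)
    phiA: Double,      // objective at the candidate step alpha
    dPhiA: Double,     // directional derivative at alpha
    alpha: Double,
    c1: Double = 1e-4, // sufficient-decrease constant
    c2: Double = 0.9   // curvature constant
): Boolean = {
  val sufficientDecrease = phiA <= phi0 + c1 * alpha * dPhi0
  val curvature = math.abs(dPhiA) <= c2 * math.abs(dPhi0)
  sufficientDecrease && curvature
}
```

For example, with phi(a) = (a - 1)^2 (so phi0 = 1, dPhi0 = -2), the full step alpha = 1 satisfies both conditions, while a tiny step like alpha = 0.01 fails the curvature condition even though it decreases the objective — which is why a Wolfe-enforcing search can reject points with lower raw cost.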
When executing optimization tasks with the `OWLQN` class, one may notice fairly often that the line search step iterates through several points, evaluating the function at each of them, and then selects an "optimal" `alpha` value which is not the best of the values that have just been evaluated. This can be noticed just by looking at the cost function values; it usually happens like this: the cost function value on `Iteration 1` is the last one evaluated by the line search on `Iteration 0`, while a better point was the previous one.

This happens because `BacktrackingLineSearch`, which is used in `OWLQN`, relies on the implementation of the `minimize` method from the `ApproximateLineSearch` trait, which is as follows: https://github.com/scalanlp/breeze/blob/aeb3360ce77d79490fe48bc5fe2bb40f18289908/math/src/main/scala/breeze/optimize/LineSearch.scala#L26
This is clearly incorrect, as nothing guarantees that the last iteration of the line search contains the best step size. In fact, in roughly half of the cases it doesn't, because the search might just overshoot (or undershoot) on the last try.
An obvious solution would be to track the cost function values for the alphas we have tried and, once the stopping conditions are met, select the best of those instead of the last.
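The proposed fix can be sketched in plain Scala, independent of Breeze's actual types (`LSState`, `toyTrace`, etc. are illustrative names): the line search produces a trace of (alpha, cost) states; today the returned step effectively comes from the *last* state of that trace, whereas the fix takes the state with the lowest cost:

```scala
// One evaluated point of the line search: a step size and its cost.
final case class LSState(alpha: Double, value: Double)

// Build a trace by evaluating phi(alpha) at each probed step size,
// mimicking the sequence of states the line-search iterator yields.
def toyTrace(phi: Double => Double, alphas: Seq[Double]): Seq[LSState] =
  alphas.map(a => LSState(a, phi(a)))

// Current behavior: take whatever the final probe happened to be.
def lastAlpha(trace: Seq[LSState]): Double = trace.last.alpha

// Proposed behavior: take the probe with the lowest evaluated cost.
def bestAlpha(trace: Seq[LSState]): Double = trace.minBy(_.value).alpha
```

With phi(a) = (a - 0.5)^2 and probes 1.0, 0.25, 0.6, 0.9 the last probe overshoots: `lastAlpha` returns 0.9 (cost 0.16) while `bestAlpha` returns 0.6 (cost 0.01) — exactly the discrepancy described between `Iteration 0`'s evaluations and `Iteration 1`'s starting cost. Since all the function values are computed anyway, the fix costs nothing extra per iteration.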