Implement the VectorFreeOWLQN class, which extends VectorFreeLBFGS.
Reference paper: Andrew and Gao (2007), "Scalable Training of L1-Regularized Log-Linear Models".
It overrides the following methods:
chooseDescentDirection: overridden because OWLQN needs to correct the direction computed by LBFGS.
takeStep: OWLQN needs to restrict each step so that the new coefficients stay in the same orthant. takeStep is overridden to enforce this.
adjust: The adjust interface is used for the OWLQN L1 regularization computation. Note that OWLQN needs the original gradient (without L1 reg) to compute the approximate Hessian, but needs the adjusted gradient (with L1 reg) to compute the OWLQN descent direction. That is why I don't compute the gradient with L1 reg in the DiffFunction interface. This design is similar to the one in the breeze lib. The gradient with L1 reg is called the pseudo-gradient or sub-gradient; the details are in the paper.
determineStepSize: OWLQN uses its own line search DiffFunction, which must use the overridden takeStep to compute the new coefficients. In VectorFreeOWLQN I use BacktrackingLineSearch, which keeps it consistent with the breeze lib.
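To illustrate the chooseDescentDirection item above, here is a minimal sketch of the direction correction on plain arrays. It assumes the rule used in breeze's OWLQN: a component of the LBFGS direction is kept only if it points against the pseudo-gradient, and is zeroed otherwise. The object and method names are hypothetical and are not taken from this PR; the real implementation works on distributed vectors.

```scala
object DirectionSketch {
  /** Zero every component of `dir` that does not point against `pseudoGrad`.
    * Hypothetical helper on plain arrays, for illustration only.
    */
  def correctDirection(dir: Array[Double], pseudoGrad: Array[Double]): Array[Double] = {
    require(dir.length == pseudoGrad.length, "dimension mismatch")
    dir.zip(pseudoGrad).map { case (d, g) =>
      // d * g < 0 means the component is a descent component for the pseudo-gradient.
      if (d * g < 0) d else 0.0
    }
  }
}
```

For example, correcting the direction (1.0, -2.0, 3.0) against the pseudo-gradient (-0.5, -1.0, 0.0) keeps only the first component.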
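The orthant restriction described for takeStep can be sketched as follows. This is an assumption-laden illustration on plain arrays, not the code in this PR: the orthant of a coordinate is taken to be the sign of the current coefficient, or the sign of the negative pseudo-gradient when the coefficient is zero, and any stepped coefficient that leaves its orthant is clipped to zero. All names are hypothetical.

```scala
object OrthantSketch {
  /** Orthant per coordinate: sign of x if nonzero, else sign of the negative pseudo-gradient. */
  def orthant(x: Array[Double], pseudoGrad: Array[Double]): Array[Double] =
    x.zip(pseudoGrad).map { case (xi, gi) =>
      if (xi != 0.0) math.signum(xi) else math.signum(-gi)
    }

  /** Take x + stepSize * dir, then clip coordinates that crossed into another orthant to 0. */
  def takeStep(
      x: Array[Double],
      dir: Array[Double],
      pseudoGrad: Array[Double],
      stepSize: Double): Array[Double] = {
    val orth = orthant(x, pseudoGrad)
    x.indices.map { i =>
      val stepped = x(i) + stepSize * dir(i)
      if (math.signum(stepped) == orth(i)) stepped else 0.0
    }.toArray
  }
}
```

With x = (1.0, -1.0, 0.0), pseudo-gradient (0.5, -0.5, -2.0), direction (-3.0, 0.5, 1.0) and step size 0.5, the first coefficient would change sign and is clipped to zero, while the other two are kept.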
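As a rough illustration of what adjust computes, the sketch below derives the L1-adjusted objective value and the pseudo-gradient from the smooth value and gradient, following the definition in Andrew and Gao (2007). It assumes a single uniform L1 strength `l1` and plain arrays; the names and signature are hypothetical and do not reflect the adjust signature in this PR.

```scala
object PseudoGradientSketch {
  /** Returns (value + l1 * ||x||_1, pseudo-gradient). Hypothetical helper for illustration. */
  def adjust(
      x: Array[Double],
      grad: Array[Double],
      value: Double,
      l1: Double): (Double, Array[Double]) = {
    require(l1 >= 0.0, "L1 strength must be non-negative")
    val adjValue = value + l1 * x.map(xi => math.abs(xi)).sum
    val pseudoGrad = x.zip(grad).map { case (xi, gi) =>
      if (xi != 0.0) gi + math.signum(xi) * l1 // differentiable case
      else if (gi - l1 > 0.0) gi - l1          // left derivative is positive
      else if (gi + l1 < 0.0) gi + l1          // right derivative is negative
      else 0.0                                 // zero is a local optimum for this coordinate
    }
    (adjValue, pseudoGrad)
  }
}
```

The smooth gradient `grad` is left untouched, so it can still be fed to the LBFGS history, while the returned pseudo-gradient is used only for choosing the direction and the orthant.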
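The interaction between the line search and takeStep can be sketched like this. It is a simplified backtracking loop using the sufficient-decrease test from the OWL-QN paper; it is not breeze's BacktrackingLineSearch, and the parameter names and defaults (`initStep`, `shrink`, `gamma`, `maxIter`) are assumptions made for the example. The point it shows is that every trial point is produced by the orthant-projected step rather than by a plain x + step * dir.

```scala
object LineSearchSketch {
  // Orthant-projected step, mirroring the takeStep override described above.
  def takeStep(x: Array[Double], dir: Array[Double], pseudoGrad: Array[Double], step: Double): Array[Double] =
    x.indices.map { i =>
      val orth = if (x(i) != 0.0) math.signum(x(i)) else math.signum(-pseudoGrad(i))
      val stepped = x(i) + step * dir(i)
      if (math.signum(stepped) == orth) stepped else 0.0
    }.toArray

  /** Shrink the step until f(xNew) <= f(x) + gamma * pseudoGrad . (xNew - x).
    * `f` is the L1-regularized objective. Returns the accepted step size,
    * or the last (smallest) trial step if maxIter is exhausted.
    */
  def backtrack(
      f: Array[Double] => Double,
      x: Array[Double],
      dir: Array[Double],
      pseudoGrad: Array[Double],
      initStep: Double = 1.0,
      shrink: Double = 0.5,
      gamma: Double = 1e-4,
      maxIter: Int = 30): Double = {
    val fx = f(x)
    var step = initStep
    var iter = 0
    while (iter < maxIter) {
      val xNew = takeStep(x, dir, pseudoGrad, step)
      val decrease = x.indices.map(i => pseudoGrad(i) * (xNew(i) - x(i))).sum
      if (f(xNew) <= fx + gamma * decrease) return step
      step *= shrink
      iter += 1
    }
    step
  }
}
```

Because the trial point is projected before the objective is evaluated, the accepted step is guaranteed to stay in the orthant chosen at the start of the iteration.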