yixuan / LBFGSpp

A header-only C++ library for L-BFGS and L-BFGS-B algorithms
https://lbfgspp.statr.me/
MIT License
532 stars 101 forks

Comparison with Python's scipy.optimize L-BFGS-B #22

Closed izsahara closed 1 year ago

izsahara commented 2 years ago

My problem is optimizing the log marginal likelihood of a Gaussian Process. I was wondering if any comparison has been done between LBFGSpp and scipy's L-BFGS-B, or perhaps how to adjust the settings of LBFGSpp so that its performance is 'like' that of scipy's implementation. My results always indicate scipy's version is better overall. I wonder what the difference is, as LBFGSpp is necessary for my project.

Thanks

yixuan commented 2 years ago

Hi @izsahara, do you have any test cases to compare the results?

cschreib-ibex commented 1 year ago

I just went through this trouble myself, trying to reproduce the scipy implementation as closely as possible. Here are the parameters that (I think) map to the scipy defaults:

LBFGSBParam<double> params;

params.epsilon = 1e-5;
params.epsilon_rel = 0.0;
params.delta = 2.22045e-9;
params.past = 1;

params.m = 10;
params.max_iterations = 15000;

params.max_linesearch = 20;
// Less sure about these ones:
params.ftol = 1e-3;
params.wolfe = 0.9;
params.min_step = 0.0;
params.max_step = std::numeric_limits<double>::infinity();
// Edit: update this to zero following comments below.
params.max_submin = 0; 

There's more to it, though. The Fortran code used by scipy doesn't raise an error when it reaches the maximum number of steps in the line search; it just reverts to a previous state, which, as far as I can piece together from reading their code, is the state at the start of the line search. So, I think, it means it will just stop as converged rather than error out. Does that seem right to you?

So I went into LineSearchMoreThuente.h and did that, at the very end of the function:

        if (iter >= param.max_linesearch)
        {
            // Original code:
            // throw std::runtime_error("the line search routine reached the maximum number of iterations");

            // Behavior closer to scipy: restore input state.
            x = xp;
            fx = fx_init;
            dg = dg_init;
            grad = grad_init;
            step = Scalar(0);
        }

(you'll also need to store a copy `grad_init` of the input gradient at the start of the function)

I'd be interested in hearing your opinion @yixuan, since you must know the code and algorithm better than any of us.

yixuan commented 1 year ago

Thanks @cschreib-ibex for the careful investigation. Indeed, these parameters can roughly be mapped to the scipy defaults, but in my experience it is pretty hard to exactly reproduce the backend Fortran code. The difference is most likely in the line search routine.

The `params.max_submin` parameter is something I'm experimenting with to improve the L-BFGS-B algorithm; you can set it to zero to better match the scipy implementation.

cschreib-ibex commented 1 year ago

Thank you, will do that! What about the "max linesearch" behaviour: do you think it makes sense to revert to the initial values rather than error? This appears to be a common occurrence in our data sets when the algorithm is close to converging.

yixuan commented 1 year ago

Yeah, I'm thinking of how to refactor the code and properly return from the line search routine. I'll update here once I have any progress.

yixuan commented 1 year ago

This commit removes the major runtime exceptions thrown when the maximum number of line search iterations is reached, which should partly resolve the "max line search" issue.

Some exceptions remain in the code, but they are mostly "serious" issues from which it is hard to proceed.

Let me first mark this as closed. Feel free to reopen if further issues occur.