Open · TorkelE opened 2 months ago

I have created an optimisation problem using PEtab.jl. This gives me a gradient, which I have tried to adapt to LikelihoodProfiler's format. My impression is that, by default, this gradient is not utilised. What combination of (profiling) method and local algorithm do you recommend for utilising gradients properly? What kind of advantage can I expect to get if I have a gradient?
Gradient-based optimizers should in general be more efficient than derivative-free ones. You can use this gradient in the `get_interval(...; loss_grad)` function together with one of the gradient-based local optimizers `local_alg`. I don't have much experience with the relevant optimizers available in NLopt, but I would consider using `:LD_SLSQP` or `:LD_CCSAQ`. You can also try `:LD_MMA`.
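A minimal sketch of such a call, following the `get_interval` signature used later in this thread (the variables `start_p`, `p_idx`, `f`, `loss_grad`, `loss_crit`, `theta_bounds`, and `scan_bounds` are assumed to be defined as in the full example below):

```julia
# Sketch: profile with an explicit gradient and a gradient-based NLopt algorithm.
conf_int = get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                        local_alg = :LD_SLSQP,  # or :LD_CCSAQ, :LD_MMA
                        loss_grad, loss_crit, theta_bounds, scan_bounds)
```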
Thanks, this works. I also checked with another person who suggested `:LD_LBFGS`.
I have tried using gradients; however, the result seems wrong. When checking the result using gradients, the found points have parameter values identical to the initial point (except for the one I am computing the profile with). When I do it without gradients, all parameter values differ at the profile endpoints.
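To illustrate the symptom, a hypothetical check of whether an endpoint has moved (`endpoint_p` here stands for the parameter vector at a found endpoint, however it is extracted in your setup):

```julia
# Hypothetical diagnostic: which coordinates (other than the profiled one) stayed put?
stuck = [i for i in eachindex(start_p) if i != p_idx && endpoint_p[i] == start_p[i]]
isempty(stuck) || @warn "endpoint identical to start point in dimensions" stuck
```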
This does not seem to be tied to one specific gradient-dependent local alg: `LN_NELDERMEAD` is fine, but `LD_CCSAQ`, `LD_LBFGS`, `LD_MMA`, and `LD_SLSQP` all exhibit this problem (I am still looking into it, so I am not fully sure yet). This seems weird, right? I am investigating closer, but I also figured I'd ask whether it is something you recognise.
Hm, it seems the gradient-based algorithms (those starting with `LD`) fail to move from the initial point. Is it the same model that you have sent me?
Yes, it is the same model (although the data points are different; I can try to give you an updated file with the new data points).

I have sent an updated project folder.
It's a bit weird, but it seems the gradient-based methods need a much lower tolerance to work properly. In your example I get the same result with `LD_MMA` (as well as `LD_SLSQP` and `LD_LBFGS`) as with the default derivative-free `LN_NELDERMEAD` when I set `scan_tol=1e-6`:

```julia
# Wrap PEtab's in-place gradient into the out-of-place form expected by `loss_grad`.
function loss_grad(p)
    grad = zeros(9)
    petab_problem.compute_gradient!(grad, p)
    return grad
end

# Derivative-free baseline.
conf_int_1 = get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                          local_alg = :LN_NELDERMEAD, loss_crit, theta_bounds, scan_bounds)

# Gradient-based run; note the tighter scan_tol.
conf_int_2 = get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                          local_alg = :LD_LBFGS, loss_grad, scan_tol = 1e-6,
                          loss_crit, theta_bounds, scan_bounds)
```
Thanks, that did work.
However, the runtimes suffer quite badly. E.g. for `LD_MMA`, computing the example interval takes 12 seconds (and only about 200 ms for `LN_NELDERMEAD`). `LD_LBFGS` is not as bad, but still takes about 700 ms (more than 3x worse than `LN_NELDERMEAD`). Shouldn't I expect to gain a speed-up by providing a gradient?

I should note that even with `scan_tol = 1e-6`, `LN_NELDERMEAD` finishes in ~400 ms (so it is still faster than the gradient-based methods).
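A simple way to reproduce such a comparison (a sketch; run each call once beforehand so Julia's compilation time is excluded from the measurement):

```julia
# Rough wall-clock comparison of the two configurations from above.
@time get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                   local_alg = :LN_NELDERMEAD, scan_tol = 1e-6,
                   loss_crit, theta_bounds, scan_bounds)
@time get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                   local_alg = :LD_LBFGS, loss_grad, scan_tol = 1e-6,
                   loss_crit, theta_bounds, scan_bounds)
```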
A final note (sorry for all the comments). When I run with a gradient-based method, also supplying a gradient, I get lots of

```
┌ Warning: autodiff gradient is not available, switching to finite difference mode
└ @ LikelihoodProfiler ~/.julia/packages/LikelihoodProfiler/Qi97K/src/cico_one_pass.jl:67
```

messages. What exactly does this mean? I am supplying a gradient, so autodiff should not be relevant?
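One way to check whether the supplied gradient is ever actually called is a counting wrapper (a sketch; `loss_grad` as defined earlier in the thread):

```julia
# Count how many times the gradient function is invoked during profiling.
ngrad = Ref(0)
counted_grad(p) = (ngrad[] += 1; loss_grad(p))
get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
             local_alg = :LD_LBFGS, loss_grad = counted_grad, scan_tol = 1e-6,
             loss_crit, theta_bounds, scan_bounds)
@info "gradient calls" ngrad[]
```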
In theory the gradient-based methods should be faster. With your model I see that the number of likelihood function calls ("right/left CP counter") with `LD_LBFGS` is lower than with `LN_NELDERMEAD` (at least for the parameters I have tested), which means `LD_LBFGS` needs fewer likelihood function calls to reach the endpoint. However, the derivative-free `LN_NELDERMEAD` appears to be faster... It may be due to the simplicity of the model (for more complicated models, the timing comparison may be different) or to the way gradients are used and computed in LikelihoodProfiler/PEtab/NLopt. For LikelihoodProfiler I can say we don't have much experience with gradient-based methods, and there are plenty of things to optimize in the code of the package. We are planning to do that soon.
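If each gradient evaluation costs much more than a single loss evaluation, a gradient-based method can lose overall despite needing fewer iterations; a quick, rough check of that ratio:

```julia
# Compare the cost of one loss evaluation vs. one gradient evaluation.
# Run each line twice so the first-call compilation time is excluded.
@time f(start_p)
@time loss_grad(start_p)
```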
The warning about `autodiff` is really surprising if you provide the gradient function. Is it the same model? Can you share the script/function you run?
The warnings are from a large scan of auto-generated data sets on an HPC cluster, so it is non-trivial to create a reproducing MWE. I will have a go at it and report back to you when/if I get one.