Open · TorkelE opened 2 months ago

I have created an optimisation problem using PEtab.jl. This gives me a gradient, which I have tried to adapt to LikelihoodProfiler's format. My impression is that, by default, this gradient is not utilised. What combination of (profiling) method and local algorithm do you recommend for utilising gradients properly? What kind of advantage can I expect to get if I have a gradient?
Gradient-based optimizers should in general be more efficient than derivative-free ones. You can use this gradient in the `get_interval(...; loss_grad)` function together with one of the gradient-based local optimizers `local_alg`. I don't have much experience with the relevant optimizers available in NLopt, but I would consider using `:LD_SLSQP` or `:LD_CCSAQ`. You can also try `:LD_MMA`.
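A minimal sketch of such a call, following the `get_interval` signature used later in this thread (the variables `start_p`, `p_idx`, `f`, `loss_grad`, `loss_crit`, `theta_bounds`, and `scan_bounds` are assumed to be defined as in the full example below):

```julia
# Sketch: profile with an explicit gradient and a gradient-based NLopt algorithm.
conf_int = get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                        local_alg = :LD_SLSQP,  # or :LD_CCSAQ, :LD_MMA
                        loss_grad, loss_crit, theta_bounds, scan_bounds)
```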
Thanks, this works. I also checked with another person who suggested `:LD_LBFGS`.
I have tried using gradients; however, the result seems wrong. When checking the result using gradients, the found points have parameter values identical to the initial point (except for the one I am computing the profile with). When I do it without gradients, all parameter values differ at the profile endpoints.
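To illustrate the symptom, a hypothetical check of whether an endpoint has moved (`endpoint_p` here stands for the parameter vector at a found endpoint, however it is extracted in your setup):

```julia
# Hypothetical diagnostic: which coordinates (other than the profiled one) stayed put?
stuck = [i for i in eachindex(start_p) if i != p_idx && endpoint_p[i] == start_p[i]]
isempty(stuck) || @warn "endpoint identical to start point in dimensions" stuck
```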
This does not seem to be tied to one specific gradient-dependent local alg: `LN_NELDERMEAD` is fine, but `LD_CCSAQ`, `LD_LBFGS`, `LD_MMA`, and `LD_SLSQP` all exhibit this problem (I am still looking into it, so I am not fully sure yet). This seems weird, right? I am investigating closer, but I also figured I'd ask whether it is something you recognise.
Hm, it seems the gradient-based algorithms (those starting with `LD`) fail to move from the initial point. Is it the same model that you have sent me?
Yes, it is the same model (although the data points are different; I can try to give you an updated file with the new data points).

I have sent an updated project folder.
It's a bit weird, but it seems the gradient-based methods need a much lower tolerance to work properly. In your example I get the same result with `LD_MMA` (as well as `LD_SLSQP` and `LD_LBFGS`) as with the default derivative-free `LN_NELDERMEAD` when I set `scan_tol=1e-6`:

```julia
# Wrap PEtab's in-place gradient into the out-of-place form expected by `loss_grad`.
function loss_grad(p)
    grad = zeros(9)
    petab_problem.compute_gradient!(grad, p)
    return grad
end

# Derivative-free baseline.
conf_int_1 = get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                          local_alg = :LN_NELDERMEAD, loss_crit, theta_bounds, scan_bounds)

# Gradient-based run; note the tighter scan_tol.
conf_int_2 = get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                          local_alg = :LD_LBFGS, loss_grad, scan_tol = 1e-6,
                          loss_crit, theta_bounds, scan_bounds)
```
Thanks, that did work.
However, the runtimes suffer quite badly. E.g. for `LD_MMA`, computing the example interval takes 12 seconds (and only about 200 ms for `LN_NELDERMEAD`). `LD_LBFGS` is not as bad, but still takes about 700 ms (more than 3x worse than `LN_NELDERMEAD`). Shouldn't I expect to gain a speed-up by providing a gradient?

I should note that even with `scan_tol = 1e-6`, `LN_NELDERMEAD` finishes in ~400 ms (so it is still faster than the gradient-based methods).
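A simple way to reproduce such a comparison (a sketch; run each call once beforehand so Julia's compilation time is excluded from the measurement):

```julia
# Rough wall-clock comparison of the two configurations from above.
@time get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                   local_alg = :LN_NELDERMEAD, scan_tol = 1e-6,
                   loss_crit, theta_bounds, scan_bounds)
@time get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
                   local_alg = :LD_LBFGS, loss_grad, scan_tol = 1e-6,
                   loss_crit, theta_bounds, scan_bounds)
```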
A final note (sorry for all the comments). When I run with a gradient-based method, also supplying a gradient, I get lots of

```
┌ Warning: autodiff gradient is not available, switching to finite difference mode
└ @ LikelihoodProfiler ~/.julia/packages/LikelihoodProfiler/Qi97K/src/cico_one_pass.jl:67
```

messages. What exactly does this mean? I am supplying a gradient, so autodiff should not be relevant?
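One way to check whether the supplied gradient is ever actually called is a counting wrapper (a sketch; `loss_grad` as defined earlier in the thread):

```julia
# Count how many times the gradient function is invoked during profiling.
ngrad = Ref(0)
counted_grad(p) = (ngrad[] += 1; loss_grad(p))
get_interval(start_p, p_idx, f, :CICO_ONE_PASS;
             local_alg = :LD_LBFGS, loss_grad = counted_grad, scan_tol = 1e-6,
             loss_crit, theta_bounds, scan_bounds)
@info "gradient calls" ngrad[]
```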
In theory the gradient-based methods should be faster. With your model I see that the number of likelihood function calls ("right/left CP counter") with `LD_LBFGS` is lower than with `LN_NELDERMEAD` (at least for the parameters I have tested), which means `LD_LBFGS` needs fewer likelihood function calls to reach the endpoint. However, the derivative-free `LN_NELDERMEAD` appears to be faster... It may be due to the simplicity of the model (for more complicated models, the timing comparison may be different) or to the way gradients are used and computed in LikelihoodProfiler/PEtab/NLopt. For LikelihoodProfiler I can say we don't have much experience with gradient-based methods, and there are plenty of things to optimize in the code of the package. We are planning to do that soon.
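If each gradient evaluation costs much more than a single loss evaluation, a gradient-based method can lose overall despite needing fewer iterations; a quick, rough check of that ratio:

```julia
# Compare the cost of one loss evaluation vs. one gradient evaluation.
# Run each line twice so the first-call compilation time is excluded.
@time f(start_p)
@time loss_grad(start_p)
```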
The warning about `autodiff` is really surprising if you provide the gradient function. Is it the same model? Can you share the script/function you run?
The warnings are from a large scan of auto-generated data sets on an HPC cluster, so it is non-trivial to create a reproducing MWE. I will have a go at it and report back to you when/if I get one.