lmfit / lmfit-py

Non-Linear Least Squares Minimization, with flexible Parameter settings, based on scipy.optimize, and with many additional classes and methods for curve fitting.
https://lmfit.github.io/lmfit-py/

Minimize.scalar_minimize does not calculate parameter uncertainties/correlations. #169

Closed. andyfaff closed this issue 5 years ago.

andyfaff commented 9 years ago

When using leastsq the parameter uncertainties and correlations are estimated. This is because scipy.optimize.leastsq estimates the underlying Hessian/covariance matrix.
However, the scipy.optimize scalar minimizers do not do this. As a consequence, lmfit.Minimize.minimize does not calculate uncertainties/correlations unless you are using leastsq. IMHO this is a deficiency that needs to be rectified. There are at least two ways around this:

1) Do a leastsq fit after a scalar_minimize (this should be done in lmfit.Minimize.minimize; a rough sketch is below). Here you want to be sure that the subsequent leastsq fit respects the (min, max) bounds, and that the chi2 after the subsequent leastsq fit is lower than that of the original fit.

2) Estimate the Hessian. This can be done in a relatively simple fashion; see estimate_mrqcof in https://github.com/andyfaff/pyplatypus/blob/master/pyplatypus/analysis/curvefitter.py.
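A rough sketch of option 1 with current lmfit (the toy model, data, and parameter names are only for illustration, and `minimize` returning a result object assumes a reasonably recent lmfit):

```python
import numpy as np
import lmfit

def residual(params, x, data):
    # toy model: data - amp * exp(-x / decay)
    return data - params['amp'].value * np.exp(-x / params['decay'].value)

x = np.linspace(0, 10, 101)
data = 5.0 * np.exp(-x / 2.0) + np.random.normal(scale=0.1, size=x.size)

params = lmfit.Parameters()
params.add('amp', value=1.0, min=0.0, max=100.0)
params.add('decay', value=1.0, min=0.01, max=50.0)

out_nm = lmfit.minimize(residual, params, args=(x, data), method='nelder')
# refine from the Nelder-Mead solution; leastsq uses the same (min, max) bounds
out_ls = lmfit.minimize(residual, out_nm.params, args=(x, data), method='leastsq')

# only keep the leastsq result (and its uncertainties) if chi-square did not get worse
best = out_ls if out_ls.chisqr <= out_nm.chisqr else out_nm
```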

Further options include Monte Carlo methods, but those would introduce new dependencies.

newville commented 9 years ago

@andyfaff Least-squares requires the objective function to return an array, with at least as many observations as variables (npoints >= nvarys). The scalar minimizers don't require this -- they are not designed solely for curve fitting.

When one does have multiple observations, one can easily use least-squares after a scalar minimizer, by changing only the fitting method. This is a good strategy for many challenging cases. Lmfit allows the objective function for scalar minimizers to return an array (reducing it with (array**2).sum()) to make this sort of thing easy to do. That does not require chi-square from leastsq to improve on the previous result (wouldn't 'slightly worse' be acceptable?), but I think this approach is usually pretty stable.

Whether this is done automatically or not is debatable. Since it cannot be done in all cases (one would need more observations than variables), lmfit.minimize would still not always try to calculate uncertainties (and some cases will simply fail anyway). Also, I think it's completely reasonable that a user would want to know the best-fit values from method='nelder' (or another method) before they might be changed by method='leastsq'.

Estimating the curvature matrix for a scalar minimizer by brute force is an interesting idea. Maybe I am not understanding your pyplatypus code, but it looks to me like you are calculating just the 1st derivatives, not the 2nd partial derivatives, d2f/dxdx'. Is that right? Maybe I'm just being dense....

Adding an option to use a Monte Carlo method from PyMC (or something else?) would be great. I think @Tillsten has made similar suggestions. I don't think we'd want to have this done automatically, but it would definitely be useful!

Tillsten commented 9 years ago

@newville IIRC LM also only calculates the first derivatives and then approximates the Hessian by df/dx * df/dy.
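(For reference, that is the Gauss-Newton approximation: with chi2 = sum_i r_i**2 and the residuals' second derivatives dropped near the minimum, d2(chi2)/dx_j dx_k ≈ 2 * sum_i (dr_i/dx_j)*(dr_i/dx_k) = 2*(J^T J)_jk, where J is the Jacobian of the residual vector.)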

newville commented 9 years ago

@Tillsten: right you are and thanks (and sorry I got so far behind on the conversation).

@andyfaff: is it worth including the brute-force curvature calculation for the scalar minimizers independent of the MCMC work? I think it would be, though the assumption that the quantity being minimized is chi-square would probably need to be documented.

andyfaff commented 9 years ago

The original repo for the demo code I had has changed location, and the mrq_cof function has also disappeared; I'd have to go back and find it.

I would like to see parameter uncertainties estimated with the scalar_minimizers. AFAIK estimating the Hessian by this method is going to be the simplest approach, although as you say one can always do a leastsq fit afterwards. I think it should be orthogonal to the MCMC work.

newville commented 9 years ago

@andyfaff I agree this should be orthogonal to MCMC, and that both methods are valuable. If you're interested and willing/able to dig up the code you had for pyplatypus, and add it to lmfit, I think that would be used by very many people.

Tillsten commented 9 years ago

There is a numdifftools package to calculate gradients and Hessians of functions. There was some talk of adding it to scipy, but I don't think that happened. While I have used it, I never looked into the code itself, so it may or may not be possible to just lift the relevant part out of it.
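For what it's worth, the numdifftools API is small; a minimal sketch (the quadratic objective here is just a stand-in for a real chi-square function):

```python
import numpy as np
import numdifftools as nd

def chi2(p):
    # stand-in scalar objective of a parameter vector
    return (p[0] - 1.0)**2 + 10.0 * (p[1] - 2.0)**2

best = np.array([1.0, 2.0])
grad = nd.Gradient(chi2)(best)     # length-2 gradient vector
hess = nd.Hessian(chi2)(best)      # 2x2 matrix of second partial derivatives
covar = 2.0 * np.linalg.inv(hess)  # candidate covariance estimate (see discussion below)
```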

reneeotten commented 6 years ago

@newville is there still interest in this? I am willing to give it a try if there is.

I was recently looking at this topic and saw that there is (again) some talk that this might be added to scipy, but I have no idea what their timeframe is. The numdifftools package seems useful but would add another dependency; I noticed, however, that @andyfaff incorporated some code from numdifftools in one of his packages for this purpose, and that could probably be used as a starting point for adding this to lmfit as well.

newville commented 6 years ago

@reneeotten I do think it is still interesting. Perhaps numdifftools (I think maybe statsmodels has this too?) could work to get gradients for objective functions that return scalars. I have not looked into this enough to know whether adding a dependency or swiping code is better... Anyway, yes, it could be very useful!

andyfaff commented 6 years ago

I've used code from statsmodels, approx_hess2, to estimate the Hessian. You could vendor that with the appropriate attribution. However, I sometimes have problems with parameters that aren't of similar scale, so I would appreciate it if you got further than I have.
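One way to mitigate the scale problem is to differentiate in rescaled units and transform the result back. A sketch, assuming a scalar objective `chi2` and a best-fit vector `best` (both names are placeholders) and using numdifftools for the actual differencing:

```python
import numpy as np
import numdifftools as nd

def scaled_hessian(chi2, best):
    best = np.asarray(best, dtype=float)
    scale = np.where(np.abs(best) > 1e-12, np.abs(best), 1.0)
    # differentiate in units where every parameter is of order unity
    hess_u = nd.Hessian(lambda u: chi2(u * scale))(best / scale)
    # chain rule: d2f/dp_j dp_k = (1/s_j) * (1/s_k) * d2f/du_j du_k
    return hess_u / np.outer(scale, scale)
```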


reneeotten commented 6 years ago

@newville okay, after some delay I started looking into this again... It seems to work: calculating the Hessian with numdifftools and then getting the covariance matrix from it (i.e., np.linalg.inv(hessian) * 2.0). That is, it gives the correct answers compared to leastsq when there are no upper/lower bounds set for the parameters. What I haven't figured out yet is how to transform it correctly when bounds are used, as is also done in the leastsq function. There it uses the output from scipy.leastsq (i.e., fjac and ipvt), re-scales the gradient, and then calculates the covariance matrix, if I understand it correctly.

Admittedly, my matrix algebra is a little rusty and I can certainly study this some more over the weekend. I just wanted to quickly check if you have some ideas and/or pointers. Is there, for example, a relatively easy way to re-scale the covariance matrix directly, or does one have to go through the gradients? I also don't really understand what fjac is; it seems to be an NxM matrix where N = # parameters and M = # data points. If I try to calculate a Jacobian with numdifftools, I only get a vector of Nx1. Anyway, there is some progress at least.
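For what it's worth, here is a self-contained toy of the two pieces being discussed. This is not lmfit's implementation: the sin() mapping and its derivative follow the bounds transform described in the lmfit documentation, and the one-parameter straight-line model is just for illustration.

```python
import numpy as np
import numdifftools as nd

pmin, pmax = 0.0, 10.0                       # bounds on the single parameter (the slope)
x = np.linspace(0.0, 10.0, 101)
data = 3.0 * x + np.random.normal(scale=0.2, size=x.size)

def external(p_int):
    # internal (unbounded) value -> external (bounded) slope
    return pmin + (np.sin(p_int) + 1.0) / 2.0 * (pmax - pmin)

def chi2_internal(p):
    return float(((data - external(p[0]) * x)**2).sum())

slope_best = np.dot(x, data) / np.dot(x, x)  # analytic best-fit slope
p_int_best = np.array([np.arcsin(2.0 * (slope_best - pmin) / (pmax - pmin) - 1.0)])

hess = nd.Hessian(chi2_internal)(p_int_best)
cov_internal = 2.0 * np.linalg.inv(hess)     # covariance in internal (unbounded) space

# first-order propagation back to the external (user-facing) parameter:
grad_scale = (pmax - pmin) / 2.0 * np.cos(p_int_best)   # d(external)/d(internal)
cov_external = cov_internal * np.outer(grad_scale, grad_scale)
```

The last two lines are the chain-rule rescaling that, I believe, plays the role of the scaled gradients mentioned below.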

newville commented 6 years ago

@reneeotten Great!

For fixing the covariance in the presence of bounds-transformed Parameters, I would first ask: are you using numdifftools on the internal, free parameters or on the user's external parameters? If it is on the external parameters, it might just work the same for bounded and unbounded. But, since you're saying it does work in one case, you probably mean it does not work for bounded variables.....

The ipvt array from MINPACK / leastsq is an index array used to reorder the variables. That is, MINPACK will reorder the variables internally (I think to improve the stability of the QR factorization). You can probably just replace the `take(xxx, ipvt - 1)` with `xxx`.

My LinAlg must be kinda rusty too. But, looking at this, it looks like maybe

infodict['fjac'] = transpose(transpose(infodict['fjac']) / take(grad, infodict['ipvt'] - 1))
rvec = dot(triu(transpose(infodict['fjac'])[:nvars, :]),   take(eye(nvars), infodict['ipvt'] - 1, 0))

could be simplified because I believe that

take(eye(nvars), infodict['ipvt'] - 1, 0) == eye(nvars)

and isn't

dot(matrixNxM, eye(N))

just trace (Edit: diag)? Maybe I'm missing something but I'll play around with this....

Anyway, I think that what you really need is fjac (and the scaled gradients from Parameter.scaled_gradient). That's all MINPACK really returns anyway....

newville commented 6 years ago

OK, it looks like all the comments I made about simplifying the use of ipvt there are completely wrong. Basically, take(eye(nvars), infodict['ipvt'] - 1, 0) != eye(nvars); it is a reordered (row-permuted) identity matrix.
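A quick numerical check of that (hypothetical 3-variable example):

```python
import numpy as np

nvars = 3
ipvt = np.array([2, 3, 1])                  # the kind of 1-based index array MINPACK returns
perm = np.take(np.eye(nvars), ipvt - 1, 0)  # rows of the identity, reordered
print(perm)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]]
# multiplying a matrix by perm reorders its rows rather than leaving it unchanged
```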

reneeotten commented 6 years ago

@newville it indeed doesn't work right now for bounded parameters. I'm using (I think) the external parameter values and giving those to numdifftools, but that distinction is a good point. I understand that I could follow the same recipe as in leastsq, but for that I would need the equivalent of fjac from numdifftools and I am not sure yet how to do that. I do get a correct covariance matrix out, so if one needs to use the gradients from Parameter.scaled_gradient I just have to figure out how to get fjac from my covariance matrix...

I will spend some more time on it myself and then submit a PR with the code I have - that probably makes it easier to discuss.

newville commented 6 years ago

@reneeotten Maybe just compute the Jacobian by hand, as at https://github.com/lmfit/lmfit-py/blob/master/lmfit/model.py#L1327 ? That could then work on external values, and you don't need the step of scaling by Parameter.scaled_gradient. ... I think...
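Something like the following is probably what that would look like; a sketch only, where `residual(pvals)` returns the residual array evaluated at the external parameter values and `best` is the best-fit vector (both names are placeholders):

```python
import numpy as np

def numerical_jacobian(residual, best, rel_step=1e-6):
    # central-difference Jacobian of the residual array w.r.t. the external parameters
    best = np.asarray(best, dtype=float)
    r0 = np.asarray(residual(best))
    jac = np.empty((r0.size, best.size))
    for j in range(best.size):
        dp = rel_step * max(abs(best[j]), 1.0)
        up, down = best.copy(), best.copy()
        up[j] += dp
        down[j] -= dp
        jac[:, j] = (np.asarray(residual(up)) - np.asarray(residual(down))) / (2.0 * dp)
    return jac

# jac = numerical_jacobian(residual, best)
# covar = np.linalg.inv(jac.T @ jac)   # equivalent to 2 * inv(Hessian) in the Gauss-Newton sense
```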

newville commented 5 years ago

I think this is now addressed with #506!