Using nan_policy=omit causes exception in scipy.optimize.leastsq

campagnola commented 6 years ago

Description

I have recently begun to encounter this error message during fitting:

  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/model.py", line 873, in fit
    output.fit(data=data, weights=weights)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/model.py", line 1217, in fit
    _ret = self.minimize(method=self.method)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/minimizer.py", line 1808, in minimize
    return function(**kwargs)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/minimizer.py", line 1296, in leastsq
    lsout = scipy_leastsq(self.__residual, variables, **lskws)
  File "/home/luke/.local/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 394, in leastsq
    gtol, maxfev, epsfcn, factor, diag)
ValueError: The array returned by a function changed size between calls

It only occurs when using nan_policy='omit', and then only when there are non-finite values generated by the model function. I found another project with a similar issue: https://github.com/MDAnalysis/mdanalysis/issues/1848 that suggests it's caused by a behavior change in scipy.

Version information

lmfit: 0.9.10, scipy: 1.1.0, numpy: 1.14.4, asteval: 0.9.12, uncertainties: 3.0.2, six: 1.11.0

A Minimal, Complete, and Verifiable example

import lmfit
import numpy as np

def model_fn(x, yoffset):
    """A model where the number of non-finite values depends on a parameter."""
    y = np.log(yoffset - x)
    print("model_fn(%f) contains %d non-finite values" % (yoffset, np.sum(~np.isfinite(y))))
    return y

model = lmfit.Model(model_fn)
x = np.linspace(-5, 5, 101)
y = model.eval(x=x, yoffset=10)
model.fit(y, x=x, yoffset=5, nan_policy='omit').best_values

When I run the code above, I get:

>>> model.fit(y, x=x, yoffset=5, nan_policy='omit').best_values
model_fn(5.000000) contains 1 non-finite values
model_fn(5.000000) contains 1 non-finite values
model_fn(5.000000) contains 1 non-finite values
model_fn(5.000000) contains 1 non-finite values
model_fn(5.000000) contains 0 non-finite values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/model.py", line 873, in fit
    output.fit(data=data, weights=weights)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/model.py", line 1217, in fit
    _ret = self.minimize(method=self.method)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/minimizer.py", line 1808, in minimize
    return function(**kwargs)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/minimizer.py", line 1296, in leastsq
    lsout = scipy_leastsq(self.__residual, variables, **lskws)
  File "/home/luke/.local/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 394, in leastsq
    gtol, maxfev, epsfcn, factor, diag)
ValueError: The array returned by a function changed size between calls

As a side note, running with a really bad initial guess produces a familiar but unhelpful error message; it'd be nice to catch this before it gets to leastsq:

>>> model.fit(y, x=x, yoffset=-6, nan_policy='omit').best_values
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/model.py", line 873, in fit
    output.fit(data=data, weights=weights)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/model.py", line 1217, in fit
    _ret = self.minimize(method=self.method)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/minimizer.py", line 1808, in minimize
    return function(**kwargs)
  File "/home/luke/.local/lib/python2.7/site-packages/lmfit/minimizer.py", line 1296, in leastsq
    lsout = scipy_leastsq(self.__residual, variables, **lskws)
  File "/home/luke/.local/lib/python2.7/site-packages/scipy/optimize/minpack.py", line 386, in leastsq
    raise TypeError('Improper input: N=%s must not exceed M=%s' % (n, m))
TypeError: Improper input: N=1 must not exceed M=0

newville commented 6 years ago

@campagnola It is always kind of astonishing to me when someone submits a question as a Github issue. We have CONTRIBUTING.md, and ISSUE_TEMPLATE.md, and a link on the main documentation page labelled "Getting Help" (https://lmfit.github.io/lmfit-py/support.html) that are hard to miss or misinterpret. I would have been about 100x more sympathetic and willing to help if you had asked this as a quesion on the mailing list.

You are clearly not a troll, are familar with the scipy stack, with programming, and are literate. I don't understand how someone could write such an Issue report and not understand the answer to the issue. What do you think nan_policy='omit' should do?

campagnola commented 6 years ago

Sorry for the trouble @newville, I don't consider my report above to be a question; I am not asking for advice. I am suggesting that lmfit has acquired some unexpected behavior as a result of an upstream scipy change, and it would be reasonable to make some minor code changes to avoid this unexpected behavior in the future (the causes and solutions to this issue may be immediately apparent to you, but it incurred a significant time investment for me that could be avoided for other lmfit users).

A simple solution would be to deprecate the use of nan_policy='omit' with method='leastsq', perhaps with a helpful error message suggesting other minimization methods.

newville commented 6 years ago

@campagnola I don't know if anything changed with scipy on this. For sure, leastsq does not work if the length of the residual changes. Setting nan_policy='omit' and having nans generated by the objective or model function will cause that, especially if some range of parameter values causes nans, as your example does. Basically, an objective or model function that generates nans will always cause trouble.

OTOH nan_policy='omit' can be useful in cases where there are nans in the data. This is not that uncommon, and will generate a fixed number of nans. We don't want to deprecate that use-case.

If you want nicer(?) error messages of 'ValueError: the input contains nan values', you would use nan_policy='raise', which happens to be the default. That's still raising a ValueError. In some cases, using nan_policy='propagate' may "succeed" in the sense of not raising an exception, but in those cases, the fit will likely stop as soon as a nan is encountered.

I do not see what, if anything, needs fixing here.

campagnola commented 6 years ago

I see your point regarding model functions that produce nan versus data containing nan; deprecation is not an option.

In my case, I have model functions that produce variable numbers of nan/inf depending on the parameters and the range of independent variable values. Ideally I would bound the parameters to avoid encountering non-finite values, but often the boundaries are nonlinear and difficult to solve analytically, so I don't think this is avoidable for me. Instead, I have relied on nan_policy='omit' to simply mask out the non-finite values before they reach the minimizer, and this has worked well up until I upgraded scipy from 1.0 to 1.1 (I have confirmed that the behavior did change between those versions). The workaround--using one of the scalar minimization methods (like nelder) instead of the equation minimizers (like leastsq)--is simple but not obvious unless you happen to have a clear understanding about the internal relationship between lmfit, scipy, and the various minimization methods.

If this is not a situation that you think other users may run into, or if you think lmfit users should understand the lmfit/scipy internals at least well enough to avoid this trap, then let's just close the issue.

If you think it's worth trying to help users avoid the trap, then let's come up with other ways to provide useful feedback to the user. Another possibility is to catch the scipy exception and re-raise with a more helpful error message (the second exception I posted above could be addressed in the same way). I'd be happy to propose a PR if you're interested.

reneeotten commented 6 years ago

it's true the behavior for your example indeed changed with a newer version of scipy, a quick test suggests that it's actually version 1.0.1. I haven't looked carefully, but perhaps the changes between 1.0 and 1.0.1 are small enough to figure out what exactly changed and why...

It seems to me that changing the "array size between function calls" is typically a good enough reason to halt the fit, so I am not sure whether the old behavior is actually correct/better. The TypeError: Improper input: N=1 must not exceed M=0 is indeed a bit obscure, but the lmfit documentation states quite clearly that for leastsq, the returned value must be an array, with a length greater than or equal to the number of fitting variables in the model (also the scipy documentation does). We could probably mention this error message somewhere in the documentation or FAQs so that people realize what the error actually means. Alternatively, reraising would be an option - I don't know what would be more helpful for the user.

newville commented 6 years ago

@campagnola the lmfit user will know at least some of numpy and the scipy stack, even if they are using the simplest interface with Model. We don't expect people to understand scipy internals, but they should be able to learn how to read a traceback. In the example you give, if the nan_policy is left as the default value of raise, then the user would get

ValueError: The input contains nan values

I don't know how to phrase that better. You explicitly overwrote that default and now the nans you were warned about and tried to ignore cause a different exception. OK... you should not ignore those nans. Yeah, if you're taking the log() or sqrt() the argument cannot have a negative value.

A nan in the residual essentially dooms the whole concept -- if chi-square is nan, what are you improving? It makes no sense for certain values for some parameter to change the size of the residual array. These should cause exceptions.

I don't see anything obvious in the scipy release notes about what would have changed in the way optimize.leastsq handles nan. If some version of scipy was not raising a VauleError on nans in a residual array, I'm glad they fixed it.

It continues to bother me that this discussion is not on the mailing list.

lmfit / lmfit-py