scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

BUG: optimize.minimize(method='Nelder-Mead') out of tolerance #14929

Closed SGrosse-Holz closed 2 years ago

SGrosse-Holz commented 2 years ago

Describe your issue.

I have a weird issue with minimize(), or I misunderstand the tolerance parameters. The documentation is unclear on this, but I assume the stopping condition is met when the step in x is smaller than xatol and the step in the function value is smaller than fatol. At least that's what some experimentation suggests. Assuming the tolerance parameters work this way, the following behavior is a bug:

It seems that for some combinations of true minimum and initial position, the minimizer aborts too early, as shown in the code example below. Using the commented-out, non-whole-number value for the true minimum instead makes it converge below the specified precision. This only occurs for some values of (x_true, x0), e.g. (1., 5.) and (1., 6.), but not (1., 7.).

I'm running IPython (Python 3.7) on Ubuntu 18.04.6 with scipy 1.7.1.

PS: Is it possible to add some comment as to how the tolerances are intended to work to the docs?

Reproducing Code Example

from scipy import optimize

# x_true = 0.3254894
x_true = 1.
def fun(x): return (x - x_true)**2
res = optimize.minimize(fun, x0=5., method='Nelder-Mead', options={'fatol': 1e-6, 'xatol': 1})
print(res)
print()
print(f"f-f_true = {res.fun - fun(x_true)}")
print(f"x-x_true = {res.x - x_true}")

Error message

final_simplex: (array([[1.5],
       [0.5]]), array([0.25, 0.25]))
           fun: 0.25
       message: 'Optimization terminated successfully.'
          nfev: 10
           nit: 5
        status: 0
       success: True
             x: array([1.5])

f-f_true = 0.25
x-x_true = [0.5]

SciPy/NumPy/Python version information

1.7.1 1.21.2 sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)

andyfaff commented 2 years ago

The code for halting is:

if (np.max(np.ravel(np.abs(sim[1:] - sim[0]))) <= xatol and
        np.max(np.abs(fsim[0] - fsim[1:])) <= fatol):

This compares the best solution, sim[0], to the rest of the simplex, sim[1:]. It also compares the function value of the best solution, fsim[0], against those of the rest of the simplex, fsim[1:]. If both maximum differences are no larger than xatol and fatol respectively, the minimisation stops. The simplex entries are made up of previous steps.
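To make this concrete, here is a minimal sketch that evaluates the same stopping test by hand, with sim and fsim copied from the final simplex reported in the issue:

```python
import numpy as np

# Final simplex and function values from the report (best vertex first).
sim = np.array([[1.5], [0.5]])
fsim = np.array([0.25, 0.25])

xatol, fatol = 1, 1e-6

# Same test as in the Nelder-Mead source: compare the best vertex to the rest.
x_converged = np.max(np.ravel(np.abs(sim[1:] - sim[0]))) <= xatol
f_converged = np.max(np.abs(fsim[0] - fsim[1:])) <= fatol
print(x_converged, f_converged)  # both True, so the solver stops
```

The two vertices straddle the minimum at x = 1 symmetrically, so their function values are identical and the fatol test passes even though neither vertex is near the minimum.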

the step in x is smaller than xatol and the step in the function value is smaller than fatol

This is correct. Using a callback and return_all we can look at things more closely:

from scipy.optimize import minimize

x0 = 5
# x_true = 0.3254894
x_true = 1.

def fun(x):
    return (x - x_true)**2

def callback(x):
    print(x, fun(x))

res = minimize(fun, x0=x0, method='Nelder-Mead', callback=callback,
               options={'fatol': 1e-6, 'xatol': 1, 'return_all': True})
sim, fsim = res.final_simplex
print(res)
print(sim, fsim)

gives:

[4.5] [12.25]
[3.5] [6.25]
[1.5] [0.25]
[1.5] [0.25]
       allvecs: [array([5.]), array([4.5]), array([3.5]), array([1.5]), array([1.5])]
 final_simplex: (array([[1.5],
       [0.5]]), array([0.25, 0.25]))
           fun: 0.25
       message: 'Optimization terminated successfully.'
          nfev: 10
           nit: 5
        status: 0
       success: True
             x: array([1.5])
[[1.5]
 [0.5]] [0.25 0.25]

You can see from the print out that the final simplex is [[1.5], [0.5]]. The difference between those two is less than or equal to 1, so the xatol condition is satisfied. In addition the final fsim is [0.25, 0.25], the difference between those two is also less than fatol=1e-6.

If you make xatol smaller then the minimisation continues for longer.
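For example, a minimal sketch of the same reproducer with a tighter xatol (the quadratic and starting point are taken from the report above):

```python
from scipy.optimize import minimize

def fun(x):
    return (x - 1.0)**2

# Tightening xatol forces the simplex to shrink in x before stopping.
res = minimize(fun, x0=5.0, method='Nelder-Mead',
               options={'fatol': 1e-6, 'xatol': 1e-6})
print(res.x)  # close to the true minimum at x = 1
```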

SGrosse-Holz commented 2 years ago

I see, this makes a lot of sense. Thanks!

andyfaff commented 2 years ago

Also, one shouldn't be comparing x or f(x) to a known global minimum; this is because one typically doesn't know what f_true is, so it doesn't make a sensible stopping criterion.

SGrosse-Holz commented 2 years ago

Sure, but it serves as a measure of performance. I was hoping that by adjusting fatol I could steer the algorithm to converge to a point roughly O(fatol) away from the true minimum (let it be 5*fatol or so, sure). I now see that this is not generally true, but it will probably work in a practical setting, where the coincidence of all vertices lying within fatol of each other while fitres.fun - f_true >> fatol should be rare.
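For reference, a quick sketch contrasting the two tolerance settings on the same quadratic (values taken from the reproducer; the loose run hits exactly the coincidence discussed above, the tight run does not):

```python
from scipy.optimize import minimize

def fun(x):
    return (x - 1.0)**2

# Loose xatol: the two vertices can land symmetrically around the minimum,
# so their f values coincide and the solver stops far from x = 1.
loose = minimize(fun, x0=5.0, method='Nelder-Mead',
                 options={'fatol': 1e-6, 'xatol': 1})

# Tight xatol as well: the simplex must also collapse in x before stopping.
tight = minimize(fun, x0=5.0, method='Nelder-Mead',
                 options={'fatol': 1e-6, 'xatol': 1e-6})

print(loose.fun, tight.fun)  # the loose run stops with a much larger error
```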