tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

bfgs_minimize "failing" on a rosenbrock example with no reason why #341

Closed: jeffpollock9 closed this issue 5 years ago

jeffpollock9 commented 5 years ago

In the following example bfgs_minimize reports a failure to converge when it is very close to the solution:

import tensorflow as tf
import tensorflow_probability as tfp

# Rosenbrock parameters; the global minimum is at (A, A**2) = (3, 9).
A = tf.constant(3.0)
B = tf.constant(100.0)

def rosenbrock(x):
    x0 = x[0]
    x1 = x[1]
    first = tf.math.squared_difference(A, x0)
    second = B * tf.math.squared_difference(x1, tf.square(x0))
    return first + second

def value_and_gradients(x):
    return tfp.math.value_and_gradient(rosenbrock, x)

# tf.hessians only works in graph mode, hence the @tf.function wrapper.
@tf.function
def hessian(opt):
    x = opt.position
    value = rosenbrock(x)
    hess = tf.hessians(value, x)[0]
    return hess

def approx_hessian(opt):
    # Invert the BFGS inverse-Hessian estimate to compare with the true Hessian.
    inverse_hess = opt.inverse_hessian_estimate
    hess = tf.linalg.inv(inverse_hess)
    return hess

tf.random.set_seed(42)

init = tf.random.normal([2])
opt = tfp.optimizer.bfgs_minimize(value_and_gradients, init)

print(f"initial position: {init}")
print(f"true solution: [{A}, {A*A}]")

print(f"found solution: {opt.position}")
print(f"converged: {opt.converged}")
print(f"num_iterations: {opt.num_iterations}")

print(f"hessian at solution:\n{hessian(opt)}")
print(f"approx hessian at solution:\n{approx_hessian(opt)}")

(@tf.function is only used here to make calculating the Hessian easier.)

which outputs:

(tf2) $ python bfgs_fail.py 
initial position: [ 0.3274685 -0.8426258]
true solution: [3.0, 9.0]
found solution: [3.0000024 9.000014 ]
converged: False
num_iterations: 34
hessian at solution:
[[ 7202.0117 -1200.001 ]
 [-1200.001    200.    ]]
approx hessian at solution:
[[ 7280.896   -1213.2786 ]
 [-1213.2787    202.23483]]

I am using the latest TensorFlow 2 alpha and the tfp-nightly package from pip.

Would it be possible to also output why the routine has failed? Has it actually failed?

If it is at all useful, the LBFGS routine gets the right answer and reports it accordingly.
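(For reference, the LBFGS version is a drop-in replacement; a sketch, assuming the same value_and_gradients and init as above:)

# L-BFGS on the same objective and starting point; this one reports converged=True.
opt = tfp.optimizer.lbfgs_minimize(value_and_gradients, init)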

I am very happy to devote some time to helping with this if I could get a few pointers to start me off. Thanks!

csuter commented 5 years ago

Looks like the default tolerance https://github.com/tensorflow/probability/blob/46b0b89821921b1b5bef163d3dba355c67bc8209/tensorflow_probability/python/optimizer/bfgs.py#L75 is 1e-8, and your logs show it is only within about 1e-6.
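If you just want convergence to be declared at that accuracy, one option is to loosen the tolerance argument (a sketch, untested against your example; the exact value may need tuning):

# Loosen the gradient tolerance from the default 1e-8 so a solution within
# roughly 1e-6 of the optimum can count as converged.
opt = tfp.optimizer.bfgs_minimize(value_and_gradients, init, tolerance=1e-5)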


csuter commented 5 years ago

Actually, I think I took the wrong meaning of tolerance in my reply. That tolerance parameter is renamed grad_tolerance in the downstream computations. In any case, I think that or one of the other tolerance values is just too tight for it to conclude convergence.
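For completeness, recent TFP versions also expose x_tolerance and f_relative_tolerance arguments on bfgs_minimize (both default to 0, i.e. disabled), so another option is to declare convergence on position or objective stagnation instead (a sketch, untested):

# Declare convergence when successive positions or objective values stop
# changing, instead of relying solely on the gradient tolerance.
opt = tfp.optimizer.bfgs_minimize(
    value_and_gradients,
    init,
    x_tolerance=1e-6,           # stop when the position stops moving
    f_relative_tolerance=1e-8,  # or when the objective stops improving
)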


csuter commented 5 years ago

Wrong again! Printing opt.failed yields "True". The docstring for this field reads

  failed: boolean tensor of shape `[...]` indicating for each batch member
  whether a line search step failed to find a suitable step size satisfying
  Wolfe conditions. In the absence of any constraints on the number of
  objective evaluations permitted, this value will be the complement of
  `converged`. However, if there is a constraint and the search stopped due
  to available evaluations being exhausted, both `failed` and `converged`
  will be simultaneously False.

So it sounds like the line search failed at step 34. The line search algorithm is Hager-Zhang. This is pushing towards the boundaries of how well I understand the optimizer code... :) Maybe someone else from the team can chime in. @SiegeLordEx ?
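For anyone debugging a similar case, the relevant fields on the returned results tuple can be inspected directly (a sketch; opt is the result from the example above, and num_objective_evaluations is another field of the same tuple):

# Distinguish "line search failed" from "ran out of evaluations":
print(f"converged: {opt.converged}")  # gradient tolerance was met
print(f"failed: {opt.failed}")        # Hager-Zhang line search gave up
print(f"evaluations: {opt.num_objective_evaluations}")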

jeffpollock9 commented 5 years ago

@csuter thanks for the comments. I see now that the reason for the failure is actually returned (and documented), so apologies for not realising. I do wonder, though, if line_search_failed (or similar) would be a better name than failed. I would be happy to send over a brief pull request making that change across the whole optimizer module, if that is correct and would be useful?

Looking at the line search code, it seems sensitive to the float type (see _machine_eps), so I tried changing tf.float32 to tf.float64 and it now works (as in opt.converged is True and the solution is spot on) in 44 iterations!
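The float64 change amounts to something like this (a sketch of the diff to the example above; value_and_gradients is unchanged):

# Rerun in double precision; the line search's _machine_eps is much smaller
# for float64, so the tight default tolerances become attainable.
A = tf.constant(3.0, dtype=tf.float64)
B = tf.constant(100.0, dtype=tf.float64)

init = tf.random.normal([2], dtype=tf.float64)
opt = tfp.optimizer.bfgs_minimize(value_and_gradients, init)
# opt.converged is now True after 44 iterations.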

Anyway, I will close this issue as that is all cleared up now. Thanks again, and let me know if the pull request would be useful.

csuter commented 5 years ago

Ah, great, glad to hear you're unblocked!

I'd personally be open to the name change, but it would be a backward-incompatible API change, which may have implications for existing users. Please feel free to file an issue and we can look into whether and how to migrate the name.