tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

L-BFGS optimizer does not work for a simple quadratic function #657

Closed VikramRadhakrishnan closed 4 years ago

VikramRadhakrishnan commented 4 years ago

This is a quadratic function in one variable, with a global minimum of -3.333 at x = -0.667, and I am having a lot of trouble getting L-BFGS to minimize it. This is my code:

import tensorflow as tf
import tensorflow_probability as tfp
import functools

# Create a wrapper to return function value and gradient
def _make_val_and_grad_fn(value_fn):
    @functools.wraps(value_fn)
    def val_and_grad(x):
        return tfp.math.value_and_gradient(value_fn, x)
    return val_and_grad

# Here is the quadratic function 3x^2 + 4x - 2
@_make_val_and_grad_fn
def test_function(x):
  y = 3.0 * tf.math.square(x) + 4.0 * tf.convert_to_tensor(x) - 2.0
  return y

# Initial guess for x
xinit = 10.0

optim_results = tfp.optimizer.lbfgs_minimize(test_function, initial_position=xinit)

with tf.Session() as sess:
  results = sess.run(test_function(optim_results))

This should be fairly straightforward, but I get the error: ValueError: Invalid reduction dimension -1 for input with 0 dimensions. for 'minimize_20/norm/Max' (op: 'Max') with input shapes: [], [] and with computed input tensors: input[1] = <-1>.
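For reference, the error looks consistent with an axis=-1 reduction being applied to a rank-0 tensor, which is what a scalar initial_position gives you. A minimal sketch of the same failure (my own example, not taken from the optimizer internals):

import tensorflow as tf

# Assumption: a scalar initial_position ends up as a rank-0 tensor that gets
# reduced over axis -1 somewhere inside lbfgs_minimize, which fails.
x = tf.constant(10.0)           # rank-0 (scalar) tensor, like xinit above
tf.math.reduce_max(x, axis=-1)  # raises a similar "Invalid reduction dimension" error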

jeffpollock9 commented 4 years ago

Looking at the docs it seems that the initial position should be a "Real Tensor of shape [..., n]", so I think you need to change xinit, and also the function so that it returns a scalar objective value. I guess 1-d optimization (i.e. n = 1 in the doc's notation) is not too common.

Using TF2 I think one way to do this is:

import tensorflow as tf
import tensorflow_probability as tfp
import functools

# Create a wrapper to return function value and gradient
def _make_val_and_grad_fn(value_fn):
    @functools.wraps(value_fn)
    def val_and_grad(x):
        return tfp.math.value_and_gradient(value_fn, x)

    return val_and_grad

# Here is the quadratic function 3x^2 + 4x - 2
@_make_val_and_grad_fn
def test_function(x):
    y = 3.0 * tf.math.square(x) + 4.0 * tf.convert_to_tensor(x) - 2.0
    # Squeeze to a scalar, since lbfgs_minimize expects a scalar objective value
    return tf.squeeze(y)

# Initial guess for x
xinit = tf.constant([10.0])

optim_results = tfp.optimizer.lbfgs_minimize(test_function, initial_position=xinit)

>>> print(f"min: {optim_results.objective_value}")
min: -3.3333332538604736
>>> print(f"x: {optim_results.position}")
x: [-0.6666667]
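As an aside, assuming the usual fields on the returned LBfgsOptimizerResults tuple, you can also sanity-check the run via the convergence flag and evaluation count:

>>> print(f"converged: {optim_results.converged}")
>>> print(f"evaluations: {optim_results.num_objective_evaluations}")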

VikramRadhakrishnan commented 4 years ago

Thank you, this does indeed work!