tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

LBFGS not working with tensorflow 2.0 #398

Open Lescurel opened 5 years ago

Lescurel commented 5 years ago

System information

Describe the current behavior

When trying to use the tfp.optimizer.lbfgs_minimize function, I get an error:

InvalidArgumentError: Inputs to operation Select of type Select must have the same size and shape.  Input 0: [1,2] != input 1: [] [Op:Select]

Describe the expected behavior

This should run without issue, as it works under TensorFlow 1.13.1 and TensorFlow Probability 0.6.0 with tf.enable_eager_execution().

Code to reproduce the issue

The following code runs under TF 1.13.1 with TFP 0.6.0, but not under TF 2.0 with TFP 0.7.0:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

class TestEager():
    def __init__(self):
        # tf.losses.mean_squared_error is not the same under TF 2.0
        self.mse = tf.losses.mean_squared_error
        if tf.__version__ == '2.0.0-alpha0':
            self.mse = tf.losses.MeanSquaredError()

    def __call__(self, inputs):
        loss = 0
        with tf.GradientTape() as tape:
            tape.watch(inputs)
            new_guess = np.random.rand(*inputs.shape)
            loss += self.mse(inputs, new_guess) 
        grad = tape.gradient(loss, inputs)
        return loss, grad

def main_eager():
    guess = np.random.rand(1,2,3).astype(np.float32)
    test_eager = TestEager()
    res = tfp.optimizer.lbfgs_minimize(
      test_eager, 
      initial_position=guess,
      tolerance=1e-8)
    print(res)

if __name__ == "__main__":
    version = tf.__version__
    if version == '2.0.0-alpha0':
        main_eager()
    else:
        tf.enable_eager_execution()
        main_eager()

Other info / logs

Traceback:

2019-05-08 14:58:50.115047: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-08 14:58:50.147511: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1800000000 Hz
2019-05-08 14:58:50.148229: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x55a747566a30 executing computations on platform Host. Devices:
2019-05-08 14:58:50.148279: I tensorflow/compiler/xla/service/service.cc:169]   StreamExecutor device (0): <undefined>, <undefined>
Traceback (most recent call last):
  File "reprodcuing_bug.py", line 176, in <module>
    main_eager()
  File "reprodcuing_bug.py", line 159, in main_eager
    tolerance=1e-8)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/lbfgs.py", line 260, in minimize
    parallel_iterations=parallel_iterations)[0]
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3216, in while_loop_v2
    return_same_structure=True)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3442, in while_loop
    loop_vars = body(*loop_vars)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/lbfgs.py", line 238, in _body
    tolerance, f_relative_tolerance, x_tolerance, stopping_condition)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/bfgs_utils.py", line 153, in line_search_step
    converged=inactive)  # No search needed for these.
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/linesearch/hager_zhang.py", line 283, in hager_zhang
    right=hzl.val_where(init_converged, val_0, val_c))
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/linesearch/internal/hager_zhang_lib.py", line 45, in val_where
    return cls(*(val_where(cond, t, f) for t, f in zip(tval, fval)))
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/linesearch/internal/hager_zhang_lib.py", line 45, in <genexpr>
    return cls(*(val_where(cond, t, f) for t, f in zip(tval, fval)))
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow_probability/python/optimizer/linesearch/internal/hager_zhang_lib.py", line 42, in val_where
    return tf.where(cond, tval, fval)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 3231, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "/home/beren/.conda/envs/style_transfer/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 9060, in select
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Inputs to operation Select of type Select must have the same size and shape.  Input 0: [1,2] != input 1: [] [Op:Select]
brianwa84 commented 5 years ago

This is probably an issue with the new broadcasting tf.where, which has temporarily been rolled back in tf.

Brian Patton | Software Engineer | bjp@google.com

From: Louis KLEIN notifications@github.com Date: Wed, May 8, 2019 at 9:13 AM To: tensorflow/probability Cc: Subscribed

System information

  • OS Platform and Distribution: Linux NixOS unstable
  • TensorFlow installed from : binary using anaconda
  • TensorFlow version : '2.0.0-alpha0'
  • TensorFlow Probability version : '0.7.0-dev20190504'
  • Python version: 3.6.8


pierremtb commented 5 years ago

Any updates on this?

pierremtb commented 5 years ago

There must have been some fixes pushed because I'm now able to use lbfgs_minimize with TF2.0 (tfp-nightly and tensorflow=2.0.0-beta1).

kyleabeauchamp commented 5 years ago

There might be a regression here. I tried running the same test case as above and encountered the following:

InvalidArgumentError: Inputs to operation Select of type Select must have the same size and shape.  Input 0: [1,2] != input 1: [1,2,3] [Op:Select]

In [3]: tf.__version__                                                                                                                                                                                                                                                        
Out[3]: '2.0.0-rc0'

In [4]: tfp.__version__                                                                                                                                                                                                                                                       
Out[4]: '0.9.0-dev20190905'

In [6]: sys.version                                                                                                                                                                                                                                                           
Out[6]: '3.7.3 | packaged by conda-forge 
brianwa84 commented 5 years ago

Do you see the same issue with tensorflow-probability==0.8.0rc0?

kyleabeauchamp commented 5 years ago

Same result.

In [2]: main_eager()
InvalidArgumentError: Inputs to operation Select of type Select must have the same size and shape.  Input 0: [1,2] != input 1: [1,2,3] [Op:Select]

In [3]: tfp.__version__                                                                                                                                                                                                                                                       
Out[3]: '0.8.0-rc0'
kyleabeauchamp commented 5 years ago

FYI, I recently updated my env to 0.9.0-dev20190915 and I'm still seeing the issue.

kyleabeauchamp commented 5 years ago

I also checked that the same error occurs on bfgs_minimize().

kyleabeauchamp commented 5 years ago

I think this might be working now, in that I was able to run the following example:

import numpy as np
import functools
import tensorflow.compat.v2 as tf
import tensorflow_probability as tfp

def _make_val_and_grad_fn(value_fn):
    @functools.wraps(value_fn)
    def val_and_grad(x):
        return tfp.math.value_and_gradient(value_fn, x)
    return val_and_grad

@_make_val_and_grad_fn
def quadratic(x):
    scales = np.array([1.0, 3.0])
    minimum = np.array([0.1, 0.3])
    return tf.reduce_sum(input_tensor=scales * (x - minimum)**2)

def rosenbrock(coord):
    x, y = coord[0], coord[1]
    fv = (1 - x)**2 + 100 * (y - x**2)**2
    dfx = 2 * (x - 1) + 400 * x * (x**2 - y)
    dfy = 200 * (y - x**2)
    return fv, tf.stack([dfx, dfy])

start = tf.constant([-1.2, 1.7])
out_rosenbrock = tfp.optimizer.lbfgs_minimize(rosenbrock, initial_position=start, tolerance=1e-5)

start = tf.constant([-1.2, 1.7])
out_quadratic = tfp.optimizer.lbfgs_minimize(quadratic, initial_position=start, tolerance=1e-5)
jonas-eschle commented 4 years ago

@kyleabeauchamp so what is the status of this? What changed that it works now? I see a similar error with TFP 0.8

srvasude commented 4 years ago

I'm going to close this since I can't reproduce this. I believe that part of this issue had to do with tf.where. tf.where (V2) has broadcasting support, but doesn't allow for the conditional to be a prefix to the batch shape of the branches (which was a thing in V1). I believe an up to date TF and TFP should not have this issue any more (as we import the correct version of tf.where in the BFGS code with an updated TF and TFP).
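The shape rule described above can be illustrated without TensorFlow. The sketch below uses NumPy's `np.where` as a stand-in, on the assumption that `tf.where` (V2) follows the same NumPy-style broadcasting. The array names and shapes are made up for illustration. A condition whose shape is only a prefix of the branches' shape does not broadcast, while the same condition with a trailing axis added does.

```python
import numpy as np

# Hypothetical shapes: a batch of 2 problems, 3 coordinates each.
cond = np.array([True, False])   # shape (2,): one flag per batch member
tval = np.ones((2, 3))           # shape (2, 3)
fval = np.zeros((2, 3))          # shape (2, 3)

# A prefix-shaped condition is not broadcastable: shapes are aligned from
# the right, so the 2 in (2,) is compared against the trailing 3 in (2, 3).
try:
    np.where(cond, tval, fval)
    prefix_ok = True
except ValueError:
    prefix_ok = False

# Adding a trailing axis gives shape (2, 1), which broadcasts to (2, 3).
selected = np.where(cond[:, np.newaxis], tval, fval)

print(prefix_ok)
print(selected)
```

The V1-style behaviour mentioned in the comment (a condition matching only the leading dimension) corresponds to the second call, where the trailing axis is added explicitly.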

mikeevmm commented 3 years ago

@srvasude Sorry to necro, but I'm getting exactly the behaviour of #39970, with everything up to date. Have you tried @kyleabeauchamp's example with batching? Because I'm only seeing the error when attempting to batch minimize.

Here's an MWE:

# pip freeze
absl-py==0.11.0
appdirs==1.4.3
astunparse==1.6.3
CacheControl==0.12.6
cachetools==4.2.1
certifi==2019.11.28
chardet==3.0.4
cloudpickle==1.6.0
colorama==0.4.3
contextlib2==0.6.0
decorator==4.4.2
distlib==0.3.0
distro==1.4.0
dm-tree==0.1.5
flatbuffers==1.12
gast==0.3.3
google-auth==1.25.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.32.0
h5py==2.10.0
html5lib==1.0.1
idna==2.8
ipaddr==2.2.0
Keras-Preprocessing==1.1.2
lockfile==0.12.2
Markdown==3.3.3
msgpack==0.6.2
numpy==1.19.5
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.3
pep517==0.8.2
progress==1.5
protobuf==3.14.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.6
pytoml==0.1.21
requests==2.22.0
requests-oauthlib==1.3.0
retrying==1.3.3
rsa==4.7
six==1.15.0
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.4.1
tensorflow-estimator==2.4.0
tensorflow-probability==0.12.1
termcolor==1.1.0
typing-extensions==3.7.4.3
urllib3==1.25.8
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
# mwe.py
import tensorflow as tf
import tensorflow_probability as tfp

def function_and_gradient(x):
    print("CALLED")
    return x**2, 2*x

start = tf.constant([[-1.2], [1.7]])
opt_result = tfp.optimizer.lbfgs_minimize(function_and_gradient, initial_position=start, tolerance=1e-5)
print(opt_result)

Output:

# ... Tensorflow initializes ....
CALLED
CALLED
Traceback (most recent call last):
  File "mwe.py", line 9, in <module>
    opt_result = tfp.optimizer.lbfgs_minimize(function_and_gradient, initial_position=start, tolerance=1e-5)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/lbfgs.py", line 284, in minimize
    return tf.while_loop(
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 605, in new_func
    return func(*args, **kwargs)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2489, in while_loop_v2
    return while_loop(
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2735, in while_loop
    loop_vars = body(*loop_vars)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/lbfgs.py", line 257, in _body
    next_state = bfgs_utils.line_search_step(
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/bfgs_utils.py", line 210, in line_search_step
    ls_result = linesearch.hager_zhang(
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 538, in new_func
    return func(*args, **kwargs)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/linesearch/hager_zhang.py", line 277, in hager_zhang
    right=hzl.val_where(init_converged, val_0, val_initial))
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/linesearch/internal/hager_zhang_lib.py", line 46, in val_where
    return cls(*(val_where(cond, t, f) for t, f in zip(tval, fval)))
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/linesearch/internal/hager_zhang_lib.py", line 46, in <genexpr>
    return cls(*(val_where(cond, t, f) for t, f in zip(tval, fval)))
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow_probability/python/optimizer/linesearch/internal/hager_zhang_lib.py", line 43, in val_where
    return tf1.where(cond, tval, fval)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 4483, in where
    return gen_math_ops.select(condition=condition, x=x, y=y, name=name)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8676, in select
    _ops.raise_from_not_ok_status(e, name)
  File "/tmp/mwe/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6862, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Inputs to operation Select of type Select must have the same size and shape.  Input 0: [2,2] != input 1: [2] [Op:Select]

EDIT: Doing something closer to the docs' example and initializing start with numpy routines gets me correct behaviour, so maybe this is misuse; but what's wrong with that mwe?

brianwa84 commented 3 years ago

Confirmed, I have a repro:

  def testIssue398(self):
    mse = tf.keras.losses.MeanSquaredError()

    def f(inputs):
      loss = 0
      with tf.GradientTape() as tape:
        tape.watch(inputs)
        new_guess = np.random.rand(*inputs.shape)
        loss += mse(inputs, new_guess)
      grad = tape.gradient(loss, inputs)
      return loss, grad

    self.evaluate(tfp.optimizer.lbfgs_minimize(
        f,
        initial_position=np.random.rand(1, 2, 3).astype(np.float32),
        tolerance=1e-8))
mikeevmm commented 3 years ago

This seems to relate somehow to calling tf.reduce_sum; the following works:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np

if __name__ == "__main__":
    def quadratic_function(x):
        return tfp.math.value_and_gradient(
                lambda x: tf.reduce_sum(x**2, axis=1), x)

    start = np.array([[1.0], [-2.0]])
    optim_results = tfp.optimizer.lbfgs_minimize(
          quadratic_function,
          initial_position=start,
          num_correction_pairs=10,
          tolerance=1e-8)

but

def quadratic_function(x):
    return tfp.math.value_and_gradient(
            lambda x: x**2, x)

does not.

(Calling tf.reduce_sum(x, axis=1) drops the trailing axis, reducing the array from e.g. [[1.0],[2.0]] (shape [2, 1]) to [1.0, 2.0] (shape [2]).)

mikeevmm commented 3 years ago

Update: Can confirm that it's misuse; the docs specify the output should have shape [...], and not [..., 1]. It would be nice if this were caught more gracefully.
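A more graceful failure could come from a small shape check wrapped around the objective. The following is only a sketch: `checked` is a hypothetical helper, not part of TFP, and it assumes the convention stated above, namely that a position of shape [..., n] maps to a value of shape [...] and a gradient of shape [..., n]. It relies only on `.shape`, so it is demonstrated here with NumPy arrays. Using it on tensors inside `lbfgs_minimize` is an untested assumption.

```python
import numpy as np

def checked(value_and_gradients_fn):
    """Wrap an objective so that mismatched output shapes raise a clear error.

    Hypothetical helper (not part of TFP). Assumes a position of shape
    [..., n] must map to a value of shape [...] and a gradient of shape [..., n].
    """
    def wrapper(x):
        value, grad = value_and_gradients_fn(x)
        x_shape = tuple(x.shape)
        if tuple(value.shape) != x_shape[:-1]:
            raise ValueError(
                f"value has shape {tuple(value.shape)}, expected {x_shape[:-1]} "
                f"for a position of shape {x_shape}")
        if tuple(grad.shape) != x_shape:
            raise ValueError(
                f"gradient has shape {tuple(grad.shape)}, expected {x_shape}")
        return value, grad
    return wrapper

start = np.array([[-1.2], [1.7]])  # batch of 2 one-dimensional problems

# The value keeps the trailing axis: shape (2, 1) instead of (2,).
bad = checked(lambda x: (x**2, 2 * x))
# Summing over the last axis gives the expected value shape (2,).
good = checked(lambda x: (np.sum(x**2, axis=-1), 2 * x))

try:
    bad(start)
    bad_error = None
except ValueError as err:
    bad_error = str(err)

value, grad = good(start)
print(bad_error)
print(value.shape, grad.shape)
```

The check reports the expected and actual shapes up front, instead of letting the mismatch surface later as a Select error deep inside the line search.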

legel commented 3 years ago

Quick note, in case this helps someone else. I had this issue, but the problem for me turned out to be that I thought I had to provide a 1:1 correspondence between the returned loss values and gradients. I was trying to return e.g. 10 loss values matching the 10 gradients, even though my loss was really only 1 value broadcast to the same shape. Once I returned just a shape (1) loss tensor with a (e.g.) shape (10) gradient tensor (for 10 variables being fit), things worked great.