tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

lbfgs_minimize initial_position ValueError with Sigmoid bijector #1182

Closed. joelberkeley closed this issue 3 years ago.

joelberkeley commented 3 years ago

I'm seeing

E     ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.

when using lbfgs_minimize with the Sigmoid bijector:

import tensorflow as tf
import tensorflow_probability as tfp

bijector = tfp.bijectors.Sigmoid(
    low=tf.constant([-2.2, -1.0]),
    high=tf.constant([1.3, 3.3])
)

def _quadratic(x):
    return tf.reduce_sum(bijector.forward(x) ** 2, axis=-1, keepdims=True)[..., 0]

inverted = bijector.inverse(tf.constant([[-0.06450844, -0.02390611]]))

tfp.optimizer.lbfgs_minimize(
    lambda x: tfp.math.value_and_gradient(_quadratic, x),
    inverted
    # tf.Variable(inverted)
    # tf.constant(inverted.numpy())
    # tf.constant([[0.41927668, -1.312932]])  # the value of inverted, hardcoded
)

If I use inverted directly, it errors; if I use any of the commented-out versions, it works.

I'm using tfp version 0.11.1 and tf version 2.3.1.

jeffpollock9 commented 3 years ago

I think there is a missing comma in:

inverted = bijector.inverse(tf.constant([[-0.06450844 -0.02390611]]))

but perhaps there is also another issue somewhere.

joelberkeley commented 3 years ago

@jeffpollock9 yes, thanks. Updated; the error still happens.

ColCarroll commented 3 years ago

Good catch on the comma! Everything still goes through (it is subtraction, but gets broadcasted to the "right" shape), but quietly gives the wrong answer.

Luckily, the same bug still shows up: tfp.math.value_and_gradient returns a None gradient because it tries to trace further back than it can see. Adding a tf.stop_gradient fixes it, I think (it depends on what you want to minimize -- I assume you have a reason for inverting a point and then pushing it back through forward!):

opt = tfp.optimizer.lbfgs_minimize(
    lambda x: tfp.math.value_and_gradient(_quadratic, x),
    tf.stop_gradient(inverted))

joelberkeley commented 3 years ago

I assume you have a reason for inverting, and then pushing back forward a point!

Indeed. We want to optimize a function over a constrained space, so we're training an unconstrained parameter and using a bijector to keep it in the constrained region.
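
Roughly this pattern (a minimal sketch; the objective and starting point here are placeholders rather than our real model):

import tensorflow as tf
import tensorflow_probability as tfp

bijector = tfp.bijectors.Sigmoid(
    low=tf.constant([-2.2, -1.0]),
    high=tf.constant([1.3, 3.3])
)

def objective(unconstrained):
    # map the unconstrained parameter into [low, high] before scoring it
    constrained = bijector.forward(unconstrained)
    return tf.reduce_sum(constrained ** 2, axis=-1)

start = tf.constant([[0.0, 0.0]])  # unconstrained starting point
result = tfp.optimizer.lbfgs_minimize(
    lambda x: tfp.math.value_and_gradient(objective, x),
    initial_position=start)

print(bijector.forward(result.position))  # minimizer mapped back into the constrained space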

ColCarroll commented 3 years ago

@joelberkeley -- I think stop_gradient will fix the issue after the typo is taken care of, is that true?

joelberkeley commented 3 years ago

@ColCarroll it certainly fixes it for my actual use case. Would you say this is a workaround, or the canonical solution?

joelberkeley commented 3 years ago

If it is the correct solution, can it be added to the docs please? The docs currently say:

 real Tensor of shape [..., n]. The starting point, or points when using batching dimensions, of the search procedure. At these points the function value and the gradient norm should be finite.

I don't get the idea that I need to use tf.stop_gradient from that.

ColCarroll commented 3 years ago

This is only a workaround for a known bug -- the root cause is bijectors caching outputs, and there has been some work done to address this.
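
Roughly, a minimal sketch of what the cache does here (illustrative only, reusing the bijector from the report; cache implementation details elided):

y = tf.constant([[-0.06450844, -0.02390611]])
x = bijector.inverse(y)          # x is remembered as "the inverse of y"
with tf.GradientTape() as tape:
    tape.watch(x)
    value = tf.reduce_sum(bijector.forward(x) ** 2)  # cache hit: returns y itself
print(tape.gradient(value, x))   # None -- value never depended on x on the tape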

I believe you could also add 0 (inverted + 0), use inverted.numpy(), or wrap it in tf.identity, and those should all work.
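
As a sketch (reusing bijector, _quadratic, and inverted from the report), any of these gives the optimizer a tensor the cache has not seen:

start = tf.stop_gradient(inverted)       # the fix shown above
# start = inverted + 0                   # any arithmetic yields a fresh tensor
# start = tf.identity(inverted)          # likewise tf.identity
# start = tf.constant(inverted.numpy())  # round-trip through numpy

opt = tfp.optimizer.lbfgs_minimize(
    lambda x: tfp.math.value_and_gradient(_quadratic, x),
    initial_position=start)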

Apologies that you ran into it -- hopefully including this discussion here might help users until this is fixed!

joelberkeley commented 3 years ago

OK. Could you show me the ticket which tracks the bug, so I can follow it?

ColCarroll commented 3 years ago

Oy, apologies -- #1190 will track it.