Closed ppham27 closed 6 years ago
You're right to think of an ed2 RV as basically just a sample, though it also carries its distribution and, importantly, its stochastic predecessors.
And in eager mode, TF is much like an accelerator-friendly, multi-threaded version of NumPy.
So you can write a function to do your sampling and then use ed2 to intercept sampling actions to get a prior sample, a conditional one, a posterior one, a differentiable (some restrictions apply) likelihood, etc. Or you can use tfp.distributions directly.
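The interception idea described above can be sketched without TFP at all. This is a hypothetical, dependency-free stand-in for `tfp.edward2.interception` (the names `normal_rv`, `interception`, and `model` below are illustrative, not the real API): every sampling action routes through a swappable hook, so the same model function can yield a prior sample or a conditioned value.

```python
import random

_interceptor = None  # module-level hook; None means "sample as usual"

def normal_rv(loc, scale):
    """Sample from Normal(loc, scale), unless an interceptor overrides it."""
    if _interceptor is not None:
        return _interceptor("normal", loc=loc, scale=scale)
    return random.gauss(loc, scale)

class interception:
    """Context manager that installs an interceptor, mirroring ed2's API shape."""
    def __init__(self, fn):
        self.fn = fn
    def __enter__(self):
        global _interceptor
        self._old, _interceptor = _interceptor, self.fn
    def __exit__(self, *exc):
        global _interceptor
        _interceptor = self._old

def model():
    return normal_rv(loc=0.0, scale=1.0)

# Prior sample: just run the model.
prior = model()

# Condition on an observed value by intercepting the sampling action.
with interception(lambda name, **kw: 42.0):
    observed = model()

print(observed)  # 42.0
```

The real `tfp.edward2.interception` works analogously, except the interceptor receives the distribution constructor and its kwargs, which is what lets you substitute observed values or accumulate log-likelihoods.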
On Sat, Nov 3, 2018, 3:38 PM Philip Pham <notifications@github.com> wrote:
In graph mode, if I re-evaluate a random variable, I get a new sample. If I update a variable parameter, I get a new sample with that parameter:
```python
import tensorflow as tf
import tensorflow_probability as tfp

graph = tf.Graph()
with graph.as_default():
  loc = tf.get_variable(
      'loc', (), initializer=tf.constant_initializer(5.), use_resource=True)
  update_loc_op = tf.assign(loc, -5.)
  norm = tfp.edward2.Normal(loc=loc, scale=0.01).value
  init_op = tf.group(tf.global_variables_initializer())
  graph.finalize()

with graph.as_default(), tf.Session() as sess:
  sess.run(init_op)
  print(sess.run(norm))
  print(sess.run(norm))
  print(sess.run((update_loc_op, norm)))
```
produces the output
```
5.0021605           # first sample
4.9917226           # new sample
(-5.0, -4.9986978)  # sample with new loc parameter, works as expected
```
In eager mode, no new sampling happens (which I sort of understand, since I guess the whole program is just one `sess.run` command now). Updating a parameter doesn't lead to a new sample with that parameter either:
```python
import tensorflow as tf
import tensorflow_probability as tfp
import tensorflow.contrib.eager as tfe

tf.enable_eager_execution()

loc = tfe.Variable(initial_value=5.)
norm = tfp.edward2.Normal(loc=loc, scale=0.01)
print(norm.numpy())
print(norm.numpy())

loc.assign(-5.)
print(loc.numpy())
print(norm.numpy())
print(norm.distribution.sample())
```
produces the output
```
5.022224  # first sample
5.022224  # no new sampling, so RandomVariables are just a single sample?
-5.0      # updated loc parameter
5.022224  # random variable with loc=-5.0 hasn't updated
tf.Tensor(5.004547, shape=(), dtype=float32)  # a new sample still uses the old loc=5.0
```
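The eager behaviour above is easier to see with a dependency-free sketch (the `RandomVariable` class and `make_rv` helper here are hypothetical illustrations, not the TFP implementation): the RV draws its sample once, at construction time, so re-reading it never re-samples, and the only way to pick up a new parameter value is to re-run the construction, e.g. inside a function.

```python
import random

class RandomVariable:
    """Toy RV: samples exactly once, at construction, like an ed2 RV in eager mode."""
    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale
        self.value = random.gauss(loc, scale)  # sampled here, once

    def numpy(self):
        return self.value  # re-reading returns the cached sample

rv = RandomVariable(loc=5.0, scale=0.01)
first, second = rv.numpy(), rv.numpy()
assert first == second  # same cached sample both times

# To get fresh samples that track the current parameter, reconstruct the RV:
loc = 5.0
def make_rv():
    return RandomVariable(loc=loc, scale=0.01)

loc = -5.0
assert abs(make_rv().numpy() - (-5.0)) < 1.0  # new sample uses the new loc
```

This is essentially why wrapping model construction in a function (and, in TF2, a `tf.function` or Keras layer) is the idiomatic fix: each call rebuilds the distribution from the variable's current value.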
Ping if you need further guidance; also take a look at the tutorials/examples: https://www.tensorflow.org/probability/overview
I have a question that seems related to this, asked here: https://stackoverflow.com/questions/53338975/use-and-modify-variables-in-tensorflow-bijectors
It's not clear why modifying the variables used to construct distributions/bijectors has an effect in graph mode but not in eager mode...
@brianwa84 any update on this?
Just to be clear, I posted a reply on the SO question (not here).
Yes, thanks for answering there. Moving the discussion here: I think (though there may be good reasons why I'm wrong) it would be more intuitive in eager mode if the shift used by the transformed distribution object (as in the SO question) when sampling or computing log-likelihoods were a reference to the actual variable, as in graph mode, rather than its evaluated numpy value (especially if eager mode is going to become more and more central in TF).
TF is standardizing on Keras [1] (perhaps also tf.function) as the way of expressing cascades of computation like this in TF 2.0. To that end, we've been playing with a kind of Keras layer that emits distribution objects, with user control over the tensor-concretization method (the default is sampling) [2]. This is currently more focused on previous-layer activations conditioning a downstream distribution. But I think you could now create a layer that retains a variable reference and whose `call` function creates a TransformedDistribution using the variable instead of the preceding layer's activations. I'm not sure exactly what the Keras-preferred way of allocating variables is so that they are trained; you'd need to look at some existing Keras layers like Dense.
[1] https://medium.com/tensorflow/standardizing-on-keras-guidance-on-high-level-apis-in-tensorflow-2-0-bad2b04c819a
[2] https://github.com/tensorflow/probability/blob/master/tensorflow_probability/python/layers/distribution_layer.py#L234
To answer my own question, here is the way you add a variable in Keras: `self.add_weight`
https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/python/keras/layers/core.py#L937