tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Unexpected behavior from tfd.MarkovChain sample method #1355

Open xiaolong1979 opened 3 years ago

xiaolong1979 commented 3 years ago

Can someone help explain why tfd.MarkovChain does not generate independent samples when a bijector is used in the transition_fn? Thanks!

In the code below, gaussian_walk1 and gaussian_walk2 are expected to be the same, since Normal(x, 1) = x + Normal(0, 1). While gaussian_walk1.sample(5) gives independent samples as expected, gaussian_walk2.sample(5) gives five identical samples.

gaussian_walk1 = tfd.MarkovChain(
  initial_state_prior=tfp.distributions.Deterministic(0.),
  transition_fn=lambda _, x: tfd.Normal(loc=x, scale=1.),
  num_steps=10)

gaussian_walk2 = tfd.MarkovChain(
  initial_state_prior=tfp.distributions.Deterministic(0.),
  transition_fn=lambda _, x: tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0.0, scale=1.),
    bijector=tfp.bijectors.Shift(x)),
  num_steps=10)
gaussian_walk1.sample(5)
<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[ 0.        ,  0.4963878 ,  0.0830816 , -0.77141273, -0.91577226,
         0.23975712,  0.49968088, -2.0648232 , -2.0975184 , -3.7448356 ],
       [ 0.        , -2.2646387 , -0.8099165 , -1.6681502 , -1.4593805 ,
        -1.5122384 , -1.7060741 , -1.7010493 , -0.828577  , -1.4167368 ],
       [ 0.        ,  0.06805495, -0.32092297, -1.0535722 , -2.30161   ,
        -3.9674587 , -4.319279  , -4.5414166 , -3.836207  , -4.32444   ],
       [ 0.        ,  2.509243  ,  3.1126018 ,  3.8786044 ,  5.56534   ,
         6.333398  ,  6.178385  ,  5.152129  ,  4.0463457 ,  4.3648543 ],
       [ 0.        ,  0.3113255 ,  0.9251684 ,  0.81194293,  0.48614424,
         0.05987284, -1.2350528 , -0.2448535 ,  0.2054899 ,  0.6470542 ]],
      dtype=float32)>
gaussian_walk2.sample(5)
<tf.Tensor: shape=(5, 10), dtype=float32, numpy=
array([[ 0.        , -0.655154  , -0.19072625, -0.9975474 , -0.86116153,
        -1.215732  , -1.2559065 , -1.4174258 ,  0.42590857, -0.99431324],
       [ 0.        , -0.655154  , -0.19072625, -0.9975474 , -0.86116153,
        -1.215732  , -1.2559065 , -1.4174258 ,  0.42590857, -0.99431324],
       [ 0.        , -0.655154  , -0.19072625, -0.9975474 , -0.86116153,
        -1.215732  , -1.2559065 , -1.4174258 ,  0.42590857, -0.99431324],
       [ 0.        , -0.655154  , -0.19072625, -0.9975474 , -0.86116153,
        -1.215732  , -1.2559065 , -1.4174258 ,  0.42590857, -0.99431324],
       [ 0.        , -0.655154  , -0.19072625, -0.9975474 , -0.86116153,
        -1.215732  , -1.2559065 , -1.4174258 ,  0.42590857, -0.99431324]],
      dtype=float32)>
junpenglao commented 3 years ago

Not sure if this is intended behavior, but currently TransformedDistribution does not pick up the batch shape from its bijector:

mu = tf.range(5, dtype=tf.float32)
dist1 = tfd.Normal(mu, 1)
dist1
#==> <tfp.distributions.Normal 'Normal' batch_shape=[5] event_shape=[] dtype=float32>
dist2 = tfb.Shift(mu)(tfd.Normal(0., 1.))
dist2
#==> <tfp.distributions.TransformedDistribution 'shiftNormal' batch_shape=[] event_shape=[] dtype=float32>

As a result, when you call dist2.sample() you get a single scalar sample from the base distribution, which is broadcast-added to mu.
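To see this concretely (a quick sketch, not from the original thread, reusing mu, dist1, and dist2 from above):

x1 = dist1.sample()
print(x1 - mu)   # five distinct deviations, one per batch member
x2 = dist2.sample()
print(x2.shape)  # ==> (5,): the scalar base sample broadcasts against mu
print(x2 - mu)   # a constant vector: the same scalar noise repeated five times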

I think there is some work going on to give bijectors full shape semantics (sample/batch/event) - maybe @davmre knows a bit more about the roadmap.

xiaolong1979 commented 3 years ago

Thanks @junpenglao. TransformedDistribution may be the issue. I have another example with a similar batch_shape issue in the log_prob method; it also used a transition function with TransformedDistribution.

tfd.Normal has a loc parameter that can express the transition directly. Most other distributions are not parameterized this way and rely on a bijector with TransformedDistribution. It would be great to have bijectors get full shape semantics (sample/batch/event).
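For instance (a hypothetical sketch to illustrate the point, not code from the thread), tfd.Exponential has no loc parameter, so the natural way to write a walk with exponential increments is exactly this Shift pattern, and it hits the same pitfall:

exponential_walk = tfd.MarkovChain(
    initial_state_prior=tfd.Deterministic(0.),
    # Exponential has no `loc`, so shifting requires a bijector; the base
    # distribution again has empty batch shape, reproducing the issue above.
    transition_fn=lambda _, x: tfd.TransformedDistribution(
        distribution=tfd.Exponential(rate=1.),
        bijector=tfp.bijectors.Shift(x)),
    num_steps=10)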

junpenglao commented 3 years ago

Alternatively, you can make it work by using tfd.Sample:

gaussian_walk2 = tfd.MarkovChain(
    initial_state_prior=tfp.distributions.Deterministic(0.),
    transition_fn=lambda _, x: tfd.TransformedDistribution(
        distribution=tfd.Sample(tfd.Normal(loc=0.0, scale=1.), x.shape),
        bijector=tfp.bijectors.Shift(x)),
    num_steps=10)
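With this workaround, sampling gives distinct chains as hoped (a quick sanity check on the definition above):

s = gaussian_walk2.sample(5)
print(s.shape)  # ==> (5, 10), and the five rows now differ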
xiaolong1979 commented 3 years ago

@junpenglao Thanks for the idea. The sample method works as expected, but log_prob does not seem to work. Did I miss anything?

gaussian_walk2 = tfd.MarkovChain(
    initial_state_prior=tfp.distributions.Deterministic(0.),
    transition_fn=lambda _, x: tfd.TransformedDistribution(
        distribution=tfd.Sample(tfd.Normal(loc=0.0, scale=1.), x.shape),
        bijector=tfp.bijectors.Shift(x)),
    num_steps=10)

print(gaussian_walk2.sample(3))

tf.Tensor(
[[ 0.          0.6361662   0.7333202  -0.49361163 -0.2956681   1.3065095
   1.2681572   0.5679376  -0.29533887 -0.05293749]
 [ 0.          0.85948235 -1.1862569  -0.85393983 -0.99621975 -2.624208
  -0.59781265 -0.0199101  -1.1001246  -1.0604566 ]
 [ 0.          0.93100184  0.31189167  0.9530297  -0.5100331  -0.7756746
  -2.2172916  -1.8662072  -3.9851084  -2.3052304 ]], shape=(3, 10), dtype=float32)

print(gaussian_walk2.log_prob(gaussian_walk2.sample(3)))

error:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input> in <module>()
      7
      8 print(gaussian_walk2.sample(3))
----> 9 print(gaussian_walk2.log_prob(gaussian_walk2b.sample(3)))

8 frames
/usr/local/lib/python3.7/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: Invalid reduction dimension (0 for input with 0 dimension(s) [Op:Sum]
davmre commented 3 years ago

I think @junpenglao is correct about the issue with sampling: currently TransformedDistribution expects that the base distribution's batch shape is at least as large as the bijector's 'batch shape' (recently annotated as bijector.experimental_batch_shape). Otherwise, the base distribution will sample fewer degrees of freedom than needed for the final result.

Using tfd.Sample to add a dimension is almost correct, but it will incorrectly reduce over the log_prob because Sample adds event shape, not batch shape (i.e., it defines a distribution over a vector of independent samples, rather than a batch of distributions over scalar samples). To construct a Normal distribution with batch shape, you can either pass a batch of parameters, or wrap with tfd.BatchBroadcast:

gaussian_walk2 = tfd.MarkovChain(
  initial_state_prior=tfp.distributions.Deterministic(0.),
  transition_fn=lambda _, x: tfd.TransformedDistribution(
    # Create a distribution with `distribution.batch_shape == x.shape`.
    # This could also be `tfd.BatchBroadcast(tfd.Normal(0., 1.), to_shape=x.shape)`.
    distribution=tfd.Normal(loc=0.0, scale=tf.ones_like(x)),
    bijector=tfp.bijectors.Shift(x)),
  num_steps=10)

x = gaussian_walk2.sample(3)
print(x.shape)  # ==> [3, 10]

lp = gaussian_walk2.log_prob(x)
print(lp.shape)  # ==> [3]
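
The distinction between the two constructions is visible in the shapes themselves (a sketch added for illustration, assuming the usual tfd/tf aliases):

d_event = tfd.Sample(tfd.Normal(0., 1.), 10)
print(d_event)  # ==> batch_shape=[] event_shape=[10]: log_prob sums over all 10 values
d_batch = tfd.Normal(0., tf.ones(10))
print(d_batch)  # ==> batch_shape=[10] event_shape=[]: log_prob is computed per element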

Now that bijectors have experimental_batch_shape annotations, it should be possible for TransformedDistribution to do this sort of batch broadcasting automatically. This is on my TODO list, though I don't think we have a particular timeline (contributions always appreciated!).
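For reference, the annotation mentioned above can be queried directly (a sketch; assumes a recent TFP version where experimental_batch_shape is available):

b = tfp.bijectors.Shift(tf.zeros(5))
print(b.experimental_batch_shape())  # ==> [5], inferred from the shift parameter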

xiaolong1979 commented 3 years ago

Hi Dave, this looks great! I will test it next week after I am back from vacation.

Thanks a lot!

Xiaolong


xiaolong1979 commented 3 years ago

Hi @davmre, there is a new error when I tried it: ImportError: cannot import name '__version__' from 'keras' (/usr/local/lib/python3.7/dist-packages/keras/__init__.py). It did not show up before. How do I fix it?

Here is the complete message:

ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 gaussian_walk2 = tfd.MarkovChain(
      2   initial_state_prior=tfp.distributions.Deterministic(0.),
      3   transition_fn=lambda _, x: tfd.TransformedDistribution(
      4     # Create a distribution with `distribution.batch_shape == x.shape`.
      5     # This could also be `tfd.BatchBroadcast(tfd.Normal(0., 1.), to_shape=x.shape)`.

25 frames
/usr/local/lib/python3.7/dist-packages/keras/api/_v2/keras/__init__.py in <module>()
      8 import sys as _sys
      9
---> 10 from keras import __version__
     11 from keras.api._v2.keras import __internal__
     12 from keras.api._v2.keras import activations

ImportError: cannot import name '__version__' from 'keras' (/usr/local/lib/python3.7/dist-packages/keras/__init__.py)

NOTE: If your import is failing due to a missing package, you can manually install dependencies using either !pip or !apt. To view examples of installing some common dependencies, click the "Open Examples" button below.

I tried several Stack Overflow suggestions but did not find the right solution. Thanks for your help.
xiaolong1979 commented 3 years ago

Well, somehow it worked when I tried it on an Amazon WorkSpace and ran the following code in the first cell. Something is going on with Colab or the Keras import. I would like to hear any insight.

%matplotlib inline

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Flatten, MaxPooling2D
from keras.layers.convolutional import Conv2D
from keras.layers.recurrent import SimpleRNN, LSTM, GRU
from keras.utils import np_utils
from keras import backend as K

from distutils.version import LooseVersion as LV
from keras import __version__

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

from keras.datasets import mnist, fashion_mnist, imdb

from sklearn.model_selection import train_test_split

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

print('Using Keras version:', __version__, 'backend:', K.backend())
assert(LV(__version__) >= LV("2.0.0"))