tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Bernoulli stuck at 1e-6 #423

Open rzu512 opened 5 years ago

rzu512 commented 5 years ago

I am trying to use Hamiltonian Monte Carlo to fit a model consisting of a single Bernoulli distribution. The chain can get stuck at p=1e-6. Replacing the Bernoulli with a RelaxedBernoulli doesn't solve the problem.

If I use tensorflow-probability to fit a more complex model that contains Bernoulli distributions, would the parameters all get stuck at 0.0 or 1.0?

Result:

[[1.e-06]
 [1.e-06]
 [1.e-06]
 ...
 [1.e-06]
 [1.e-06]
 [1.e-06]]
mean:0.0000  stddev:0.0000  acceptance:0.0000

Expected result: the parameter p should converge to roughly 0.3 instead of 1e-6.

Test:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
from functools import partial

def logp_func(observations, p):
    b = tfd.RelaxedBernoulli(np.float32(0.5), probs=p)
    return tf.reduce_sum(b.log_prob(observations))

def main():
    obs = np.random.choice(2, p=[0.7, 0.3], size=1000).astype(np.float32)
    observations = tf.constant(np.array(obs))
    logp = partial(logp_func, observations)

    step_size = tf.get_variable(
        name='step_size',
        initializer=1.,
        use_resource=True,  # For TFE compatibility.
        trainable=False)

    num_results = int(1e4)
    num_burnin_steps = int(1e3)
    hmc = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=logp,
        num_leapfrog_steps=3,
        step_size=step_size,
        step_size_update_fn=tfp.mcmc.make_simple_step_size_update_policy(
            num_adaptation_steps=int(num_burnin_steps * 0.8)))

    samples, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=np.array([1e-6], dtype=np.float32),
        kernel=hmc)

    # Initialize all constructed variables.
    init_op = tf.global_variables_initializer()

    with tf.Session() as sess:
        init_op.run()
        samples_, kernel_results_ = sess.run([samples, kernel_results])
        print(samples_)
        print('mean:{:.4f}  stddev:{:.4f}  acceptance:{:.4f}'.format(
            samples_.mean(), samples_.std(),
            kernel_results_.is_accepted.mean()))

main()
SiegeLordEx commented 5 years ago

A simple thing to try is to use the logits argument to tfd.Bernoulli rather than probs. The probs parameterization has two issues:

  1. It's constrained between 0 and 1, while HMC prefers unconstrained parameters.
  2. The gradients are infinite when the probabilities are 0 and 1.

After performing HMC using logits, you can still convert them to probabilities by passing them through the tf.nn.sigmoid function.
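
A minimal sketch of that conversion (assuming samples holds the logit-space draws returned by sample_chain):

p_samples = tf.nn.sigmoid(samples)  # map logit-space draws back to probabilities in (0, 1)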

rzu512 commented 5 years ago

Using logits still gets stuck near 0.

SiegeLordEx commented 5 years ago

I've verified that it works if you use tfd.Bernoulli. tfd.RelaxedBernoulli is actually only defined on the open interval (0, 1), which in this case produces NaNs when you pass it observations of 0 and 1 (which are fine for a regular Bernoulli).

I'd stay away from RelaxedBernoulli unless you really want to be able to propagate gradients through its samples.
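
A quick way to see the difference (an untested sketch in the same TF 1.x graph-mode style as the code above; the behavior at the endpoints is the point, not the exact values):

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

obs = tf.constant([0., 1.], dtype=tf.float32)  # endpoint observations
relaxed = tfd.RelaxedBernoulli(np.float32(0.5), probs=np.float32(0.3))
exact = tfd.Bernoulli(probs=np.float32(0.3))

with tf.Session() as sess:
    # RelaxedBernoulli's density lives on the open interval (0, 1), so its
    # log_prob at the endpoints is not finite; the plain Bernoulli is fine.
    print(sess.run(relaxed.log_prob(obs)))  # expect inf / nan here
    print(sess.run(exact.log_prob(obs)))    # finite log-probabilities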

rzu512 commented 5 years ago

Test:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
from functools import partial

def logp_func(observations, p):
    b = tfd.Bernoulli(logits=p)
    return tf.reduce_sum(b.log_prob(observations))

def main():
    obs = np.random.choice(2, p=[0.3, 0.7], size=1000).astype(np.float32)
    observations = tf.constant(np.array(obs))
    logp = partial(logp_func, observations)

    step_size = tf.get_variable(
        name='step_size',
        initializer=1.,
        use_resource=True,  # For TFE compatibility.
        trainable=False)

    num_results = int(1e3)
    num_burnin_steps = int(1e2)
    hmc = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=logp,
        num_leapfrog_steps=3,
        step_size=step_size,
        step_size_update_fn=tfp.mcmc.make_simple_step_size_update_policy(
            num_adaptation_steps=int(num_burnin_steps * 0.8)))

    samples, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=np.array([1e-6], dtype=np.float32),
        kernel=hmc)

    # Initialize all constructed variables.
    init_op = tf.global_variables_initializer()

    with tf.Session() as sess:
        init_op.run()
        samples_, kernel_results_ = sess.run([samples, kernel_results])
        print(samples_)
        print('mean:{:.4f}  stddev:{:.4f}  acceptance:{:.4f}'.format(
            samples_.mean(), samples_.std(),
            kernel_results_.is_accepted.mean()))

main()

Result:

...
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]
 [1.e-06]]
mean:0.0000  stddev:0.0000  acceptance:0.0000

1,000 steps are not enough to get the Bernoulli "unstuck".

Versions:

tensorflow-cpu: 1.13.1
tensorflow-probability: 0.6.0
skeydan commented 5 years ago

I think the initial step size is too high. When I change it to 0.1 I get results that look like this:

 [...]
 [...]
 [0.8799779 ]
 [0.8799779 ]
 [0.9403964 ]
 [1.0033302 ]
 [1.0279231 ]
 [0.81895196]
 [0.7666844 ]
 [0.8280707 ]
 [0.86828065]]
mean:0.8181  stddev:0.0694  acceptance:0.7240
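
i.e. the only change to the script above is the step-size initializer:

    step_size = tf.get_variable(
        name='step_size',
        initializer=0.1,  # was 1.0; start with a smaller initial step size
        use_resource=True,  # For TFE compatibility.
        trainable=False)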
SiegeLordEx commented 5 years ago

In your latest test code you've reduced the number of steps 10-fold... if you change it back to:

num_results = int(1e4)
num_burnin_steps = int(1e3)

as you had in the original code, it'll work. Yes, the initial step size is too large; however, if you give it enough time to adapt, it'll still work. In your latest test code you don't give it enough time, so it remains too large, which is why your acceptance probability is low. In general, though, it is better to start with a step size that is too low than one that is too high: for more complex problems, the sampler won't be able to recover if it starts out too high.

I recommend plotting the evolution of the step size over time to verify such things (you can find it in kernel_results.extra.step_size_assign; you'll need to forgo sample_chain's num_burnin_steps because it hides the adaptation period).
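
A rough sketch of that diagnostic, reusing hmc, num_results and num_burnin_steps from the test script above (untested; matplotlib is just one way to plot it):

import matplotlib.pyplot as plt

# Fold the burn-in into the traced results so the adaptation period is visible.
samples, kernel_results = tfp.mcmc.sample_chain(
    num_results=num_results + num_burnin_steps,
    num_burnin_steps=0,
    current_state=np.array([1e-6], dtype=np.float32),
    kernel=hmc)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    samples_, step_sizes_ = sess.run(
        [samples, kernel_results.extra.step_size_assign])

plt.plot(step_sizes_)  # step size per iteration, including the adaptation phase
plt.xlabel('iteration')
plt.ylabel('step size')
plt.show()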

rzu512 commented 5 years ago

Part of my model is binary optimization.

I expect the true value of the Bernoulli's probs to be either 0 or 1. Then I would need an infinitesimally small step size near 0 or 1, but a normal step size for values between 0 and 1.

The Bernoulli will probably still get stuck in this case unless the step size can change during sampling.

I guess I can try clipping the logits to [-10, 10], but then the logits would just get stuck near -10 and 10.
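
For reference, a minimal sketch of that clipping idea (a hypothetical variant of logp_func above; I haven't verified it helps):

def logp_func_clipped(observations, p):
    # Clip the unconstrained logit to [-10, 10] before building the distribution.
    # Note tf.clip_by_value has zero gradient outside the clipped range, so the
    # chain can still pile up near the clip boundaries.
    clipped = tf.clip_by_value(p, -10., 10.)
    b = tfd.Bernoulli(logits=clipped)
    return tf.reduce_sum(b.log_prob(observations))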