tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Feature request: Use monte_carlo_csiszar_f_divergence for more than one variable #223

Closed tillahoffmann closed 5 years ago

tillahoffmann commented 6 years ago

Using monte_carlo_csiszar_f_divergence to perform inference for a single distribution works as expected (see example below).

However, I couldn't figure out how to make use of monte_carlo_csiszar_f_divergence for more complex models that involve different distributions (e.g. one normal, one gamma). Any advice would be much appreciated. #147 may be related?

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd
import functools as ft
import numpy as np

np.random.seed(1)
samples = np.random.normal(-2, 1, 5)  # Generate some "data"

def evaluate_log_joint(samples, mean, *extra_variables):
    # QUESTION: How can I make use of extra_variables when evaluating the log joint?
    rv_mean = tfd.Normal(0, 1)  # Prior for the mean
    rv_samples = tfd.Normal(mean, 1)  # Likelihood
    return tf.reduce_mean(  # Take the mean over monte carlo samples
        tf.reduce_sum(rv_samples.log_prob(samples), axis=-1) + 
        rv_mean.log_prob(mean)
    )

with tf.Graph().as_default() as graph:
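    # Trainable surrogate (variational) posterior for the mean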
    surrogate_posterior = tfd.Normal(
        tf.Variable([0.]),
        tf.Variable([1.])
    )

    elbo_loss = tfp.vi.monte_carlo_csiszar_f_divergence(
        f=tfp.vi.kl_reverse,
        p_log_prob=ft.partial(evaluate_log_joint, samples),
        q=surrogate_posterior,  # QUESTION: How can I pass in multiple nodes in a standard factor graph?
        num_draws=1000,
        name='elbo_loss'
    )

    train = tf.train.AdamOptimizer(.1).minimize(elbo_loss)
    init_op = tf.global_variables_initializer()

sess = tf.Session(graph=graph)
sess.run(init_op)

losses = []

for i in range(200):
    _, loss = sess.run([train, elbo_loss])
    losses.append(loss)

# Calculate the posterior parameters exactly
neff = samples.size + 1
loc = np.sum(samples) / neff
scale = 1 / np.sqrt(neff)

print(loc, sess.run(surrogate_posterior.loc))  # -1.6205239658469637 [-1.6224545]
print(scale, sess.run(surrogate_posterior.scale))  # 0.4082482904638631 [0.40540358]

axch commented 6 years ago

Good one. Doesn't look like there's any way to use the software as it stands to do that. The best I have for you right now is that monte_carlo_csiszar_f_divergence is not a very complex function, so you could write your own variant.
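
For concreteness, here is a rough sketch of what such a hand-rolled variant might look like for several latent variables (nothing below is TFP API; the function and argument names are made up, and each surrogate is assumed to be a scalar, unbatched distribution):

import tensorflow as tf

def multi_variable_elbo_loss(joint_log_prob_fn, surrogates, num_draws=1000):
    """Monte Carlo estimate of E_q[log q(z) - log p(x, z)] for a factorized q.

    `joint_log_prob_fn` takes one tensor of draws per latent variable and
    returns the per-draw log joint, shape [num_draws].
    """
    draws = [q.sample(num_draws) for q in surrogates]  # one vector of draws per latent
    log_q = tf.add_n([q.log_prob(d) for q, d in zip(surrogates, draws)])  # factorized log q
    log_p = joint_log_prob_fn(*draws)  # log joint evaluated at the draws
    return tf.reduce_mean(log_q - log_p)  # reverse-KL (negative-ELBO-style) estimate

For reparameterizable surrogates (e.g. Normal), gradients flow through the draws, so this can be minimized with an optimizer just like the single-variable example above.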

For the future, I will edit this issue into a feature request to add the capability. If you feel like submitting a PR, we would welcome it!

tillahoffmann commented 6 years ago

@axch, thanks for the update. Do you have recommendations on how to perform variational inference using TensorFlow Probability for multivariate models?

axch commented 5 years ago

I'm not actually very familiar with that suite of capabilities. Perhaps @jvdillon could comment?

csuter commented 5 years ago

Caveat: AFAIK this module was a very early addition to TFP and is probably in need of some dusting off and additional love.

I believe the commonly imagined use case was probably something like structured mean-field VI, where q would be a (block) diagonal MVN.
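
For reference, a mean-field surrogate of that kind might look something like this (a sketch only; the latent dimensionality and variable names are made up, and in practice you would constrain the scales to stay positive):

import tensorflow as tf
from tensorflow_probability import distributions as tfd

num_latents = 3  # hypothetical latent dimensionality
surrogate_posterior = tfd.MultivariateNormalDiag(
    loc=tf.Variable(tf.zeros(num_latents)),
    scale_diag=tf.Variable(tf.ones(num_latents)))  # left unconstrained here for brevity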

The main thing this method is good for is specification of one term in the ELBO, in which we have

  1. a nice way of computing the p log prob (which can be as complicated a graph as you might like), and
  2. a specified q distribution, ideally one which is "reparameterizable" so that lower-variance stochastic gradient estimates are possible. Otherwise, we can estimate gradients with the 'score function trick', but this has poor performance, especially as the number of dimensions increases (one way to check reparameterizability is sketched below).
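
One way to check (assuming current tfd names) is the distribution's reparameterization_type attribute:

from tensorflow_probability import distributions as tfd

# Normal has a reparameterized sampler, so pathwise (lower-variance) gradients are available.
print(tfd.Normal(0., 1.).reparameterization_type == tfd.FULLY_REPARAMETERIZED)  # True
# Discrete distributions like Bernoulli fall back to score-function gradients.
print(tfd.Bernoulli(probs=0.5).reparameterization_type == tfd.FULLY_REPARAMETERIZED)  # False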

If you want to have richer structure in the variational distribution, q, you probably need to do some scribbling of maths and then invoke this function multiple times, once for each E_q[log p(x | z)] term in your ELBO.

Finally, note that monte_carlo_csiszar_f_divergence does not compute the KL[q(z) || p(z)] term in the ELBO, so you need to throw that in by hand. We have analytic KLs between many distributions out of the box; otherwise you can use Monte Carlo to sample a z from q(z) and compute log q - log p.
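
A rough sketch of that decomposition for the single-latent example above (variable names are made up; `samples` is the data array from the original example, cast to float32):

import tensorflow as tf
from tensorflow_probability import distributions as tfd

data = tf.constant(samples, dtype=tf.float32)  # the "data" from the original example

prior = tfd.Normal(0., 1.)  # p(z)
q = tfd.Normal(tf.Variable(0.), tf.Variable(1.))  # q(z); constrain the scale in practice

z = q.sample(1000)  # draws from q, shape [1000]
likelihood = tfd.Normal(z[:, tf.newaxis], 1.)  # p(x | z), broadcast over draws
expected_log_likelihood = tf.reduce_mean(
    tf.reduce_sum(likelihood.log_prob(data), axis=-1))  # Monte Carlo E_q[log p(x | z)]

# Negative ELBO = -E_q[log p(x | z)] + KL[q(z) || p(z)], with the KL term analytic here.
neg_elbo = -expected_log_likelihood + tfd.kl_divergence(q, prior)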

Hope this helps! Happy to try to help more, if you can give more details about your problem setup.

csuter commented 5 years ago

I'm gonna remove the 'good first issue' tag, because I'm fairly sure solving this completely will require some thoughtful and non-trivial API design discussion. Namely, it's a question of defining and aligning the true and variational model, when the structure and alignment are not as simple as, say, mean-field Gaussian.

davmre commented 5 years ago

This was addressed in 499827efa11b55f44fa0d5ef0432f3e1eebeff01 (and several changes leading up to that one). It's now possible to pass JointDistribution objects to tfp.vi.monte_carlo_variational_loss, which is a renamed and updated version of monte_carlo_csiszar_f_divergence; the updated code will be included in the upcoming TFP 0.8 release.
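
For anyone landing here later, a rough sketch of the new usage (assuming TFP 0.8+; the toy model, variable names, and exact keyword arguments below should be checked against the release):

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd

data = tf.constant([-2.1, -1.3, -2.8, -1.9, -2.4])  # hypothetical observations

# Log joint over two latent variables: a Gamma-distributed precision and a Normal mean.
def target_log_prob_fn(precision, mean):
    lp = tfd.Gamma(2., 2.).log_prob(precision)
    lp += tfd.Normal(0., 1.).log_prob(mean)
    likelihood = tfd.Normal(mean[..., tf.newaxis],
                            1. / tf.sqrt(precision)[..., tf.newaxis])
    return lp + tf.reduce_sum(likelihood.log_prob(data), axis=-1)

# Factorized surrogate posterior over (precision, mean) as a single JointDistribution.
surrogate_posterior = tfd.JointDistributionSequential([
    tfd.LogNormal(tf.Variable(0.), tf.Variable(1.)),  # constrain the scales in practice
    tfd.Normal(tf.Variable(0.), tf.Variable(1.)),
])

elbo_loss = tfp.vi.monte_carlo_variational_loss(
    target_log_prob_fn, surrogate_posterior, sample_size=100)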

Closing this issue; feel free to reopen if needed.