pyro-ppl / pyro

Deep universal probabilistic programming with Python and PyTorch
http://pyro.ai
Apache License 2.0
8.49k stars 982 forks source link

[FR] MCMC.marginal() doesn't return joint marginals. #1727

Closed adam-coogan closed 5 years ago

adam-coogan commented 5 years ago

I am confused about how to convert an MCMC run into an empirical distribution for multiple parameters. I would've thought MCMC.marginal() is the way to get this distribution, but the MCMCMarginals it returns does not allow sampling from joint distributions. Consider the following model and MCMC run:

pyro.set_rng_seed(18)
def model(obs=None):
    x = pyro.sample("x", dist.Normal(0., 5.))
    y = pyro.sample("y", dist.Normal(0., 1.))
    pyro.sample("obs", dist.Normal(x-y, 0.1), obs=obs)

obs = torch.tensor(0.)
kernel = NUTS(model)
mcmc_run = MCMC(kernel, num_samples=100, warmup_steps=20).run(obs)

I want to sample from the MCMC posterior for x and y. Running mcmc_run.marginal(["x", "y"]) gives me an MCMCMarginals object for x and y, but I can only sample from the empirical distributions for x and y separately:

posterior_x = mcmc_run.marginal(["x", "y"]).empirical["x"]
posterior_y = mcmc_run.marginal(["x", "y"]).empirical["y"]

This is confusing to me since the MCMCMarginals/Marginals object can take multiple sites as an argument.

I would instead expect Marginals to wrap a call to EmpiricalMarginal(trace_posterior=mcmc_run, sites=["x", "y"]). Or am I misunderstanding the point of MCMCMarginals/Marginals?

neerajprad commented 5 years ago

MCMCMarginals returns a dictionary with individual marginals for the sample sites. The reason for this restriction is that we are yet to establish what the behavior should be when the sample sites have different shapes.

I would instead expect Marginals to wrap a call to EmpiricalMarginal(trace_posterior=mcmc_run, sites=["x", "y"]).

Currently, the Empirical class takes in a tensor of samples, and this tensor can contain concatenated samples from multiple sites. When you have samples with the same shape, you can call EmpiricalMarginal(trace_posterior=mcmc_run, sites=["x", "y"]) directly and generate samples from the joint. The problem comes when they have different shapes, and that's the reason why we cannot delegate to EmpiricalMarginal in the general case.

That said, this is definitely far from ideal behavior, and we are yet to finalize on a good design for these classes. One design choice could be as follows:

If you have some ideas on improving the current design, we would love to hear your thoughts! In the meantime, we can probably do a better job of documenting these restrictions.

neerajprad commented 5 years ago

cc. @fehiepsi, @eb8680 in case they have any thoughts on how best to support this.

eb8680 commented 5 years ago

For now, it's easy enough to aggregate the values at the sites of interest manually from the joint posterior samples stored in MCMC(...).exec_traces.

In the long run, maybe rather than collecting the values into a new distribution, what we'd want is to have collapse (#1722 ) support MCMC so that we could collapse every random variable in a model except the sites of interest and grab samples by traceing the collapsed model.

neerajprad commented 5 years ago

I am closing this issue. The MCMCMarginals class is deprecated and the MCMC interface would return a dictionary of samples from the latent sites directly. These can then be combined by the user, stored in Empirical and then sampled from, or simply iterated over to yield samples from the joint distribution.