pyro-ppl / pyro

Deep universal probabilistic programming with Python and PyTorch
http://pyro.ai
Apache License 2.0

Is AutoStructured implementing the guide for directed graphical models? #3000

Open jiaqingxie opened 2 years ago

jiaqingxie commented 2 years ago

I am a beginner with Pyro. I'm currently working on a project to improve autoguides by removing mean-field assumptions, meaning the posterior should not use AutoMultivariateNormal or AutoDiagonalNormal, and latent variables may depend on other latent variables. For example, I have a simple model, f(D1, D2, X) = f(D1) f(D2|D1) f(X|D2), where D1 and D2 are latent variables and X is the observed variable. I have computed an inverse of the model. Assuming the prior and conditional distributions are all standard normal, or at least in the Gaussian family, how do I rewrite the get_posterior() function? Or should I pass the model to the AutoStructured module instead? Thank you!
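In Pyro I would write this model as something like the following sketch (assuming the chain D1 → D2 → X with unit-variance Gaussians):

```python
import pyro
import pyro.distributions as dist

def model(X):
    # Latent chain D1 -> D2 -> X, everything Gaussian.
    d1 = pyro.sample("d1", dist.Normal(0.0, 1.0))
    d2 = pyro.sample("d2", dist.Normal(d1, 1.0))
    pyro.sample("x", dist.Normal(d2, 1.0), obs=X)
```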

jiaqingxie commented 2 years ago

Also, I have no idea how AutoStructured computes the posterior. Could anybody on the team write an understandable description for me? Thank you!

eb8680 commented 2 years ago

Hi @JIAQING-XIE, Pyro currently contains two structured Gaussian autoguides, AutoStructured and AutoGaussian. You could subclass or fork either of these if they don't quite do what you want or if you want to add new functionality.

Note that AutoMultivariateNormal and AutoLowRankMultivariateNormal are also not mean-field approximations because they use joint multivariate normal distributions over all latent variables in a model, but they do not exploit the dependency structure of the model and are thus not as statistically efficient as they could be.

AutoStructured uses the helper function pyro.infer.inspect.get_dependencies() to compute a posterior dependency structure (by applying moralization to the prior dependency structure). It then walks through each latent variable in the posterior in topological order and samples a value from a conditional distribution whose parameters are functions of the parent posterior nodes' sampled values. The conditional distributions are linear-Gaussian by default, but they can be customized, as can the posterior dependency structure.
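For concreteness, here is a minimal sketch of using AutoStructured with SVI on a toy version of your chain model (the model, learning rate, and step count are illustrative, not canonical):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoStructured

def model(X):
    d1 = pyro.sample("d1", dist.Normal(0.0, 1.0))
    d2 = pyro.sample("d2", dist.Normal(d1, 1.0))
    pyro.sample("x", dist.Normal(d2, 1.0), obs=X)

# Default AutoStructured: dependency structure computed from the model,
# with learnable conditionals along that structure.
guide = AutoStructured(model)

svi = SVI(model, guide, pyro.optim.Adam({"lr": 0.01}), Trace_ELBO())
X = torch.tensor(1.5)
for _ in range(1000):
    svi.step(X)
```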

Like other Pyro autoguides, AutoStructured constrains each guide sample to the model site's support, as described in section 2.3 of "Automatic Differentiation Variational Inference". The closest analogue in the literature to AutoStructured is the procedure described in "Faithful Inversion of Generative Models for Effective Amortized Inference", but AutoStructured uses a different ordering on the posterior variables.

AutoGaussian uses get_dependencies() to compute the prior dependency structure of the model and creates one unnormalized Gaussian factor with learnable parameters for each latent and observed sample site in the model, ultimately producing a Gaussian factor graph with the same dependency structure as the model. It then uses Gaussian tensor variable elimination to efficiently compute the normalizing constant of this factor graph and sample from the joint distribution over all latent variables. AutoGaussian also constrains guide samples to the model's support.
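Usage looks the same as any other autoguide, reusing the toy model above; a sketch (see the AutoGaussian docs for the exact backend options):

```python
from pyro.infer.autoguide import AutoGaussian

# Drop-in replacement for other autoguides; same SVI loop as above.
guide = AutoGaussian(model)

# Optionally use the funsor backend for Gaussian tensor variable
# elimination (requires the funsor package).
guide = AutoGaussian(model, backend="funsor")
```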

You can think of AutoGaussian as a much more parsimonious encoding of AutoMultivariateNormal that contains only the dependencies in the true posterior. The closest analogue in the literature to AutoGaussian is the variational approximation described in "Composing graphical models with neural networks for structured representations and fast inference".

jiaqingxie commented 2 years ago

Hello @eb8680. Thanks for your reply. I have new questions. You said that AutoStructured uses pyro.infer.inspect.get_dependencies(), following the paper "Faithful Inversion of Generative Models for Effective Amortized Inference". So does that mean only one of all the possible inverses (ways of getting the posterior) is implemented in AutoStructured? Also, my supervisor has given me two other methods for inverting a dependency model, "Learning Stochastic Inverses" and "Inference Networks for Sequential Monte Carlo in Graphical Models", and he said they were both guides; is it correct to call them guides? If I am not misunderstanding, should I regard these as new get_dependencies() methods?

eb8680 commented 2 years ago

@JIAQING-XIE I'm afraid I don't quite understand your question. Maybe it would help to clarify some terminology.

The term "guide" is Pyro shorthand for a tractable parametric approximation to a model's posterior distribution, usually obtained by optimizing parameters with respect to an estimate of a lower bound on the marginal likelihood. Guides in Pyro are just probabilistic programs with a couple of extra restrictions. "Autoguide" is Pyro shorthand for a procedure to construct a guide automatically from a model; a number of different autoguides are implemented in pyro.infer.autoguide.

Dependency graphs are directed acyclic graphs describing a factorization of a joint probability distribution into a product of conditional probability distributions, and can be understood as metadata computable from a probabilistic program. The term "inverse" in the titles "Faithful Inversion..." and "Learning Stochastic Inverses" refers to "inverting" the edge directions in a dependency graph of a model to obtain a dependency graph corresponding either exactly or approximately to a factorization (which is not unique in general) of the true posterior distribution.
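You can inspect this metadata directly with get_dependencies(); a sketch on the toy chain model (the exact dictionary format may differ across Pyro versions):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer.inspect import get_dependencies

def model(X):
    d1 = pyro.sample("d1", dist.Normal(0.0, 1.0))
    d2 = pyro.sample("d2", dist.Normal(d1, 1.0))
    pyro.sample("x", dist.Normal(d2, 1.0), obs=X)

deps = get_dependencies(model, model_args=(torch.tensor(0.0),))
# A dict with "prior_dependencies" (the model's graph) and
# "posterior_dependencies" (the moralized/inverted graph over latents).
print(deps["prior_dependencies"])
print(deps["posterior_dependencies"])
```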

AutoStructured constructs a guide automatically by first computing a possible dependency graph for the guide from the model's dependency graph, then populating each node in this graph with a tractable conditional distribution, and finally sampling from those conditionals to obtain a sample from a tractable joint distribution over the latent variables in the model. It makes default choices for the dependency graph (a moralized version of the prior dependency graph with a particular choice of edge direction) and the conditional distributions (linear-Gaussian), but also provides an interface for users to customize one or both of these choices.

All three of the papers you refer to describe new families of autoguides which follow a similar recipe to AutoStructured and do indeed each use different guide dependency graphs. Neither default choice in AutoStructured corresponds directly to any of those three papers, but you could probably realize them fairly easily by passing custom dependency graphs and conditional distributions to AutoStructured.
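Customization goes through the conditionals and dependencies arguments; here is a sketch against the toy chain model above (the string options and site names are illustrative; see the AutoStructured docstring for the supported values):

```python
from pyro.infer.autoguide import AutoStructured

# Custom guide structure: sample d2 from a Normal conditional, then sample d1
# from a Normal whose parameters depend linearly on d2's sampled value.
guide = AutoStructured(
    model,
    conditionals={"d1": "normal", "d2": "normal"},
    dependencies={"d1": {"d2": "linear"}},
)
```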

Note also that neither AutoGaussian nor AutoStructured is amortized by default (i.e. their parameters are not explicit functions of observed data from the model), unlike the autoguides proposed in those three papers.