probmods / webppl

Probabilistic programming for the web
http://webppl.org

Variational inference #27

Closed ngoodman closed 8 years ago

ngoodman commented 9 years ago

Basic black box variational inference (see http://arxiv.org/pdf/1301.1299v1.pdf and http://arxiv.org/pdf/1401.0118v1.pdf) is in the codebase.

It needs to be tested and benchmarked.

There are several major performance improvements described in the papers that still need to be implemented. Most important is Rao-Blackwellization of the gradient estimates, which may require a flow analysis to determine the Markov blanket of each random choice.
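For reference, the basic score-function estimator from those papers, and the Rao-Blackwellized form that is what needs the Markov blanket:

```latex
% Score-function (likelihood-ratio) estimator of the ELBO gradient:
\nabla_\lambda \mathcal{L}
  = \mathbb{E}_{q_\lambda}\!\left[ \nabla_\lambda \log q_\lambda(z)
      \left( \log p(x, z) - \log q_\lambda(z) \right) \right]

% Rao-Blackwellized form: for the parameters \lambda_i of the i-th
% random choice, only the factors touching the Markov blanket of z_i
% remain inside the expectation:
\nabla_{\lambda_i} \mathcal{L}
  = \mathbb{E}_{q_{(i)}}\!\left[ \nabla_{\lambda_i} \log q_{\lambda_i}(z_i)
      \left( \log p_i(x, z_{(i)}) - \log q_{\lambda_i}(z_i) \right) \right]
```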

Once everything is working, try variationally-guided PF: in the particle filter, sample new choices from the variational distribution instead of the prior. Or possibly mix/interpolate the prior and the variational distribution. The idea is that the variational fit gives an importance sampler closer to the posterior modes, while the PF helps capture joint structure that the variational approximation ignores.
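A rough sketch of what the mixed proposal could look like at a single Gaussian choice (plain JavaScript with made-up names, not the actual PF internals):

```js
// Sketch: propose from a mixture of the prior and the variational
// (guide) distribution, and correct with an importance weight.
// All names here are hypothetical, not webppl internals.

function gaussianSample(mu, sigma) {
  // Box-Muller transform
  var u = 1 - Math.random(), v = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function gaussianLogScore(mu, sigma, x) {
  var z = (x - mu) / sigma;
  return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2 * Math.PI);
}

// Mixture proposal: with probability alpha sample the guide, else the prior.
function proposeChoice(prior, guide, alpha) {
  var fromGuide = Math.random() < alpha;
  var dist = fromGuide ? guide : prior;
  var x = gaussianSample(dist.mu, dist.sigma);
  // log q_mix(x) = log(alpha * q_guide(x) + (1 - alpha) * p_prior(x))
  var logQ = Math.log(
    alpha * Math.exp(gaussianLogScore(guide.mu, guide.sigma, x)) +
    (1 - alpha) * Math.exp(gaussianLogScore(prior.mu, prior.sigma, x)));
  // The particle's incremental log-weight: log p_prior(x) - log q_mix(x)
  var logWeight = gaussianLogScore(prior.mu, prior.sigma, x) - logQ;
  return { value: x, logWeight: logWeight };
}

// Example: prior N(0, 1), guide N(2, 0.5), alpha = 0.8
console.log(proposeChoice({mu: 0, sigma: 1}, {mu: 2, sigma: 0.5}, 0.8));
```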

null-a commented 9 years ago

What are your thoughts on what should be returned by variational inference? I imagine we want to return the actual variational program (rather than summarise it with samples, for example) but it's not obvious to me how to do that. It seems like we'd need to reach into the thunk we're passed and set its ERP parameters to those found during inference. This doesn't seem straightforward.

In Stochy, I return a function which runs the original program with a special co-routine which switches in the variational parameters at run-time. This is pretty ugly, I've not convinced myself it's fully general, and it doesn't support the ERP interface. Do you have any better ideas?
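To make that concrete, here's a toy version of the trick (plain JavaScript, hypothetical names, not Stochy's actual interface): the program draws samples through a handler, and the variational handler switches in learned parameters by address:

```js
// Toy version of the run-time switch: a sample handler keyed by
// address. Names and structure are illustrative only.

var variationalParams = { 'a1': { mu: 1.8, sigma: 0.4 } };

function gaussianSample(mu, sigma) {
  var u = 1 - Math.random(), v = Math.random();
  return mu + sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// The handler the program was written against: sample from the prior.
var priorHandler = {
  sample: function (address, priorParams) {
    return gaussianSample(priorParams.mu, priorParams.sigma);
  }
};

// The variational handler: switch in learned params when we have them.
var variationalHandler = {
  sample: function (address, priorParams) {
    var params = variationalParams[address] || priorParams;
    return gaussianSample(params.mu, params.sigma);
  }
};

// The "thunk", run with whichever handler we choose.
function program(handler) {
  var x = handler.sample('a1', { mu: 0, sigma: 1 });
  return x * x;
}

console.log(program(priorHandler));       // runs the original model
console.log(program(variationalHandler)); // runs the variational program
```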

null-a commented 9 years ago

I've started to clean up and test what we have so far. See my variational branch.

Here's a simple test case I'm working with. We already get close to the optimal parameters as found by the hand-derived variational inference algorithm.
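For a conjugate setup of this kind (say, a prior mu ~ N(mu_0, sigma_0^2) and one observation x ~ N(mu, sigma^2); the actual test may differ), the hand-derived optimum is just the exact posterior:

```latex
% Exact posterior for mu given prior N(mu_0, sigma_0^2) and one
% observation x ~ N(mu, sigma^2):
\sigma_{\text{post}}^2
  = \left( \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2} \right)^{-1},
\qquad
\mu_{\text{post}}
  = \sigma_{\text{post}}^2
    \left( \frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2} \right)
```

Since the Gaussian family contains this posterior, a Gaussian mean-field guide's optimal parameters are exactly these values, which makes convergence easy to check.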

ngoodman commented 9 years ago

Awesome!

The question of what kind of ERP representation to return is a good and tricky one. I think that, as a simple first pass, returning an empirical distribution built from samples is OK. The ERP object should have extra fields for the best variational parameters and the corresponding (estimated) variational lower bound on the marginal likelihood.
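Something with roughly this shape, say (all names and values hypothetical):

```js
// Hypothetical shape of the ERP returned by variational inference:
// an empirical distribution over the collected samples, plus the
// extra fields proposed above.
var samples = [0.7, 1.2, 0.9];  // placeholder values

var result = {
  // standard ERP interface, backed by the samples
  sample: function () {
    return samples[Math.floor(Math.random() * samples.length)];
  },
  score: function (val) {
    var n = samples.filter(function (s) { return s === val; }).length;
    return Math.log(n / samples.length);  // log empirical probability
  },
  // extra fields: optimized params per address (shape discussed below),
  // and the estimated variational lower bound on log marginal likelihood
  variationalParams: {},
  elboEstimate: -1.37  // placeholder
};
```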

As you say, a better implementation of the sample() method for this ERP would be the variational program with its params fixed. The coroutine trick is a clever and not-terrible way to do this (the only drawbacks I see are speed and slight kludginess). The other ways that I can think of all involve reflecting into the source code of the thunk and building a new (minimal) sampler program.

But even with the variational sampler, there's still the question of how to implement the score() method.... This could be tricky if there is a lot of deterministic computation between the random choices and the return value of the marginalized thunk.

Anyhow, it requires more thought! But a lot of the things we'd want to do with variational can already be done with the simple solutions.

null-a commented 9 years ago

Great, thanks!

I'm now returning the estimated lower-bound and an ERP built from samples. See the updated test case.

I've also implemented the control variate idea from "Black Box Variational Inference". I think it's working but I need to test it more thoroughly.
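For reference, the estimator from that paper: the score function h has zero expectation under q, so a scaled copy of it can be subtracted without biasing the gradient estimate, with the scale estimated from the same samples:

```latex
% h(z) = \nabla_\lambda \log q_\lambda(z) has E_q[h] = 0, so for the
% per-sample estimand f(z) we can subtract a scaled copy of h without
% introducing bias:
\hat{f}(z) = f(z) - \hat{a}\, h(z),
\qquad
\hat{a} = \frac{\widehat{\operatorname{Cov}}(f, h)}{\widehat{\operatorname{Var}}(h)}
```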

null-a commented 9 years ago

The returned ERP now has a variationalParams field which is an object mapping addresses to vectors of ERP parameters. Is that what you had in mind?
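That is, something of this shape (addresses and values made up):

```js
// Hypothetical contents of variationalParams: each sample statement's
// address maps to the optimized parameter vector of its ERP.
var variationalParams = {
  '_sample_gaussian_1': [0.93, 0.41],  // e.g. [mu, sigma]
  '_sample_gaussian_7': [2.10, 0.08]
};
```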

Also, is this TODO note still relevant? (The code looks OK to me.)

I've also added inference tests and made a few other tweaks. This is all in the variational branch.

ngoodman commented 8 years ago

We've made a lot of progress in the daipp branch. I think the basic variational infrastructure (the ability to specify variational guide distributions, optimization of the ELBO via PW and LR estimators, etc.) will be ready to merge into dev soon. I am moving this to milestone 0.8 so that we'll have time to test and document before the summer school...
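(For readers coming to this later: PW and LR here are, roughly, the two standard ELBO gradient estimators. The PW estimator applies when a choice is reparameterizable; the LR estimator applies in general.)

```latex
% LR (likelihood-ratio / score-function) estimator:
\nabla_\lambda \, \mathbb{E}_{q_\lambda}[f(z)]
  = \mathbb{E}_{q_\lambda}\!\left[ f(z) \, \nabla_\lambda \log q_\lambda(z) \right]

% PW (pathwise / reparameterization) estimator, for reparameterizable
% choices z = g(\epsilon, \lambda) with \epsilon from a fixed base
% distribution:
\nabla_\lambda \, \mathbb{E}_{q_\lambda}[f(z)]
  = \mathbb{E}_{\epsilon}\!\left[ \nabla_\lambda f(g(\epsilon, \lambda)) \right]
```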

ngoodman commented 8 years ago

By the way, it would be great if, around when this makes it to dev, there were a default for when no guide is given at a sample within Optimize. E.g. do mean-field by taking the guide at each sample to be the same Dist with the dist params upgraded to guide params, as sketched below. (I guess it would be good to print a "warning: defaulting to mean-field" to the console in this case.)

(Maybe this was already part of the plan...)
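For concreteness, roughly what the explicit guide looks like and what the default would write for you (guide syntax as in the daipp branch; exact API details may differ):

```js
// Explicit guide at a sample statement (webppl guide syntax):
var x = sample(Gaussian({mu: 0, sigma: 1}), {
  guide: function () {
    // same family as the prior, with learnable parameters; sigma is
    // kept positive by squishing an unconstrained param through exp
    return Gaussian({mu: param(), sigma: Math.exp(param())});
  }
});

// Proposed default when no guide is given: behave as if the guide
// above had been written, and print
//   warning: defaulting to mean-field
var y = sample(Gaussian({mu: 0, sigma: 1}));
```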

null-a commented 8 years ago

> do mean-field by taking the guide at each sample to be the same Dist with the dist params upgraded to guide params

Yes, I was planning to do this. My intention is to factor the information about parameters and their constraints out of daipp, so that mean-field can also do appropriate parameter squishing.
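By "squishing" I mean mapping the optimizer's unconstrained parameters into each distribution parameter's support, roughly like this (helper names hypothetical):

```js
// Hypothetical parameter-squishing helpers: the optimizer works in
// unconstrained space; each constrained parameter gets a bijection.

// positive parameters (e.g. a gaussian's sigma): softplus
function toPositive(x) {
  return Math.log(1 + Math.exp(x));
}

// interval parameters (e.g. a bernoulli's p in (0, 1)): sigmoid
function toUnitInterval(x) {
  return 1 / (1 + Math.exp(-x));
}

// e.g. a mean-field gaussian guide over unconstrained params [a, b]
// would use mu = a, sigma = toPositive(b).
```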

null-a commented 8 years ago

Here are the remaining changes I intend to make before opening a PR for this:

Ideally, for simplicity, I'd like to just merge the daipp branch once this is done. The only reason we might prefer not to is that it will include a few unfinished and untested bits. These are:

> I am moving this to milestone 0.8 so that we'll have time to test and document before the summer school...

I take this to mean we'll add docs for this later.

ngoodman commented 8 years ago

Sounds good! I think merging daipp into dev is OK; the extra bits will just remain undocumented until they are done and tested. (Perhaps put notes to this effect at the top of the relevant source files...)

Yes, we can document later (though if you have time to add a stub to the inference section of the docs, that'll get us started).