Factor out repeated code in inference algorithms

stuhlmueller commented 9 years ago

There is a lot of overlap in pf, pmcmc, smc; and in mh and smc. One way to combat this would be to just have a single smc algorithm, and implement all of the algorithms mentioned above as parameterizations of this uber algorithm. This has some advantages, but doesn't feel entirely right, based on what smc currently looks like; it would be better if we had smaller compositional pieces.

stuhlmueller commented 9 years ago

Also, instrument inference algorithms with hooks such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.

null-a commented 9 years ago

I've made an initial attempt to refactor mh and smc. The basic idea is to structure things around functions which operate on traces. For example, there's a new MHKernel function which takes a webppl program and a trace from a previous execution and returns a new trace using MH. With this in hand you can do MCMC with a fairly simple iteration which accumulates results in a histogram, and we can do rejuvenation in smc without duplicating any of the MH propose/accept/reject logic.

One thing I'm keeping in mind is Noah's suggested interface for SMC. This gets us part of the way there, although it's not clear to me how much work it would take to make it possible to plug in other kernels.

This is a very rough sketch (lots of rough edges, I'm not certain how well it extends to the other algorithms, I don't know if it incurs a performance hit, etc.), but do you think it's heading in a useful direction?

null-a commented 9 years ago

My initial attempt at refactoring some of the inference algorithms (more detail in previous comment) factors out the (rejection) strategy used to initialize the Markov chain. At present this increases the amount of code (because the initialization strategy is a new coroutine) but it seems possible that going this way yields further opportunities to simplify things that will leave us with less code once we're finished.

The idea of "make traces more central" that was mentioned in the planning meeting sounds superficially similar to this approach. I'll investigate further and report back once I've understood incrementalization.

null-a commented 9 years ago

I've spent a little more time on this - here's what MH and SMC currently look like.

This approach has already removed quite a lot of duplication, and if anything SMC and MH are now more similar than they were before so I'm confident they can be simplified further. I'll make a little more progress and perhaps have an initial stab at (the closely related) #86 before eliciting initial feedback from anyone interested.

iffsid commented 9 years ago

I had to do something along these lines for HMC and AD here, using a factored-out trace. The code is not explicitly factored out as it is here, however.

null-a commented 9 years ago

This is now at a point where it would be good to get some feedback. In particular I'd like to know whether you think this is the direction we should be heading.

Here's the latest diff.

To save you reading back through the thread I'll give a brief overview again here:

The core idea is to organize things around procedures which operate on traces. I've applied this to simplified versions of MH and SMC so far. Doing this helps in two main ways:

We reduce duplication by factoring out a Trace library. This is a data structure which is used for both MH traces and SMC particles.
Procedures which operate on traces can be composed in useful ways.

I think the resulting code is quite a bit easier to understand than what we have now. (Though I'm not the best person to judge having just written the new stuff.) I think the biggest improvement is to the PF rejuvenation code, so that might be something to look at and compare.

I've also begun to implement the Infer interface @ngoodman described in #86. You can already do this kind of thing:

// num iterations etc. dropped for clarity
Infer(model, { method: MCMC, init: Rejection, kernel: MHKernel });
// Initialize a Markov chain using a particle filter, then do MH.
Infer(model, { method: MCMC, init: PFInit, kernel: MHKernel });
Infer(model, { method: PF, rejuvKernel: MHKernel });

This was straight-forward to implement once things were organized around traces.

What about all the other inference algorithms?

They continue to work as they are so we could just concentrate on MH and SMC to begin with.
I think HashMH could be implemented by simply providing an alternate implementation of Trace which uses different data structures. MHKernel would remain largely unchanged I think.
HMC already looks superficially similar to this approach as @iffsid has also factored out a Trace object. To make it possible to plug HMC into Infer as a kernel for MCMC or rejuvenation it would need to be refactored as a procedure which operates on traces. (And ideally use the same Trace library as everything else.) This seems do-able.
It's possible that a lot of this code could be re-used in a re-worked PMCMC implementation, though I've not thought carefully enough about it to be very sure.

One outstanding question is whether organising things this way has a significant impact on performance. I don't expect it will but I need to check.

Hope that makes sense, happy to answer any questions if not.

stuhlmueller commented 9 years ago

This direction looks good to me. I think factoring out the trace data structure and making the core inference algorithms operate on traces is the way to go.

Some comments, at various levels of abstraction:

Trace.prototype.complete is not very descriptive - maybe be more explicit about what it does (setValue, if that's all it does? If this can only happen once, we could check that the value hasn't been set already, and throw an error otherwise.)
We should consider replacing the current trace implementation with something like what's used in HashMH. findChoice results in O(n^2) complexity at the moment for newly created trace structure (with n random choices), but should be O(n).
Why does Trace take continuation and store as arguments when it doesn't need them?
I wonder if particles should contain traces, but not be identical to them? Otherwise, we'll be writing trace attributes like weight that aren't foreseen in trace.js, and breaking the interface.
As you've already pointed out in the code, naming could be improved. Naming all trace => trace operators ...Kernel, and all => trace operators ...Init would be a start. Not sure about what to do with ParticleFilterCore, but it would be great to have a better name for that.
PMCMC isn't used much, so it's probably not worth spending much time on refactoring it. Integrating HMC in this framework would be great (but also depends on how we deal with AD, in particular in the browser?). It would also be good to integrate incremental MCMC, since this is now our fastest MCMC implementation for many models.
In terms of file organization: would it make sense to move Infer to header.js, and MCMC and PF each to a corresponding file (maybe the existing particlefilter2.js for PF), removing infer.js? That would feel a little more modular.
Are you planning to make the other inference algorithms also work via the Infer interface before you merge this? It would be nice to have a consistent interface to everything right away.
Should we think about how to make Enumerate use the trace data structure, too? Or would that only introduce unnecessary overhead?

iffsid commented 9 years ago

The trace.js library in the ad branch already implements the HashMH style lookups. It introduces an extra trace data-structure called addressIndices, which is a hashmap from addresses to trace indices. This makes lookup O(1)

For particles, you can simply extend the Trace class to a Particle class with additional information. That's what I'm currently using to get SMC to use HMC for rejuvenation.

null-a commented 9 years ago

These are missing from my simplified/refactored MH/SMC implementation:

[x] Hash based trace look-ups
[x] MH proposers
[x] SMC importance distributions
[x] SMC logZ estimate
[x] Residual re-sampling (SMC)
[x] Bail out of MH early if resample same value at regen
[x] Bail out of MH early if score is -Infinity
[x] Final rejuvenation at SMC exit
[x] #120
[x] Support query table in MH [Need to check this is correct]
[x] #156
[x] MAP/samples only option
[x] lag/burn option
[x] Variable number of factors in SMC (Working on new version. Also see #20.)
[x] Understand why PFRj performs deep copies of traces.
[x] Support for setProposalBoundary
[x] Remove assertions from Trace/acceptProb
[x] Remove code for previous versions

I also need to:

[x] Check MH & SMC work for models with zero ERP (#155) and small models (#51).

Enhancements:

[x] Warn/error if it looks like ~~rejection~~ initialization isn't going to find a trace with prob > 0. (Rather than looping forever.) Now #204.

WIP plan of what happens to each inference algorithm once we're done:

Before	After
MH	MH kernel + MCMC
HashMH	Retire if best bits merged into MH kernel?
ParticleFilterRejuv	SMC + MH kernel
ParticleFilter	Retire if important dists/variable factors support is added to SMC?
IncrementalMH	IncrementalMH kernel. Maybe rename MH if this is the main MH method? Will need to maintain cache across multiple applications.
HMC	HMC kernel.
Rejection	Implemented using the rejection routine used for MH initialization
Variational	No change to current version. Need to make sure that structuring things around traces will allow us to try PF guided by the trained variational program once vipp is merged.
Enumerate, AsyncPF, PMCMC	No change to algorithms. Make available through new `Infer` interface?

We might consider tackling HMC and IncrementalMH later as separate pieces of work. If we do that I think it would be a good idea to at least convince ourselves that this approach (organizing things around traces) is going to work.

null-a commented 9 years ago

Some comments, at various levels of abstraction:

Thanks for this, I think I agree with everything you said!

I've had a couple of new thoughts about naming.

Perhaps the function I currently call PF should be renamed SMC. Then the function I call ParticleFilterCore and other functions of the same type would be called things like SIS and SIR. I like this because the two high-level functions MCMC and SMC do seem like they correspond to the general methods of the same name. (Though there's lots of terminology around and I might have misunderstood.) One slight wrinkle here is that it seems like you should be able to plug a rejuvenation kernel into SMC and have either SIS or SIR use it. However, as I currently have things you'd plug the kernel into SIR. This isn't a big deal, especially if we only have SIR.
The Rejection and PFInit functions which have type webpplFn => Trace are perhaps best described as samplers, and it seems reasonable to say we initialize MH with a RejectionSampler. We might also have another function at the same level of abstraction as MCMC and SMC which takes one of these samplers, invokes it many times and builds a histogram. i.e. basic Monte Carlo sampling. (Not all functions of type webpplFn => Trace are samplers of course.)

stuhlmueller commented 9 years ago

Thanks for making the table - very helpful!

HashMH - Retire if best bits merged into MH kernel?

Agreed.

ParticleFilter - Retire if important dists/variable factors support is added to SMC?

If we take the planned SMC implementation and ignore the rejuvenation bits, will it be essentially identical in complexity to what a PF implementation would be, or are there a bunch of extra bits? If there is extra complexity (outside of the rejuvenation loop), it would be better to maintain a separate ParticleFilter implementation. If not (which I expect to be the case), then let's retire ParticleFilter.

IncrementalMH - IncrementalMH kernel. Maybe rename MH if this is the main MH method? Will need to maintain cache across multiple applications.

At some point, we should rename it, but it could use a bit more testing (in real usage) before then.

We might consider tackling HMC and IncrementalMH later as separate pieces of work. If we do that I think it would be a good idea to at least convince ourselves that this approach (organizing things around traces) is going to work.

Sounds fine to me, as long as we are confident that we'll be able to integrate those later.

No change to algorithms. Make available through new Infer interface?

Yes, would be good to have everything available through the new interface.

Warn/error if it looks like rejection isn't going to find a trace with prob > 0. (Rather than looping forever.)

That would be great to have (though could also be a separate piece of work, if you wanted).

stuhlmueller commented 9 years ago

Perhaps the function I currently call PF should be renamed SMC. Then the function I call ParticleFilterCore and other functions of the same type would be called things like SIS and SIR.

I like having SMC and MCMC as names for the two high-level inference functions.

One slight wrinkle here is that it seems like you should be able to plug a rejuvenation kernel into SMC and have either SIS or SIR use it.

My understanding of the naming conventions (which are indeed confusing) is that neither SIS nor SIR usually have rejuvenation steps, and that SMC is used to refer to SIR with rejuvenation steps. I think of SIS, SIR and SMC to have the same type, so I'm not sure we should rename [what is plugged in in place of] ParticleFilterCore to SIS or SIR. (I don't have more constructive suggestions for what to name it at the moment, sorry!)

The Rejection and PFInit functions which have type webpplFn => Trace are perhaps best described as samplers, and it seems reasonable to say we initialize MH with a RejectionSampler.

The sampler we use to initialize MH is a weird sampler, though, no? Unlike a traditional rejection sampler, we'll accept as soon as we find any trace with non-zero probability (and we wouldn't want it to be otherwise, or init would take forever for realistic problems). I think it's fine to name it ...Sampler, but we should make sure it doesn't get confused with a normal rejection sampler.

We might also have another function at the same level of abstraction as MCMC and SMC which takes one of these samplers, invokes it many times and builds a histogram. i.e. basic Monte Carlo sampling. (Not all functions of type webpplFn => Trace are samplers of course.)

Interesting idea. It would indeed be useful to have a MC method which allows the user to re-run an algorithm like rejection, MCMC, and SMC many times and get an ERP that aggregates the results.

null-a commented 9 years ago

We might consider tackling HMC and IncrementalMH later as separate pieces of work

Sounds fine to me, as long as we are confident that we'll be able to integrate those later.

I've been trying to familiarize myself with IncrementalMH to make sure we can. This has raised a couple of questions, the answers to which might help me see how I need to structure things:

Am I right to assume that we'd like to be able to use a future IncrementalMHKernel in all the places the MHKernel will be useable? e.g. As a rejuvenation kernel for SMC, with various initialization strategies in MCMC etc.
If so, will the other algorithms (e.g. SMC) need to concern themselves with maintaining the cache?
Similarly, how much (if any) of the state stored in the ERP master list (which appears to play a similar role to the trace structure used elsewhere, right?) would you expect other inference algorithms to have to maintain, beyond that which is already kept in the trace. (This perhaps overlaps with 2 as I think the nodes in the ERP master list are also nodes in the cache.)

(cc @dritchie)

dritchie commented 9 years ago

First: yes, the ERP master list is just a listing of all the ERP nodes that are in the cache. This is maintained separately so that choosing an ERP to propose to at random can be done quickly, without having to traverse the cache.

As for interop with other algorithms, this has been on my mind a bit lately. It would be ideal to have an IncrementalMHKernel that could be plugged in anywhere that others kernels can be. But you're right that this is complicated by the fact that the way Incremental MH represents a 'trace' is very different than how other algorithms do it (I haven't looked at the code, but HMC might have a similar--though less exaggerated--issue).

I've been thinking that the best way to do this might be to define a minimal interface that traces need to expose. Operations like "get/set the value of an ERP given an address," "re-execute from this address," or "reject proposal and reset trace." If algorithms that use MCMC kernels only interact with traces via these methods, then things should work out. I'm hopeful that such a common interface does exist, but obviously I haven't done the legwork to flesh it out.

If you do start designing such an interface, and you're wondering if Incremental MH can support such-and-such an operation, feel free to ask.

iffsid commented 9 years ago

@dritchie It appears that the HMC code does include something along the lines you specify; particularly, the code in trace.js and the functions mhPropose, and HMC.prototype.exit in hmc.js

The trace library is written such that one appends a traceEntry to the trace at the appropriate places (set), and the trace datastructure internally maintains the means (addressIndices) for a constant time lookup (get) based on address. The rexecute-from and reject-restore bits are not built into the trace, but the mhPropose and HMC.prototype.exit functions implement them separately.

null-a commented 9 years ago

@dritchie That's helpful, thanks!

I want to make sure everyone understands what I'm up to, so here's a quick summary to save you reading back through the issue history.

I've been thinking that the best way to do this might be to define a minimal interface that traces need to expose.

A large part of this (i.e. issue #64) is exactly that. It already includes interop between SMC and MH via an interface on traces similar to the one you described and similar to that which @iffsid has in HMC.

The rough plan is:

Switch SMC and MH use this interface.
Have HMC use the same interface once it is merged.
Switch IncrementMH to this interface.

I'm thinking about IncrementMH now because I don't want to head down this path and find out later that it's a dead end.

If algorithms that use MCMC kernels only interact with traces via these methods

I'm currently imagining that this is slightly more general than that. One consequence of my approach is that operations like:

"re-execute from this address," or "reject proposal and reset trace."

... would belong to particular algorithms and not the trace itself. (@iffsid described something similar for HMC.)

My current feeling is that this will indeed work out and I should press ahead, but if there's something I'm overlooking please let me know! I'm happy to explain things in more detail to anyone who is interested.

null-a commented 9 years ago

I did some crude benchmarking of the refactored algorithms. In summary:

The refactored MH implementation matches the performance of HashMH.
We should be able to retire ParticleFilter as setting rejuvSteps = 0 in the new SMC implementation runs about as fast. (As of this commit.)
PF with rejuvenation seems to be faster (by a small constant) in the refactored version. I can't yet fully explain why this is the case.

Here are the details. This is runtime as measured with --verbose, shown in ms/10. The refactored version is always the right-most column.

The refactored code contains lots of additional assertions which I removed before doing this.

examples/lda.wppl

iterations / 1k	MH	HashMH	MCMC + MHKernel
10	560	214	201
20	1241	420	369
30	1972	679	591

examples/hmmIncremental.wppl with 10 time steps

particles / 1k	ParticleFilter	SMC with `rejuvSteps = 0`
5	206	212
10	763	744
15	1702	1683

examples/hmmIncremental.wppl with 5k particles

time steps	ParticleFilter	SMC with `rejuvSteps = 0`
10	206	212
15	315	319
20	409	427

examples/hmmIncremental.wppl with 5 time steps, 1k particles

rejuv steps	ParticleFilterRejuv	ParticleFilterRejuv*	SMC
10	351	128	86
15	508	186	110
20	717	253	131

(*) ParticleFilterRejuv with calls to deepCopyTrace removed, after which inference tests still pass.

stuhlmueller commented 9 years ago

Looks great!

We should be able to retire ParticleFilter as setting rejuvSteps = 0 in the new SMC implementation runs about as fast. (As of this commit.)

One of the reasons for keeping it around was that it was easier to understand than the version with rejuvenation, so it was potentially useful as an educational tool for getting people started on the codebase (the next step up from the particle filter described in dippl) and as a basis for inference experiments. The refactored code is hopefully readable enough that we don't need to keep around an extra particle filter implementation, but it's worth keeping that motivation in mind.

(*) ParticleFilterRejuv with calls to deepCopyTrace removed, after which inference tests still pass.

Is the refactored version (doing the equivalent of) not deep-copying the trace? If so, we should make sure that that's correct in general. There may have been a reason for using deep copies.

null-a commented 9 years ago

One of the reasons for keeping it around was that it was easier to understand than the version with rejuvenation

Gotcha, thanks. You'll have to see what you think of the refactored version. My feeling is that ParticleFilter got harder to understand once variable numbers of factors were handled and to a lesser extent when importance distributions were added. For teaching maybe you want a version stripped of one or both of those? Could such a thing live in a package?

Is the refactored version not deep-copying the trace?

Yes. I think that's correct, but I'll double check and then maybe give you an argument why.

null-a commented 9 years ago

Is PFRjAsMH something that's used in practice for inference or is it just for test purposes? The refactored code is perhaps a little stricter about catching unexpected score == -Infinity and the like, some of which would will need loosening off to make this work.

stuhlmueller commented 9 years ago

For teaching maybe you want a version stripped of one or both of those? Could such a thing live in a package?

Yeah, good idea.

Is PFRjAsMH something that's used in practice for inference or is it just for test purposes?

Just for testing, for now - but my intuition is that our SMC implementation should work in the extreme case of only one particle.

null-a commented 9 years ago

my intuition is that our SMC implementation should work in the extreme case of only one particle.

In the case where advancing the single particle to the next observation leads to a score of -Inf, I wonder if it would be preferable to keep re-running that step of SMC until we find an extension to the particle/trace with non-zero probability before doing rejuv. Otherwise, the first change rejuv proposes will be accepted, and it seems quite likely it will wipe out all the progress made during this careful initialization.

null-a commented 9 years ago

Progress update: There are only a few things left to do on my todo list. I'm hoping to open a pull request for this issue at the beginning of next week.

stuhlmueller commented 9 years ago

I wonder if it would be preferable to keep re-running that step of SMC until we find an extension to the particle/trace with non-zero probability before doing rejuv.

I don't think that's possible - we don't know that there is any local extension with non-zero probability. (Consider the case where no random choices have happened since the last factor.)

Otherwise, the first change rejuv proposes will be accepted, and it seems quite likely it will wipe out all the progress made during this careful initialization.

A big reason for this is that if both old and new score are -Infinity, we accept the proposal, right? If we didn't do this, then the first proposal we accept would be the first one that changes the score to a number greater than -Infinity. For models without structural changes, I'd guess that this would commonly be a proposal that changes the most recent random choice (although it would depend a lot on the dependence structure of the model). So, assuming that we still have this behavior in the new implementation, I'd expect that turning it off would help somewhat. I wouldn't optimize for this use case more than that for now.

Progress update: There are only a few things left to do on my todo list. I'm hoping to open a pull request for this issue at the beginning of next week.

Awesome!

null-a commented 9 years ago

Is the refactored version (doing the equivalent of) not deep-copying the trace? If so, we should make sure that that's correct in general.

I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices made at sample statements (e.g. by reaching back into the trace history) nor do we mutate anything that these choice object reference. (With the exception of the store of course, but we take care to clone that where necessary.)

For reference the trace in the current version of ParticleFilterRejuv looks something like this:

[ { k: <...>, name: <...>, erp: <...>, params: <...>,
    score: <...>, forwardChoiceScore: <...>,
    val: <...>, reused: <...>, store: <...> }, ... ]

A deep copy happens during resampling and before MH generates a new proposal.

There may have been a reason for using deep copies.

I had a look at the commit history to see if I could find any clues but didn't have much luck. It looks like the initial commit will have been broken as traces will have been shared between particles. The deep copy was added here - ring any bells?

null-a commented 9 years ago

PF with rejuvenation seems to be faster (by a small constant) in the refactored version. I can't yet fully explain why this is the case.

It turns out that this is because the current rejuvenation code doesn't have the optimizations which bail out of the proposal early e.g. when the probability becomes zero.

null-a commented 9 years ago

Also, instrument inference algorithms with hooks such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.

I've not done this yet and would prefer to tackle it as a separate task.

The refactored MH doesn't do anything with mh-diagnostics/diagnostics.js but it does have the justSample option allowing all samples to be collected. Is that sufficient for now?

iffsid commented 9 years ago

I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices...

This is not true for HMC. Computing the derivative of the score will mutate sensitivities on the tape constructed through the trace via re-defined operators.

stuhlmueller commented 9 years ago

The deep copy was added here - ring any bells?

I don't remember why exactly this was added. I should have added a note in the commit message or code - sorry! I suspect I was concerned about the store not getting copied wherever it needed to be copied, possibly because I ran into a case where this actually happened (though I also wouldn't be surprised if I just added it preemptively).

Also, instrument inference algorithms with hooks such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.

I've not done this yet and would prefer to tackle it as a separate task.

Fine with me.

The refactored MH doesn't do anything with mh-diagnostics/diagnostics.js but it does have the justSample option allowing all samples to be collected. Is that sufficient for now?

Also fine - we can revisit this when we add hooks.

I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices...

This is not true for HMC.

If it's only necessary for HMC, we should only deep-copy when we're actually running HMC, so that we don't incur unnecessary cost otherwise.

iffsid commented 9 years ago

If it's only necessary for HMC, we should only deep-copy when we're actually running HMC, so that we don't incur unnecessary cost otherwise.

This should be fairly straightforward. You'd only need to check if the score for the trace is a Number or a Tape and then clone accordingly.

However, one thing to note is that this does necessitate that the trace is entirely self-contained; i.e, the trace is built without other properties of coroutine/kernel also computing/referring to parts of the tape. E.g: things like this.currScore should be avoided since computing gradients mutates them, but cloning trace doesn't consider this.

I believe that this is handled correctly in the refactor, but I haven't checked that thoroughly, so I'm not 100% certain.

null-a commented 9 years ago

Closed by #169.

probmods / webppl

Factor out repeated code in inference algorithms #64