Also, instrument inference algorithms with hooks such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.
I've made an initial attempt to refactor mh and smc. The basic idea is to structure things around functions which operate on traces. For example, there's a new `MHKernel` function which takes a webppl program and a trace from a previous execution and returns a new trace using MH. With this in hand you can do MCMC with a fairly simple iteration which accumulates results in a histogram, and we can do rejuvenation in smc without duplicating any of the MH propose/accept/reject logic.
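To make the shape of this concrete, here is a rough sketch of such a driver (not the code in the branch; the non-CPS style, the `trace.value` field, and all names here are assumptions):

```js
// Sketch only: `kernel(program, trace)` is assumed to return a new trace, as
// described above, and the driver simply iterates it, counting return values
// in a histogram.
function mcmcHistogram(program, kernel, initialTrace, numIterations) {
  var hist = {};
  var trace = initialTrace;
  for (var i = 0; i < numIterations; i++) {
    trace = kernel(program, trace);
    var key = JSON.stringify(trace.value);
    hist[key] = (hist[key] || 0) + 1;
  }
  return hist;
}
```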
One thing I'm keeping in mind is Noah's suggested interface for SMC. This gets us part of the way there, although it's not clear to me how much work it would take to make it possible to plug in other kernels.
This is a very rough sketch (lots of rough edges, I'm not certain how well it extends to the other algorithms, I don't know if it incurs a performance hit, etc.), but do you think it's heading in a useful direction?
My initial attempt at refactoring some of the inference algorithms (more detail in previous comment) factors out the (rejection) strategy used to initialize the Markov chain. At present this increases the amount of code (because the initialization strategy is a new coroutine), but it seems possible that going this way will open up further opportunities to simplify things, leaving us with less code once we're finished.
The idea of "make traces more central" that was mentioned in the planning meeting sounds superficially similar to this approach. I'll investigate further and report back once I've understood incrementalization.
I've spent a little more time on this - here's what MH and SMC currently look like.
This approach has already removed quite a lot of duplication, and if anything SMC and MH are now more similar than they were before so I'm confident they can be simplified further. I'll make a little more progress and perhaps have an initial stab at (the closely related) #86 before eliciting initial feedback from anyone interested.
This is now at a point where it would be good to get some feedback. In particular I'd like to know whether you think this is the direction we should be heading.
Here's the latest diff.
To save you reading back through the thread I'll give a brief overview again here:
The core idea is to organize things around procedures which operate on traces. I've applied this to simplified versions of MH and SMC so far. Doing this helps in two main ways:
- A `Trace` library. This is a data structure which is used for both MH traces and SMC particles.
- I think the resulting code is quite a bit easier to understand than what we have now. (Though I'm not the best person to judge, having just written the new stuff.) I think the biggest improvement is to the PF rejuvenation code, so that might be something to look at and compare.
I've also begun to implement the `Infer` interface @ngoodman described in #86. You can already do this kind of thing:
// num iterations etc. dropped for clarity
Infer(model, { method: MCMC, init: Rejection, kernel: MHKernel });
// Initialize a Markov chain using a particle filter, then do MH.
Infer(model, { method: MCMC, init: PFInit, kernel: MHKernel });
Infer(model, { method: PF, rejuvKernel: MHKernel });
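With options in that style, `Infer` itself can stay very thin. A minimal sketch (assumed, not the actual `infer.js`): the method named in the options is just a function that receives the model and the remaining options.

```js
// Sketch only: dispatch to whichever inference function the options name.
function Infer(model, options) {
  if (typeof options.method !== 'function') {
    throw new Error('Infer: options.method should be an inference function');
  }
  return options.method(model, options);
}
```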
This was straightforward to implement once things were organized around traces.
What about all the other inference algorithms?
- `HashMH` could be implemented by simply providing an alternate implementation of `Trace` which uses different data structures. `MHKernel` would remain largely unchanged, I think.
- `HMC` already looks superficially similar to this approach, as @iffsid has also factored out a `Trace` object. To make it possible to plug `HMC` into `Infer` as a kernel for MCMC or rejuvenation, it would need to be refactored as a procedure which operates on traces (and ideally use the same `Trace` library as everything else). This seems doable.
- Something similar might work for the `PMCMC` implementation, though I've not thought carefully enough about it to be very sure.

One outstanding question is whether organising things this way has a significant impact on performance. I don't expect it will, but I need to check.
Hope that makes sense, happy to answer any questions if not.
This direction looks good to me. I think factoring out the trace data structure and making the core inference algorithms operate on traces is the way to go.
Some comments, at various levels of abstraction:
- `Trace.prototype.complete` is not very descriptive - maybe be more explicit about what it does (`setValue`, if that's all it does? If this can only happen once, we could check that the value hasn't been set already, and throw an error otherwise.)
- `HashMH`-style lookups: `findChoice` results in O(n^2) complexity at the moment for a newly created trace structure (with n random choices), but should be O(n).
- Why does `Trace` take `continuation` and `store` as arguments when it doesn't need them?
- I'm a little wary of setting properties such as `weight` that aren't foreseen in `trace.js`, and breaking the interface.
- Calling all `trace => trace` operators `...Kernel`, and all `=> trace` operators `...Init`, would be a start. Not sure about what to do with `ParticleFilterCore`, but it would be great to have a better name for that.
- `PMCMC` isn't used much, so it's probably not worth spending much time on refactoring it. Integrating `HMC` in this framework would be great (but also depends on how we deal with AD, in particular in the browser?). It would also be good to integrate incremental MCMC, since this is now our fastest MCMC implementation for many models.
- How about moving `Infer` to `header.js`, and `MCMC` and `PF` each to a corresponding file (maybe the existing `particlefilter2.js` for `PF`), removing `infer.js`? That would feel a little more modular.
- Could everything be hooked up to the `Infer` interface before you merge this? It would be nice to have a consistent interface to everything right away.
- Could `Enumerate` use the trace data structure, too? Or would that only introduce unnecessary overhead?

The `trace.js` library in the `ad` branch already implements the `HashMH`-style lookups. It introduces an extra trace data structure called `addressIndices`, which is a hashmap from addresses to trace indices. This makes lookup O(1).

For particles, you can simply extend the `Trace` class to a `Particle` class with additional information. That's what I'm currently using to get `SMC` to use `HMC` for rejuvenation.
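To illustrate the shape of this, here's a sketch based purely on the description above (not the actual `trace.js` in the `ad` branch; method and field names such as `addChoice` and `logWeight` are invented):

```js
// Sketch: a trace that keeps an address -> index map so choice lookups are
// O(1), plus a Particle subclass carrying SMC-specific state.
function Trace() {
  this.choices = [];        // trace entries, in execution order
  this.addressIndices = {}; // maps an address to its index in this.choices
  this.score = 0;
}

Trace.prototype.addChoice = function (entry) {
  this.addressIndices[entry.address] = this.choices.length;
  this.choices.push(entry);
  this.score += entry.score;
};

Trace.prototype.findChoice = function (address) {
  var i = this.addressIndices[address];
  return i === undefined ? undefined : this.choices[i];
};

function Particle() {
  Trace.call(this);
  this.logWeight = 0; // additional, SMC-specific information
}
Particle.prototype = Object.create(Trace.prototype);
Particle.prototype.constructor = Particle;
```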
These are missing from my simplified/refactored MH/SMC implementation:

- `PFRj` performs deep copies of traces.
- `setProposalBoundary`
I also need to:
Enhancements:
WIP plan of what happens to each inference algorithm once we're done:
Before | After |
---|---|
MH | MH kernel + MCMC |
HashMH | Retire if best bits merged into MH kernel? |
ParticleFilterRejuv | SMC + MH kernel |
ParticleFilter | Retire if important dists/variable factors support is added to SMC? |
IncrementalMH | IncrementalMH kernel. Maybe rename MH if this is the main MH method? Will need to maintain cache across multiple applications. |
HMC | HMC kernel. |
Rejection | Implemented using the rejection routine used for MH initialization |
Variational | No change to current version. Need to make sure that structuring things around traces will allow us to try PF guided by the trained variational program once vipp is merged. |
Enumerate, AsyncPF, PMCMC | No change to algorithms. Make available through new Infer interface? |
We might consider tackling HMC and IncrementalMH later as separate pieces of work. If we do that I think it would be a good idea to at least convince ourselves that this approach (organizing things around traces) is going to work.
> Some comments, at various levels of abstraction:
Thanks for this, I think I agree with everything you said!
I've had a couple of new thoughts about naming.

- Perhaps the function I currently call `PF` should be renamed `SMC`. Then the function I call `ParticleFilterCore` and other functions of the same type would be called things like `SIS` and `SIR`. I like this because the two high-level functions `MCMC` and `SMC` do seem like they correspond to the general methods of the same name. (Though there's lots of terminology around and I might have misunderstood.) One slight wrinkle here is that it seems like you should be able to plug a rejuvenation kernel into `SMC` and have either `SIS` or `SIR` use it. However, as I currently have things you'd plug the kernel into `SIR`. This isn't a big deal, especially if we only have `SIR`.
- The `Rejection` and `PFInit` functions which have type `webpplFn => Trace` are perhaps best described as samplers, and it seems reasonable to say we initialize `MH` with a `RejectionSampler`. We might also have another function at the same level of abstraction as `MCMC` and `SMC` which takes one of these samplers, invokes it many times and builds a histogram, i.e. basic Monte Carlo sampling. (Not all functions of type `webpplFn => Trace` are samplers of course.)

Thanks for making the table - very helpful!
> HashMH - Retire if best bits merged into MH kernel?
Agreed.
> ParticleFilter - Retire if important dists/variable factors support is added to SMC?
If we take the planned SMC implementation and ignore the rejuvenation bits, will it be essentially identical in complexity to what a PF implementation would be, or are there a bunch of extra bits? If there is extra complexity (outside of the rejuvenation loop), it would be better to maintain a separate ParticleFilter implementation. If not (which I expect to be the case), then let's retire ParticleFilter.
> IncrementalMH - IncrementalMH kernel. Maybe rename MH if this is the main MH method? Will need to maintain cache across multiple applications.
At some point, we should rename it, but it could use a bit more testing (in real usage) before then.
> We might consider tackling HMC and IncrementalMH later as separate pieces of work. If we do that I think it would be a good idea to at least convince ourselves that this approach (organizing things around traces) is going to work.
Sounds fine to me, as long as we are confident that we'll be able to integrate those later.
> No change to algorithms. Make available through new Infer interface?
Yes, would be good to have everything available through the new interface.
> Warn/error if it looks like rejection isn't going to find a trace with prob > 0. (Rather than looping forever.)
That would be great to have (though could also be a separate piece of work, if you wanted).
> Perhaps the function I currently call `PF` should be renamed `SMC`. Then the function I call `ParticleFilterCore` and other functions of the same type would be called things like `SIS` and `SIR`.
I like having `SMC` and `MCMC` as names for the two high-level inference functions.
> One slight wrinkle here is that it seems like you should be able to plug a rejuvenation kernel into `SMC` and have either `SIS` or `SIR` use it.
My understanding of the naming conventions (which are indeed confusing) is that neither SIS nor SIR usually have rejuvenation steps, and that SMC is used to refer to SIR with rejuvenation steps. I think of SIS, SIR and SMC as having the same type, so I'm not sure we should rename [what is plugged in in place of] `ParticleFilterCore` to SIS or SIR. (I don't have more constructive suggestions for what to name it at the moment, sorry!)
> The `Rejection` and `PFInit` functions which have type `webpplFn => Trace` are perhaps best described as samplers, and it seems reasonable to say we initialize `MH` with a `RejectionSampler`.
The sampler we use to initialize MH is a weird sampler, though, no? Unlike a traditional rejection sampler, we'll accept as soon as we find any trace with non-zero probability (and we wouldn't want it to be otherwise, or init would take forever for realistic problems). I think it's fine to name it `...Sampler`, but we should make sure it doesn't get confused with a normal rejection sampler.
> We might also have another function at the same level of abstraction as `MCMC` and `SMC` which takes one of these samplers, invokes it many times and builds a histogram, i.e. basic Monte Carlo sampling. (Not all functions of type `webpplFn => Trace` are samplers of course.)
Interesting idea. It would indeed be useful to have an `MC` method which allows the user to re-run an algorithm like rejection, MCMC, or SMC many times and get an ERP that aggregates the results.
> We might consider tackling HMC and IncrementalMH later as separate pieces of work
>
> Sounds fine to me, as long as we are confident that we'll be able to integrate those later.
I've been trying to familiarize myself with IncrementalMH to make sure we can. This has raised a couple of questions, the answers to which might help me see how I need to structure things:

- Is the ERP "master list" just a listing of all the ERP nodes that are in the cache?
- Will an `IncrementalMHKernel` be useable in all the places the `MHKernel` will be useable? e.g. as a rejuvenation kernel for SMC, with various initialization strategies in MCMC, etc.

(cc @dritchie)
First: yes, the ERP master list is just a listing of all the ERP nodes that are in the cache. This is maintained separately so that choosing an ERP to propose to at random can be done quickly, without having to traverse the cache.
As for interop with other algorithms, this has been on my mind a bit lately. It would be ideal to have an `IncrementalMHKernel` that could be plugged in anywhere that other kernels can be. But you're right that this is complicated by the fact that the way Incremental MH represents a 'trace' is very different from how other algorithms do it (I haven't looked at the code, but HMC might have a similar, though less exaggerated, issue).
I've been thinking that the best way to do this might be to define a minimal interface that traces need to expose. Operations like "get/set the value of an ERP given an address," "re-execute from this address," or "reject proposal and reset trace." If algorithms that use MCMC kernels only interact with traces via these methods, then things should work out. I'm hopeful that such a common interface does exist, but obviously I haven't done the legwork to flesh it out.
If you do start designing such an interface, and you're wondering if Incremental MH can support such-and-such an operation, feel free to ask.
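For illustration, here's a sketch of what such an interface might look like (the method names are invented; the thread only describes the operations informally):

```js
// Sketch: the operations a kernel would be allowed to use. A kernel written
// against this interface shouldn't care whether the backing store is an array
// (MH), a hash map (HashMH), or IncrementalMH's cache.
var traceInterface = {
  // "get/set the value of an ERP given an address"
  findChoice: function (address) { throw new Error('not implemented'); },
  setChoiceValue: function (address, value) { throw new Error('not implemented'); },
  // "re-execute from this address"
  continueFrom: function (address) { throw new Error('not implemented'); },
  // "reject proposal and reset trace"
  restore: function () { throw new Error('not implemented'); }
};
```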
@dritchie It appears that the HMC code does include something along the lines you specify; in particular, the code in `trace.js` and the functions `mhPropose` and `HMC.prototype.exit` in `hmc.js`.

The `trace` library is written such that one appends a `traceEntry` to the trace at the appropriate places (set), and the trace data structure internally maintains the means (`addressIndices`) for a constant-time lookup (get) based on address. The re-execute-from and reject-restore bits are not built into the trace, but the `mhPropose` and `HMC.prototype.exit` functions implement them separately.
@dritchie That's helpful, thanks!
I want to make sure everyone understands what I'm up to, so here's a quick summary to save you reading back through the issue history.
> I've been thinking that the best way to do this might be to define a minimal interface that traces need to expose.
A large part of this (i.e. issue #64) is exactly that. It already includes interop between SMC and MH via an interface on traces similar to the one you described and similar to that which @iffsid has in HMC.
The rough plan is:
I'm thinking about IncrementalMH now because I don't want to head down this path and find out later that it's a dead end.
If algorithms that use MCMC kernels only interact with traces via these methods
I'm currently imagining that this is slightly more general than that. One consequence of my approach is that operations like:

> "re-execute from this address," or "reject proposal and reset trace."

... would belong to particular algorithms and not the trace itself. (@iffsid described something similar for HMC.)
My current feeling is that this will indeed work out and I should press ahead, but if there's something I'm overlooking please let me know! I'm happy to explain things in more detail to anyone who is interested.
I did some crude benchmarking of the refactored algorithms. In summary:

- The refactored MH (MCMC + MHKernel) runs about as fast as `HashMH`.
- We should be able to retire `ParticleFilter` as setting `rejuvSteps = 0` in the new `SMC` implementation runs about as fast. (As of this commit.)

Here are the details. This is runtime as measured with `--verbose`, shown in ms/10. The refactored version is always the right-most column.
The refactored code contains lots of additional assertions which I removed before doing this.
examples/lda.wppl
iterations / 1k | MH | HashMH | MCMC + MHKernel |
---|---|---|---|
10 | 560 | 214 | 201 |
20 | 1241 | 420 | 369 |
30 | 1972 | 679 | 591 |
examples/hmmIncremental.wppl with 10 time steps
particles / 1k | ParticleFilter | SMC with rejuvSteps = 0 |
---|---|---|
5 | 206 | 212 |
10 | 763 | 744 |
15 | 1702 | 1683 |
examples/hmmIncremental.wppl with 5k particles
time steps | ParticleFilter | SMC with rejuvSteps = 0 |
---|---|---|
10 | 206 | 212 |
15 | 315 | 319 |
20 | 409 | 427 |
examples/hmmIncremental.wppl with 5 time steps, 1k particles
rejuv steps | ParticleFilterRejuv | ParticleFilterRejuv* | SMC |
---|---|---|---|
10 | 351 | 128 | 86 |
15 | 508 | 186 | 110 |
20 | 717 | 253 | 131 |
(*) `ParticleFilterRejuv` with calls to `deepCopyTrace` removed, after which inference tests still pass.
Looks great!
> We should be able to retire `ParticleFilter` as setting `rejuvSteps = 0` in the new `SMC` implementation runs about as fast. (As of this commit.)
One of the reasons for keeping it around was that it was easier to understand than the version with rejuvenation, so it was potentially useful as an educational tool for getting people started on the codebase (the next step up from the particle filter described in dippl) and as a basis for inference experiments. The refactored code is hopefully readable enough that we don't need to keep around an extra particle filter implementation, but it's worth keeping that motivation in mind.
> (*) `ParticleFilterRejuv` with calls to `deepCopyTrace` removed, after which inference tests still pass.
Is the refactored version (doing the equivalent of) not deep-copying the trace? If so, we should make sure that that's correct in general. There may have been a reason for using deep copies.
> One of the reasons for keeping it around was that it was easier to understand than the version with rejuvenation
Gotcha, thanks. You'll have to see what you think of the refactored version. My feeling is that `ParticleFilter` got harder to understand once variable numbers of factors were handled, and to a lesser extent when importance distributions were added. For teaching, maybe you want a version stripped of one or both of those? Could such a thing live in a package?
> Is the refactored version not deep-copying the trace?
Yes. I think that's correct, but I'll double check and then maybe give you an argument why.
Is `PFRjAsMH` something that's used in practice for inference, or is it just for test purposes? The refactored code is perhaps a little stricter about catching unexpected `score == -Infinity` and the like, some of which will need loosening to make this work.
> For teaching maybe you want a version stripped of one or both of those? Could such a thing live in a package?
Yeah, good idea.
> Is `PFRjAsMH` something that's used in practice for inference, or is it just for test purposes?
Just for testing, for now - but my intuition is that our SMC implementation should work in the extreme case of only one particle.
> my intuition is that our SMC implementation should work in the extreme case of only one particle.
In the case where advancing the single particle to the next observation leads to a score of -Inf, I wonder if it would be preferable to keep re-running that step of SMC until we find an extension to the particle/trace with non-zero probability before doing rejuv. Otherwise, the first change rejuv proposes will be accepted, and it seems quite likely it will wipe out all the progress made during this careful initialization.
Progress update: There are only a few things left to do on my todo list. I'm hoping to open a pull request for this issue at the beginning of next week.
> I wonder if it would be preferable to keep re-running that step of SMC until we find an extension to the particle/trace with non-zero probability before doing rejuv.
I don't think that's possible - we don't know that there is any local extension with non-zero probability. (Consider the case where no random choices have happened since the last factor.)
> Otherwise, the first change rejuv proposes will be accepted, and it seems quite likely it will wipe out all the progress made during this careful initialization.
A big reason for this is that if both old and new score are `-Infinity`, we accept the proposal, right? If we didn't do this, then the first proposal we accept would be the first one that changes the score to a number greater than `-Infinity`. For models without structural changes, I'd guess that this would commonly be a proposal that changes the most recent random choice (although it would depend a lot on the dependence structure of the model). So, assuming that we still have this behavior in the new implementation, I'd expect that turning it off would help somewhat. I wouldn't optimize for this use case more than that for now.
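For concreteness, a minimal sketch of the acceptance rule being described (assumed, not the branch's actual code; proposal correction terms are omitted):

```js
// Sketch: when both scores are -Infinity, newScore - oldScore is NaN, so the
// kernel needs an explicit rule; accepting is what lets the chain move away
// from a zero-probability starting trace.
function acceptProb(newScore, oldScore) {
  if (newScore === -Infinity && oldScore === -Infinity) {
    return 1; // the behavior discussed above
  }
  return Math.min(1, Math.exp(newScore - oldScore));
}
```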
> Progress update: There are only a few things left to do on my todo list. I'm hoping to open a pull request for this issue at the beginning of next week.
Awesome!
> Is the refactored version (doing the equivalent of) not deep-copying the trace? If so, we should make sure that that's correct in general.
I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices made at sample statements (e.g. by reaching back into the trace history) nor do we mutate anything that these choice object reference. (With the exception of the store of course, but we take care to clone that where necessary.)
For reference, the trace in the current version of `ParticleFilterRejuv` looks something like this:
[ { k: <...>, name: <...>, erp: <...>, params: <...>,
score: <...>, forwardChoiceScore: <...>,
val: <...>, reused: <...>, store: <...> }, ... ]
A deep copy happens during resampling and before MH generates a new proposal.
> There may have been a reason for using deep copies.
I had a look at the commit history to see if I could find any clues, but didn't have much luck. It looks like the initial commit would have been broken, as traces would have been shared between particles. The deep copy was added here - ring any bells?
> PF with rejuvenation seems to be faster (by a small constant) in the refactored version. I can't yet fully explain why this is the case.
It turns out that this is because the current rejuvenation code doesn't have the optimizations which bail out of the proposal early e.g. when the probability becomes zero.
> Also, instrument inference algorithms with hooks such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.
I've not done this yet and would prefer to tackle it as a separate task.
The refactored MH doesn't do anything with `mh-diagnostics/diagnostics.js`, but it does have the `justSample` option allowing all samples to be collected. Is that sufficient for now?
> I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices...
This is not true for HMC. Computing the derivative of the score will mutate sensitivities on the tape constructed through the trace via re-defined operators.
> The deep copy was added here - ring any bells?
I don't remember why exactly this was added. I should have added a note in the commit message or code - sorry! I suspect I was concerned about the store not getting copied wherever it needed to be copied, possibly because I ran into a case where this actually happened (though I also wouldn't be surprised if I just added it preemptively).
> Also, instrument inference algorithms with hooks such that we can pass in functions to collect and compute diagnostics without the diagnostics code being part of the core algorithms.
>
> I've not done this yet and would prefer to tackle it as a separate task.
Fine with me.
> The refactored MH doesn't do anything with `mh-diagnostics/diagnostics.js`, but it does have the `justSample` option allowing all samples to be collected. Is that sufficient for now?
Also fine - we can revisit this when we add hooks.
> I've not been able to think of a reason why a deep copy is necessary. The reason I think a shallow copy is sufficient is that I don't think we mutate any of the objects representing the choices...
>
> This is not true for HMC.
If it's only necessary for HMC, we should only deep-copy when we're actually running HMC, so that we don't incur unnecessary cost otherwise.
> If it's only necessary for HMC, we should only deep-copy when we're actually running HMC, so that we don't incur unnecessary cost otherwise.
This should be fairly straightforward. You'd only need to check if the score for the trace is a `Number` or a `Tape` and then clone accordingly.
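A minimal sketch of that check (hypothetical; the copy strategies are passed in rather than assuming particular helpers exist):

```js
// Sketch: only pay for the expensive copy when the score is an AD Tape.
function copyTrace(trace, shallowCopy, deepCopy) {
  // A plain number score means no tape is involved (MH/SMC), so entries are
  // never mutated and a shallow copy suffices; anything else is assumed to be
  // a Tape (HMC), where gradient computation mutates sensitivities.
  return typeof trace.score === 'number' ? shallowCopy(trace) : deepCopy(trace);
}
```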
However, one thing to note is that this does necessitate that the trace is entirely self-contained; i.e. the trace is built without other properties of the coroutine/kernel also computing/referring to parts of the tape. E.g. things like `this.currScore` should be avoided, since computing gradients mutates them, but cloning the trace doesn't consider this.
I believe that this is handled correctly in the refactor, but I haven't checked that thoroughly, so I'm not 100% certain.
Closed by #169.
There is a lot of overlap in pf, pmcmc, smc; and in mh and smc. One way to combat this would be to just have a single smc algorithm, and implement all of the algorithms mentioned above as parameterizations of this uber algorithm. This has some advantages, but doesn't feel entirely right, based on what smc currently looks like; it would be better if we had smaller compositional pieces.