collinskatie opened 4 years ago
> If we @trace the noise, does this mean that it will automatically be optimized over?
No, the value should stay the same unless you apply an inference operator that modifies it, or unless it gets removed from the trace due to stochastic control flow changes (which also need to be written by the user explicitly).
> if there is noise, then the involution round trip fails or the score for a single trace changes each time since the noise is randomized per run of the forward model.
If you have "untraced" a.k.a. "non-addressable" randomness in your model (random choices that don't have an address, but are e.g. within a simulator) then the weight-equality round-trip check in the current head of Gen's master branch will fail. I am 95% sure the kernel is still valid (stationary) even in this setting provided the other checks pass; and I am planning to remove this weight-equality check after double-checking this.
My guess is that not tracing the collision noise could be the right thing to do in this case especially if the collision noise is relatively small. Manually proposing good values to the noise within a reversible jump add-remove move sounds pretty complicated (but not impossible in principle).
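To make the traced vs. untraced distinction concrete, here is a minimal sketch using a toy model (not the actual simulation from this thread):

```julia
using Gen

# Toy model: one traced latent plus untraced collision-style noise.
@gen function collision_model()
    v = @trace(normal(0.0, 1.0), :v)              # traced: has an address, visible to inference
    bounce_noise = 0.01 * randn()                 # untraced: plain Julia randomness, no address
    @trace(normal(v + bounce_noise, 0.1), :obs)   # traced observation
end

tr = simulate(collision_model, ())
tr[:v]   # traced choices can be read back from the trace
# bounce_noise is not recorded in the trace, which is why the weight-equality
# round-trip check fails in the presence of this kind of randomness.
```

If you do want to read back the sampled noise per run, it needs an address; whether inference then changes it depends on which addresses your kernels act on.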
I pushed a branch that comments out the weight equality check. If you open the Julia package manager and run `add Gen#20200815-marcoct-removeweightcheck`, you should get it.
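If you prefer the Pkg API to the package-manager prompt, something like the following should be equivalent (a sketch; the branch name is the one above):

```julia
using Pkg

# Same effect as typing `add Gen#20200815-marcoct-removeweightcheck` after
# pressing ] at the julia> prompt to enter package-manager mode.
Pkg.add(PackageSpec(name="Gen", rev="20200815-marcoct-removeweightcheck"))
```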
You can debug this part of the algorithm by setting the noise very low or near zero, and then seeing how the behavior changes as you increase the noise. The remaining involution checks (not including the weight-equality check) should all still pass -- if they don't, then there's a bug somewhere.
Thank you so much, @marcoct !! The package seems to be working, and I am not getting error messages. In addition to debugging by varying the noise values (currently doing this), are there any other steps you recommend to ensure that the kernel is still valid? Thanks!!
@collinskatie You can manually construct a ground truth trace that contains latent variables and simulated data, and then run your kernel from that state. If it is not doing what's expected there (staying near the mode, most likely) then there's an issue.
To test for mathematical correctness of the kernel (invariance / stationarity) you can do somewhat more comprehensive testing, like simulating a bunch of synthetic data sets from the prior, running inference on each of them, and then aggregating all the latent samples and checking that the aggregate distribution on latents is similar to the prior distribution on latents (this is sometimes called 'simulation-based calibration'). But for this intuitive domain I would probably not worry about doing that until the algorithm is very mature -- qualitative checks that it matches your intuition seem highest value.
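A rough sketch of that kind of check on a toy model (the model, addresses, and the mh/select kernel below are placeholders, not your code):

```julia
using Gen

@gen function toy_model()
    x = @trace(normal(0.0, 1.0), :x)   # latent
    @trace(normal(x, 0.5), :y)         # observation
end

prior_xs = Float64[]
inferred_xs = Float64[]
for rep in 1:1000
    gt = simulate(toy_model, ())             # synthetic data set simulated from the prior
    push!(prior_xs, gt[:x])

    obs = choicemap((:y, gt[:y]))
    (tr, _) = generate(toy_model, (), obs)   # initialize conditioned on the simulated data
    for step in 1:100
        (tr, _) = mh(tr, select(:x))         # stand-in for the kernel under test
    end
    push!(inferred_xs, tr[:x])
end
# If the kernel is valid, the aggregate distribution of inferred_xs should match
# the prior distribution of prior_xs (compare histograms or quantiles).
```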
For your application, since it's an intuitive domain, I'd think from the perspective of building a conceptual model of what the posterior should look like for different simple test cases. If you're debugging the MCMC code you might also want to separate two tasks: (i) is the algorithm able to evolve the latent state to near a mode of the posterior, and (ii) is the algorithm able to switch between modes of the posterior. I'd worry about getting (i) working first before worrying as much about (ii) -- it might be possible to capture the qualitative existence of multiple modes just by initializing randomly from a broad distribution; and if you're embedding the MCMC code into an SMC algorithm with multiple particles, then the particle weighting can make the approximation more quantitatively correct even if your rejuvenation kernels don't do a good job at (ii).
Also, you probably already did this a while ago, but I definitely recommend starting by making sure the posterior matches your intuitions for a very low-dimensional test case, using Gen.importance_sampling. I'd expand the complexity of the test case and the algorithms incrementally. In your application I might start with 2 latent variables for the initial velocity of one object, check that importance sampling with 1000 particles works, then increment to e.g. two objects and run IS with 100k particles, just to get a good handle on how the posterior of the model behaves in small cases.
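For reference, a minimal sketch of that kind of importance-sampling sanity check, using a made-up two-latent toy model rather than your actual model:

```julia
using Gen

# Toy stand-in for a model with two latents (initial velocity of one object)
# and one observation; the addresses and parameters are made up.
@gen function one_object_model()
    vx = @trace(normal(0.0, 1.0), :vx)
    vy = @trace(normal(0.0, 1.0), :vy)
    @trace(normal(vx + vy, 0.1), :obs)
end

observations = choicemap((:obs, 1.3))
(traces, log_norm_weights, lml_est) =
    Gen.importance_sampling(one_object_model, (), observations, 1000)

# Posterior mean of one latent from the self-normalized weights.
weights = exp.(log_norm_weights)
posterior_mean_vx = sum(weights .* [tr[:vx] for tr in traces])
```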
The amount of noise you put into the dynamics of the model and the amount of observation noise you use are also key parameters to manipulate when debugging. For example, decreasing dynamics noise should make the inference problem easier, so you can start with low noise and make sure things work as expected there (assuming the noise is untraced). Decreasing the observation noise generally makes the inference problem harder because the posterior becomes more bumpy. So you might want to get the algorithm working reliably first with the model's observation noise set high and the dynamics noise set low -- using this as a setting to debug more easily, before decreasing the observation noise and increasing the dynamics noise.
More generally, I recommend debugging the inference code by making a suite of some minimal test cases, kind of like unit tests for regular code, where you control as many of the variables as possible, and check that individual MCMC kernels are doing what they're designed to do. For example, manually constructing a trace with certain values for the latent variables and then checking what happens when you apply each kernel to that state for a single transition (you can run the same kernel many times from the same initial state). Check that acceptance rates are what you'd expect. I don't personally think that having the tests fully automated with e.g. thresholds for passing is essential -- in my experience the biggest marginal value of unit testing inference code for a project like this is just in providing some staging and structure to the debugging process.
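As a sketch of what one such "unit test" might look like (again with a made-up toy model, and mh/select standing in for your own kernel):

```julia
using Gen

# Same toy model as in the importance-sampling sketch above.
@gen function one_object_model()
    vx = @trace(normal(0.0, 1.0), :vx)
    vy = @trace(normal(0.0, 1.0), :vy)
    @trace(normal(vx + vy, 0.1), :obs)
end

# Manually construct a starting trace with chosen latent values and data.
constraints = choicemap((:vx, 0.0), (:vy, 0.0), (:obs, 1.3))
(start_trace, _) = generate(one_object_model, (), constraints)

# Apply the same kernel many times from the same initial state and record the
# acceptance rate for a single transition.
accepts = [mh(start_trace, select(:vx, :vy))[2] for i in 1:1000]
println("acceptance rate: ", sum(accepts) / length(accepts))
```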
Also, I think it can be helpful to clearly distinguish between potential bugs in the logic and the need for better-tuned parameters. You can side-step parameter tuning in the model a bit by testing everything on data that's simulated from the model to start with. I think it only makes sense to debug and tune it for real-world data after the inference code works reliably on synthetic data, so you know the general outline of the algorithm is capable at least in theory of solving the problem (e.g. you aren't missing some key kernel).
Anyways - you probably already are doing a lot of these things, but I thought I'd give my two cents.
Thank you so much @marcoct for your incredibly detailed suggestions!! We'll go through those. Thank you SO much!
Hi,
I have two questions about handling noise during inference in Gen.
1) Is there a way to add noise into the generative model (for instance, into the post-collision velocity angle in a physics simulation) without optimizing over this noise? If we @trace the noise, does this mean that it will automatically be optimized over? Is there a way to @trace something like collision noise that is sampled from a normal distribution without trying to do inference over this noise, but still enabling us to know what noise was sampled per run?
2) Is there a way to deal with noise in reversible jump MCMC? For instance, following the example from (https://github.com/probcomp/Gen.jl/blob/master/examples/kernel_dsl.jl), if we are comparing a trace where we add a block vs. remove a block, if there is noise, then the involution round trip fails or the score for a single trace changes each time since the noise is randomized per run of the forward model. Is there a way to still have a reversible kernel with noise (i.e., noise in the underlying physics trajectory) yet not actually optimize over this noise? Or does the noise need to be fixed between two different simulation runs when working with reversible kernels?
Thanks for any help!