igarnier / prbnmcn-dagger

A library for probabilistic programming
https://igarnier.github.io/prbnmcn-dagger/

smc_inference documentation #2

Closed nilsbecker closed 1 year ago

nilsbecker commented 1 year ago

for me, the documentation for smc_inference is not sufficient. i have trouble understanding what the module does exactly.

what i would expect for sequential monte carlo is to first set the population size N of concurrent particles/walkers/clones/copies. then, i would expect to get a way to stream samples, where each iteration of the stream would yield a weighted population of samples of size N; resampling would happen automatically based on some criterion for the effective sample size of the population.
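to make this concrete, here is a sketch of the kind of interface i had in mind -- entirely hypothetical, i'm not claiming this is (or should be) the actual prbnmcn-dagger api:

```ocaml
(* hypothetical interface, not the actual prbnmcn-dagger API: [run]
   returns a lazy stream of weighted populations of fixed size, and
   resampling is triggered automatically whenever the normalized
   effective sample size ESS/N = 1 / (N * sum_i w_i^2) (for normalized
   weights w_i) drops below a threshold. *)
type 'a weighted = { value : 'a; weight : float }

module type Expected_smc = sig
  type 'a model

  val run :
    particles:int ->
    ess_threshold:float ->
    'a model ->
    'a weighted array Seq.t
end
```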

looking at e.g. Non_interruptible.run, i can set a number of particles, so that's fine. but then i get back a weighted list of walkers. is this just one step in the stream, with a single run of the model for each particle? i do not understand if and when resampling is triggered in the single step that was run. also, i don't see how to proceed to the next step.

the basic example biased_coin.ml is of little help as it appears to run just one step and give back a list of samples with trivial weights.

trying to understand the source, i halfway understand that there is a bunch of effects that can be triggered, which may lead to particles being committed to a list for resampling. but i don't get how this combines with the run function. do i need to write a model that generates an infinite stream of samples?

is there a more complete example of the usage of smc_inference available somewhere?

thanks!

igarnier commented 1 year ago

Hi Nils, thanks for reporting this issue. I agree that the API is a bit clunky. You can find a more substantial example in examples/wind_power_forecasting, and I wrote a blog article which may be helpful (it's probably a bit too verbose): http://probanomicon.xyz/blog/wind_power_forecast.html#the-model Basically, I record the population at each resampling step in a table, and then proceed to exploiting the sequence of populations in a second step.

Note that currently, resampling is not performed based on the ESS criterion: it is performed systematically when all particles have yielded. This is something that I should fix as well.

nilsbecker commented 1 year ago

ok, thanks that helped.

iiuc, a model to be used within smc needs to consume a stream of inputs and yield after each step; the run wrapper then manages a population of such models, including resampling. this is different from the lmh api, where a model just does a single evaluation.

looks reasonable once it's clear that a 'model' in the two implementations needs to do different things. to document the latter fact, a minimal nontrivial example in the docs may be helpful.
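for instance, something like this -- untested, and all the names (`sample`, `score`, `yield`, the `let*` binder) are placeholders i'm guessing from this thread, not checked against the real smc_inference api:

```ocaml
(* sketch of a particle-filter-style model that consumes a list of
   observations and suspends once per observation; [sample], [score]
   and [yield] are assumed operations of the inference monad *)
let rec filter_model observations state =
  match observations with
  | [] -> return state
  | obs :: rest ->
      let* state' = sample (transition state) in   (* propose a move *)
      let* () = score (likelihood obs state') in   (* weight it by the data *)
      let* () = yield state' in                    (* suspend: one smc step done *)
      filter_model rest state'
```

the run wrapper would then advance all particles to their next yield, resample, and repeat until the streams are exhausted.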

igarnier commented 1 year ago

Glad it helped. You're right that documentation can be improved. I also tried to come up with another API. It's simpler and perhaps closer to what you initially wanted, at the cost of a functor application. Feel free to have a look and give feedback if you have time.

https://gitlab.com/igarnier/monorepo/-/blob/master/dagger/src/smc_inference.mli

nilsbecker commented 1 year ago

thanks! i'll have a look at the alternative API. in the meantime, some feedback on the docs.

one thing that was not clear to me initially was that after a resampling, all of the weights are in fact always equal to 1 -- this is not strictly required i think, one could also imagine resampling schemes that make weights only approximately equal. so that could be stated explicitly.

https://github.com/igarnier/prbnmcn-dagger/blob/360e196be3e28cbcc67691a1fd68f1cd93743e35/src/resampling.mli#L23 this scheme appears to make sense only if the weights are already normalized to sum to 1. in the implementation, the returned weights sum to the number of particles as far as i could see.

also, the U_i are only N-1 in total, so the sum over the r_k is N-1, which is probably not intended? (unless i'm mistaken.) nitpicking: to avoid double counting in edge cases, the intervals [W_{k-1}, W_k] should probably be half-open.

https://github.com/igarnier/prbnmcn-dagger/blob/360e196be3e28cbcc67691a1fd68f1cd93743e35/src/resampling.mli#L36 in this line, intervals are discussed but for the following resampling rule, it seems only the points (i-1)/N + U_i are of interest, not those intervals.
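for reference, the standard stratified resampling rule as i understand it, so we can check we mean the same thing:

```latex
% stratified resampling with N offsets (not N-1), weights normalized so W_N = 1:
W_0 = 0, \qquad W_k = \sum_{i=1}^{k} w_i
U_i \sim \mathcal{U}[0, 1/N) \quad \text{i.i.d. for } i = 1, \dots, N
u_i = \frac{i-1}{N} + U_i
r_k = \#\{\, i : u_i \in [W_{k-1}, W_k) \,\} \quad \Rightarrow \quad \sum_{k=1}^{N} r_k = N
```

with N points u_i and half-open intervals, each count r_k is well defined and the counts sum to N.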

nilsbecker commented 1 year ago

https://gitlab.com/igarnier/monorepo/-/blob/master/dagger/src/smc_inference.mli#L29 that documentation i find nice and clear.

iiuc, that signature would no longer require each particle to record its trace but instead give back a list of particles existing before or after each resampling event. that's closer to what i would have expected initially. however, it possibly makes it harder to record a trace for each particle, in case that is of interest?

i'm not sure i understand what the role of Observable is here.

i like the absence of the *_noyield functions which seemed cumbersome. often it's better to say what you do than what you don't.

igarnier commented 1 year ago

> one thing that was not clear to me initially was that after a resampling, all of the weights are in fact always equal to 1 -- this is not strictly required i think, one could also imagine resampling schemes that make weights only approximately equal. so that could be stated explicitly.

I think this is a property that should not be relied on (I do use that property in the examples but it's a mistake that I should fix). Do you have a resampling scheme in mind?

> this scheme appears to make sense only if the weights are already normalized to sum to 1. in the implementation, the returned weights sum to the number of particles as far as i could see. [...]

I'll create a dedicated issue to review the documentation and the implementation of the resampling module.

> iiuc, that signature would no longer require each particle to record its trace but instead give back a list of particles existing before or after each resampling event. that's closer to what i would have expected initially.

Yes, no need to manually store each population. The run function directly produces a sequence of populations.

> however, it possibly makes it harder to record a trace for each particle, in case that is of interest?

That's the purpose of the new signature for yield, which takes as argument some data of type Observable.t. The name Observable is badly chosen, perhaps Particle_output would be clearer.

> i like the absence of the *_noyield functions which seemed cumbersome. often it's better to say what you do than what you don't.

I think it's better as well. In the implementation corresponding to this signature, the score function does not yield anymore; one has to yield explicitly. As a historical side note, the original behaviour came from this article https://dl.acm.org/doi/pdf/10.1145/3236778 (page 11), where scoring always suspends a particle.
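Schematically, and with made-up names (this is not the literal API):

```ocaml
(* old behaviour: scoring implicitly suspended the particle *)
let old_style obs =
  let* () = score (likelihood obs) in  (* accumulated weight AND yielded *)
  return obs

(* new behaviour: weight accumulation and suspension are decoupled *)
let new_style obs =
  let* () = score (likelihood obs) in  (* only accumulates the weight *)
  let* () = yield obs in               (* suspension is explicit, with the
                                          particle output passed to [yield] *)
  return obs
```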

Thanks a lot for your comments!

nilsbecker commented 1 year ago

> The name Observable is badly chosen, perhaps Particle_output would be clearer.

i think observable could be fine. e.g. in physics an observable is any property of a system that can be measured. some short explanation in the docs may be sufficient.

> Do you have a resampling scheme in mind?

not anything specific that i could point to. i just remember from the literature on enhanced sampling methods, e.g. in chemical physics, that people used various home-cooked strategies, such as terminating or splitting each particle stochastically with probabilities determined by its current weight. here each particle's weight is individually conserved on average, but weights are not equal after a resampling. also (i think) resampling does not need to synchronize the population in such a scheme.
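roughly, i mean something in this spirit (illustrative ocaml, weighted-ensemble style, not dagger code; `w_min` and `w_max` are arbitrary thresholds i made up):

```ocaml
(* per-particle split/kill step: light particles are killed with
   probability 1 - w/w_min and survivors promoted to weight w_min, so
   weight is conserved on average; heavy particles are split in two,
   conserving weight exactly. no population-wide synchronization. *)
let split_or_kill ~w_min ~w_max particles =
  List.concat_map
    (fun (x, w) ->
      if w < w_min then
        if Random.float 1.0 < w /. w_min then [ (x, w_min) ] else []
      else if w > w_max then [ (x, w /. 2.); (x, w /. 2.) ]
      else [ (x, w) ])
    particles
```

note that after such a step the weights are neither equal nor normalized.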

anyway, in case that's interesting to you:

what i'm currently interested in is model evidence estimation in the context of ABC (approximate bayesian computation), e.g. doi:10.1214/11-BA602. there one runs a population of MC chains whose proposal steps come from the prior of the parameters but which are conditioned on a likelihood function. that likelihood function is not constant: it starts out flat but becomes successively more focused as a parameter is updated in several steps during the sampling run. at the end, the likelihood basically amounts to a soft dirac-delta function around all parameters that are compatible with the ABC acceptance criterion.

during the process one has to resample the population, basically in sync with the likelihood focusing steps. at variance with the typical scheme assumed (i think) also in prbnmcn-dagger, the likelihood steps change the total weight of the population. in order to get the desired model evidence in the end, one has to track the change in total weight of the population during the process.
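concretely, the quantity i'm after is the usual smc evidence estimate, the product over resampling stages of the mean unnormalized incremental weights:

```latex
\hat{Z} \;=\; \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} \tilde{w}_i^{(t)}
```

where \tilde{w}_i^{(t)} is the unnormalized incremental weight of particle i at stage t. this is exactly the information that gets lost if resampling silently resets all weights to 1.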

i don't know yet if i will be able to implement such a scheme within prbnmcn-dagger. it sounds doable, but i will probably have to supply my own resampling scheme in order to get access to the internals of weight updates.

igarnier commented 1 year ago

Thanks for the reference, I'll try to have a look. I'd like to find a nice API for specifying custom resampling.

nilsbecker commented 1 year ago

In terms of literature on model evidence calculation, there is also a recent comprehensive review. i found it a bit cumbersome to read, but it does cover a wide range of monte carlo algorithms that involve resampling in some way or another.

igarnier commented 1 year ago

Thanks for this reference! I somewhat arbitrarily picked the algorithm described in "Adaptive approximate Bayesian computation" by Beaumont et al., which proposes a correction to https://doi.org/10.1073/pnas.0607208104. In order to implement it, I had to perform a substantial refactor of the resampling and SMC APIs, allowing users to program custom resampling steps. See my message here https://github.com/igarnier/prbnmcn-dagger/issues/3 for a draft of the API and the associated docs.

I managed to reproduce the first example in the paper (didn't try the other for time reasons). You can have a look here: https://gitlab.com/igarnier/monorepo/-/blob/master/dagger/examples/smc-abc/abc.ml Most of the complexity ends up in the resampling function.