nchopin / particles

Sequential Monte Carlo in python
MIT License
403 stars 75 forks source link

Fixing seeds and reproducibility #87

Open nchopin opened 5 months ago

nchopin commented 5 months ago

Discussed in https://github.com/nchopin/particles/discussions/86

Originally posted by **MauroCE** May 4, 2024 I really like this package and it's very intuitive to use :) One thing that would make me love it even more would be to allow the user to provide/fix the random state so that results can be exactly reproduced. I am still getting familiar with the abstractions and inner workings of the package, but I was thinking something like this could work: 1. Defining a function in `utils.py` that generates an instance of `np.random.Generator` given either a seed or another rng ``` def setup_rng(seed, rng): assert (seed is None) or (rng is None), "At least one of `seed` or `rng` must be None." if rng is None: if seed is None: seed = np.random.randint(low=1000, high=9999) rng = np.random.default_rng(seed=seed) return rng ``` 2. Add reproducibility for resampling schemes by modyfing `rs_doc`, `resampling_scheme`, `resampling`. Mostly this requires adding an extra argument `rng=None` to the various resampling functions (as well as `uniform_spacings` and the queue). E.g. ``` from particles.utils import setup_rng rs_doc = """\ Parameters ---------- W : (N,) ndarray normalized weights (>=0, sum to one) M : int, optional (set to N if missing) number of resampled points. rng : np.random.Generator, optional (instantiated at random if missing) random number generator for reproducibility Returns ------- (M,) ndarray M ancestor variables, drawn from range 0, ..., N-1 """ def resampling_scheme(func): """Decorator for resampling schemes.""" @functools.wraps(func) def modif_func(W, M=None, rng=None): M = W.shape[0] if M is None else M return func(W, M, rng=rng) rs_funcs[func.__name__] = modif_func modif_func.__doc__ = func.__doc__ + rs_doc return modif_func def resampling(scheme, W, M=None, rng=None): rng = setup_rng(seed=None, rng=rng) try: return rs_funcs[scheme](W, M=M, rng=rng) except KeyError: raise ValueError(f"{scheme} is not a valid resampling scheme") @resampling_scheme def systematic(W, M, rng=None): """Systematic resampling.""" su = (rng.uniform(low=0.0, high=1, size=1) + np.arange(M)) / M return inverse_cdf(su, W) ``` 3. Add `rng=None` to the arguments of `ArrayMCMC` and store it as an attribute. Then `ArrayMetropolis`, `ArrayRandomWalk` etc can use it within the `step()` and `proposal` methods respectively. When one instantiates a `MCMCSequence` one would pass an optional `rng`. In the same way, `rng` is passed down from the `FKSMCsampler`, which gets handed it down from e.g. `Tempering` or `AdaptiveTempering`. The same thing can be done for `FeynmanKac` and `SMC`. I tried it on my machine and it seems to work smoothly. One could probably work directly with `rng` and the argument `seed` in `setup_rng` would be redundant. I find that, depending on what I am trying to do and compare results with, I sometimes need to pass a seed or an rng, so could be an option to leave both. Let me know what you think :)