icecube / pisa

Monte Carlo-based data analysis
http://icecube.github.io/pisa/
Apache License 2.0
19 stars 47 forks source link

Deprecate `jumpahead` in `get_random_state` #711

Closed atrettin closed 1 year ago

atrettin commented 1 year ago

The jumpahead argument in pisa.utils.random_numbers.get_random_state does not result in independent sequences of output numbers if the random state is used to generate random numbers. It doesn't even reliably move the sequence of output numbers ahead by the given number, because calling rand does not mean that the internal Mersenne twister is really moved by one step (and really, we don't have access to numpy internals and shouldn't make such assumptions anyway).

A function that minimally implements the behavior of the PISA method is:

def get_poisson_sequence(size, seed=0, jumpahead=0):
    rs = np.random.RandomState(seed=seed)
    rs.rand(jumpahead)

    return rs.poisson(size=size)

With the right settings for jumpahead, we can repeat sequences exactly after a few initially different numbers:

get_poisson_sequence(15, seed=0, jumpahead=1)
>>> array([2, 1, 3, 2, 1, 0, 0, 5, 1, 1, 2, 0, 1, 1, 2])
get_poisson_sequence(15, seed=0, jumpahead=3)
>>> array([1, 1, 2, 2, 1, 0, 0, 5, 1, 1, 2, 0, 1, 1, 2])

We should deprecate jumpahead, as people can be mislead and use it to generate trials that they think are independent. My suggestion is to raise an Exception (no one cares about warnings) with instructions about how to fix the problem (by using a different seed instead of jumpahead).

philippeller commented 1 year ago

good catch! Let's do what you suggest and raise an exception if anyone uses this feature and disable it

LeanderFischer commented 1 year ago

This hasn't been implemented, right? edit: no and once it gets implemented, the unit test should probably check for it.

atrettin commented 1 year ago

Yeah this can trip people up when they make pseudo-data trials and they wonder why their trials don't look so independent. If this is still a thing then yes it should definitely get fixed! 😅

LeanderFischer commented 1 year ago

Interestingly, the OG version of the unit test, using jumpahead worked :thinking:

atrettin commented 1 year ago

It doesn't always fail to produce independent series. That's the terrible thing about it. :D