tapios / risk-networks

Code for risk networks: a blend of compartmental models, graphs, data assimilation and semi-supervised learning

Utils for setting numba random seed #71

Closed by glwagner 4 years ago

glwagner commented 4 years ago

This PR adds some example scripts to /sandbox that (seem to) demonstrate how to set the random seed that is used internally by numba.

The procedure seems to be more or less as "simple" as wrapping numpy.random.seed in a numba-compiled function. Because it is so simple, I think we should get in the habit of defining this function at the top of scripts where reproducible behavior is required (rather than writing this kind of function into the source).

Running set_numba_seed.py produces, for me:

(risknet) ~/Projects/risk-networks/sandbox$ python3 set_numba_seed.py 
Random numbers, no seeding:
[7. 1. 6. 8.]
[1. 9. 1. 1.]

Random numbers, seeding, serial execution:
[2. 2. 6. 1.]
[2. 2. 6. 1.]

Random numbers, single-thread seeding, multithreaded execution with 8 threads
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]

Random numbers, multithreaded seeding (?), multithreaded execution with 8 threads
[2. 2. 6. 1. 6. 5. 2. 3.]
[2. 2. 0. 4. 9. 6. 6. 1.]
[2. 2. 9. 7. 5. 4. 7. 6.]
[2. 2. 6. 1. 1. 3. 3. 9.]
[2. 2. 6. 1. 5. 5. 3. 9.]
[2. 2. 9. 4. 8. 5. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 6. 2. 7. 4.]
[2. 2. 6. 1. 5. 3. 4. 8.]
[2. 2. 6. 1. 3. 9. 6. 1.]

Random numbers, seeding, multithreaded functions with 1 thread.
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]
[2. 2. 6. 1. 3. 9. 6. 1.]

It appears that setting the seed on a single thread leads to reproducible behavior (maybe) when running multithreaded, whereas the attempt at multithreaded seeding does not. I'm a little skeptical; I think more research is needed.

Hopefully this at least partially addresses, if not resolves, #70.

lubo93 commented 4 years ago

Ok, it would be good to add these seed functions to "ContactSimulator". It hopefully resolves the reproducibility issues. A simple test with "super_simple_epidemic.py" should suffice.

glwagner commented 4 years ago

> Ok, it would be good to add these seed functions to "ContactSimulator".

That's fair. We can add a method set_seed.

Seems we need to overhaul ContactSimulator anyways. I'll implement this change in that PR.

glwagner commented 4 years ago

Ok, I've added a function called

epiforecast.utilities.seed_three_random_states(seed)

that seeds np.random, random, and the numba random state.

I've also included an example that illustrates what happens when you use it, in examples/seeding_random_states.py. It produces this output:

With seed 123
           random.random: 0.052363598850944326
     numpy.random.random: 0.6964691855978616
                   numba: 0.6964691855978616

With no seed
           random.random: 0.08718667752263232
     numpy.random.random: 0.28613933495037946
                   numba: 0.28613933495037946

With seed 123 again
           random.random: 0.052363598850944326
     numpy.random.random: 0.6964691855978616
                   numba: 0.6964691855978616

Note that the default numba random number generator is designed to produce the same results as the numpy random number generator, per https://github.com/numba/numba/pull/3038.

Finally, this will only produce deterministic output in single-threaded execution. To guarantee single-threaded execution, write:

from numba import set_num_threads

set_num_threads(1)

@lubo93 please merge if satisfactory.

lubo93 commented 4 years ago

Ok, thanks! You mentioned that this solution only works for a single thread. Will single-threaded execution be efficient enough for the ContactSimulator on large networks?