tapios / risk-networks

Code for risk networks: a blend of compartmental models, graphs, data assimilation and semi-supervised learning

EpidemicSimulator faces reproducibility issues again #90

Closed lubo93 closed 4 years ago

lubo93 commented 4 years ago

After merging the new "ContactSimulator", I realized that the reproducibility issues appeared again. I performed some simulations using "simulate_idealized_NYC_epidemic_nointervention.py" and always obtain different trajectories.

@glwagner Any ideas what causes these issues (dictionaries, random access elements, ...)?

I couldn't find any additional libraries in the new contact simulator function that could cause these reproducibility issues. I also checked different "seeding" protocols, but it didn't solve the issue.

glwagner commented 4 years ago

@lubo93 I believe you need to set the number of threads to 1 to obtain reproducible simulations. Let me know if this works for you.

I don't think there is a way around this issue, at least not within the scope of this project.

glwagner commented 4 years ago

To set the number of threads to 1, write


from numba import set_num_threads
set_num_threads(1)

lubo93 commented 4 years ago

Yes, I set the number of numba threads to 1, and the results were reproducible before the contact simulator update. I think that it has to do with some dictionary manipulations.

lubo93 commented 4 years ago

I think that I identified the problem. Commenting out the parallel option, i.e. changing the decorator to @njit#(parallel=True), leads to reproducible results again. The "parallel=True" part was not present in previous code versions, and I suspect that it resets the number of numba threads? I will leave it commented out to obtain reproducible simulations.

dburov190 commented 4 years ago

can it also be a new function? see https://github.com/dburov190/risk-networks/pull/88/files#diff-5d8dbfa7620da2b37f7dee6311fb5bbfR65

glwagner commented 4 years ago

To clarify the discussion on this issue: prior to #88, the ContactSimulator could not be parallelized using numba because all contacts were updated en masse during a single Gillespie simulation of the entire system.

In #88, the contacts are simulated independently. This means that the loop over all contacts is trivially parallelized.

The elements of a loop within numba.prange are not executed in a deterministic order when multithreading is used. This means that the order in which random numbers are generated on contacts is not deterministic when the loop is multithreaded. I thought that using 1 thread, as noted above, would solve this problem, though apparently it did not.
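
To illustrate the ordering argument, here is a minimal pure-Python sketch (no numba). The function names, the seed-mixing constant, and the reversed visiting order that stands in for a different thread schedule are all assumptions made for illustration, not code from this repository. It shows that draws from a single shared generator depend on the order in which contacts are visited, whereas giving each contact its own seeded generator makes the result independent of that order. Whether sequentially seeded generators are statistically independent enough for production use is a separate question that this sketch does not address.

```python
import random


def simulate_shared_rng(order, seed=0):
    """Draw one number per contact from a single shared generator.

    The value assigned to a contact depends on when it is visited.
    """
    rng = random.Random(seed)
    durations = {}
    for contact in order:
        durations[contact] = rng.random()
    return durations


def simulate_per_contact_rng(order, seed=0):
    """Draw one number per contact from a generator seeded per contact.

    The value assigned to a contact depends only on (seed, contact).
    The mixing constant below is an arbitrary, hypothetical choice.
    """
    durations = {}
    for contact in order:
        rng = random.Random(seed * 1_000_003 + contact)
        durations[contact] = rng.random()
    return durations


contacts = list(range(8))
reordered = contacts[::-1]  # stands in for a different thread schedule

shared_same = simulate_shared_rng(contacts) == simulate_shared_rng(reordered)
per_contact_same = (
    simulate_per_contact_rng(contacts) == simulate_per_contact_rng(reordered)
)
print("shared generator reproducible across orders:", shared_same)
print("per-contact generators reproducible across orders:", per_contact_same)
```

The per-contact variant trades one generator construction per contact for order independence, which is the property a multithreaded loop would need in order to be reproducible.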

As noted in #97, and at the time of this writing, multithreading the loop over contacts yielded no advantage in run time for EpidemicSimulator even for large problems, perhaps because ContactSimulator.run walltime is dominated by preprocessing tasks, rather than the generation of stochastic contacts. So there's no point in using multithreading right now for this and we might as well get rid of it.
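
One way to check the guess about preprocessing is to time the stages separately. The sketch below is only an illustration: `preprocess` and `generate_contacts` are hypothetical stand-ins, not the actual steps inside ContactSimulator.run, and the problem size is arbitrary.

```python
import time


def preprocess(n):
    # Hypothetical stand-in for setup work done before contacts are generated.
    return sorted(range(n, 0, -1))


def generate_contacts(data):
    # Hypothetical stand-in for the stochastic contact generation loop.
    return [x % 2 == 0 for x in data]


def time_stages(n):
    """Return the wall-clock time spent in each stage, in seconds."""
    timings = {}

    start = time.perf_counter()
    data = preprocess(n)
    timings["preprocess"] = time.perf_counter() - start

    start = time.perf_counter()
    generate_contacts(data)
    timings["generate"] = time.perf_counter() - start

    return timings


timings = time_stages(100_000)
total = sum(timings.values())
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.4f} s ({seconds / total:.0%} of total)")
```

If the stage that contains the parallelizable loop accounts for only a small share of the total, multithreading that loop cannot shorten the run by much.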