Closed by lubo93 4 years ago
@lubo93 I believe you need to set the number of threads to 1 to obtain reproducible simulations. Let me know if this works for you.
There is no way around this issue, I don't think, at least not within the scope of this project.
To set the number of threads to 1, write:
from numba import set_num_threads
set_num_threads(1)
Yes, I set the number of numba threads to 1, and the results were reproducible before the contact simulator update. I think that it has to do with some dictionary manipulations.
I think that I identified the problem. Commenting out the parallel argument, i.e. writing @njit#(parallel=True), leads to reproducible results again. The "parallel=True" part was not present in previous code versions, and I suspect that it resets the numba threads? I will leave it commented out to obtain reproducible simulations.
Could it also be a new function? See https://github.com/dburov190/risk-networks/pull/88/files#diff-5d8dbfa7620da2b37f7dee6311fb5bbfR65
To clarify the discussion on this issue: prior to #88, the ContactSimulator could not be parallelized using numba because all contacts were updated en masse during a single Gillespie simulation of the entire system.
In #88, the contacts are simulated independently. This means that the loop over all contacts is trivially parallelizable.
The iterations of a loop over numba.prange are not executed in a deterministic order when multithreading is used. This means that the order in which random numbers are generated on contacts is not deterministic when the loop is multithreaded. I thought that using 1 thread, as noted above, would solve this problem, though apparently it did not.
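To illustrate the ordering problem described above, here is a minimal pure-Python sketch (it does not use numba and is not the project's code). The function names and the seed derivation are hypothetical. A reversed loop order stands in for a nondeterministic thread schedule: with one shared generator, the draw a contact receives depends on when it is visited, whereas a generator seeded per contact gives the same draw in any order.

```python
import random

def simulate_shared_rng(order, seed):
    """One shared generator: each contact's draw depends on visiting order."""
    rng = random.Random(seed)
    draws = {}
    for contact in order:
        draws[contact] = rng.random()
    return draws

def simulate_per_contact_rng(order, seed):
    """One generator per contact, seeded from the pair (seed, contact index)."""
    draws = {}
    for contact in order:
        # Hypothetical seed derivation, chosen only for illustration.
        rng = random.Random(seed * 1_000_003 + contact)
        draws[contact] = rng.random()
    return draws

n_contacts = 8
in_order = list(range(n_contacts))
scrambled = in_order[::-1]  # stands in for a nondeterministic thread schedule

shared_matches = (
    simulate_shared_rng(in_order, 42) == simulate_shared_rng(scrambled, 42)
)
per_contact_matches = (
    simulate_per_contact_rng(in_order, 42) == simulate_per_contact_rng(scrambled, 42)
)
print("shared generator reproducible:", shared_matches)
print("per-contact generators reproducible:", per_contact_matches)
```

The simple integer seed derivation is only meant to show the idea. In a NumPy setting, numpy.random.SeedSequence.spawn is designed for creating independent child streams, though whether that fits the jitted ContactSimulator code is not something this sketch establishes.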
As noted in #97, and at the time of this writing, multithreading the loop over contacts yielded no advantage in run time for EpidemicSimulator, even for large problems, perhaps because ContactSimulator.run walltime is dominated by preprocessing tasks rather than the generation of stochastic contacts. So there's no point in using multithreading for this right now, and we might as well get rid of it.
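One way to test the "dominated by preprocessing" guess is to time the two phases separately and plug the measured fraction into Amdahl's law, which bounds the speedup from multithreading only part of a computation. The sketch below assumes hypothetical stand-in functions (preprocess, generate_contacts); they are not the ContactSimulator internals, and the measured timings say nothing about the real code.

```python
import random
import time

def amdahl_speedup(parallel_fraction, n_threads):
    """Upper bound on speedup when only parallel_fraction of the run is threaded."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)

def timed(fn, *args):
    """Return fn(*args) together with its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def preprocess(n):
    # Hypothetical stand-in for the serial preprocessing work.
    rng = random.Random(0)
    return sorted(rng.random() for _ in range(n))

def generate_contacts(rates):
    # Hypothetical stand-in for the loop over contacts that could be threaded.
    rng = random.Random(1)
    return [rng.random() < rate for rate in rates]

rates, t_pre = timed(preprocess, 10_000)
active, t_gen = timed(generate_contacts, rates)

total = t_pre + t_gen
if total > 0:
    parallel_fraction = t_gen / total
    bound = amdahl_speedup(parallel_fraction, 8)
    print(f"threadable fraction: {parallel_fraction:.2f}")
    print(f"best-case speedup on 8 threads: {bound:.2f}x")
```

If the threadable fraction of the real run turns out to be small, the bound stays close to 1 regardless of thread count, which would be consistent with the observation in #97.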
After merging the new "ContactSimulator", I realized that the reproducibility issues had appeared again. I performed some simulations using "simulate_idealized_NYC_epidemic_nointervention.py" and always obtained different trajectories.
@glwagner Any ideas what causes these issues (dictionaries, random access elements, ...)?
I couldn't find any additional libraries in the new contact simulator function that could cause these reproducibility issues. I also tried different "seeding" protocols, but that didn't solve the issue.
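When hunting for the source of nondeterminism, it can help to run the same configuration twice with the same seed and find the first step at which the trajectories differ. Here is a minimal sketch of such a check; run_simulation is a hypothetical stand-in (a seeded random walk), not the epidemic simulation from the script above.

```python
import random

def run_simulation(seed, n_steps=100):
    """Hypothetical stand-in for a seeded simulation: a Gaussian random walk."""
    rng = random.Random(seed)
    state = 0.0
    trajectory = []
    for _ in range(n_steps):
        state += rng.gauss(0.0, 1.0)
        trajectory.append(state)
    return trajectory

def first_divergence(a, b):
    """Index of the first differing step, or None if the trajectories match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Same values so far: they differ only if one trajectory is longer.
    return None if len(a) == len(b) else min(len(a), len(b))

same_seed = first_divergence(run_simulation(7), run_simulation(7))
other_seed = first_divergence(run_simulation(7), run_simulation(8))
print("same seed, first divergence:", same_seed)
print("different seed, first divergence:", other_seed)
```

Applied to the real simulation, an early divergence would point at setup or the first contact update, while a late one would suggest a component that only becomes active later in the run.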