mrc-ide / covid-sim

This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College, London.
GNU General Public License v3.0
1.23k stars 256 forks source link

Buffer changes to host.absent_start|stop_time, and commit single-threaded #375

Closed weshinsley closed 4 years ago

weshinsley commented 4 years ago

Intel Inspector reported a data race on the middle two if statements Update.cpp:1229, 1230 on current master, in threaded DoPlaceClose)- data races on which thread gets to those particular lines first; no other lines involved.

if ((HOST_AGE_YEAR(l) >= P.CaseAbsentChildAgeCutoff) && (abs(Hosts[l].inf) != InfStat_Dead))
{
  if (Hosts[l].absent_start_time > t_start) Hosts[l].absent_start_time = t_start;
  if (Hosts[l].absent_stop_time < t_stop) Hosts[l].absent_stop_time = t_stop;
  StateT[tn].cumAPA++;
  f = 1;
}

The reason is a rare case where two schools close, and a child from each school belong to the same household, and each thread wants to update the absentee timings simultaneously. As the two instructions are effectively max/min, the results will not be affected by this change, and neither is master non-deterministic. Yet, it is probably helpful to remove the warning from Intel Inspector and thread this anyway.

I have buffered these changes in a threaded-array, see host_closure_queue and host_closure_queue_size in Model.h, which are put in StateT[tn]. I then replace the above lines in DoPlaceClosure with thread-safe lines, and I call a new function UpdateHostClosure() to flush those changes, at the end of each surrounding thread-loop where DoPlaceClosure might be called.

I believe since t_start and t_stop are a few timesteps in the future compared to the current timestep, the effects of the change do not need to be in place immediately (and we would see more warnings from Intel Inspector if that were necessary I think).

weshinsley commented 4 years ago

Tested in Intel Inspector with... /c:8 /NR:10 /PP:data\TEST\preGB_R0=2.0new.txt /P:data\TEST\p_PC_CI_HQ_SD.txt /O:Output\PC_CI_HQ_SD_100_91_R0=2.2 /M:Output\TEST.bin /D:data\TEST\GB_pop2018.bin /L:data\TEST\NetworkGB_8T.bin /CLP1:100 /CLP2:91 /CLP3:121 /CLP4:121 /R:1.1 98798150 729101 17389101 4797132

and all clear in terms of threading issues. Checking no reproducibility breakages next...

weshinsley commented 4 years ago

I have confirmed that all the report 9 runs produced with this branch are byte-identical to runs produced with the current master.

This means two good things a) This branch is looking good, and b) The reported race in Intel Inspector never made any difference to results - as we already expected.

weshinsley commented 4 years ago

@igfoo and @NeilFerguson - would especially appreciate you looking at my code changes - it's a fairly small change.

Two particular things I coded "naively" -

  1. I may be over-generous on SetupModel.cpp:1457 in the amount of space I am allocating to the buffer.

  2. I may also be calling the flusher (UpdateHostClosure) in more places than I strictly need to; I suspect once per timestep might be sufficient, whereas I am calling at the end of each parallel section in which changes might have been made. However, calling UpdateHostClosure when the buffers are empty will take a negligible amount of time, so I went for "better safe than sorry".

weshinsley commented 4 years ago

Repeated all tests for report 9 with these changes, and they are still 100% identical.

Just want to do final performance test on whether parallelising UpdateHostClosure actually helps - hard to see which job was which post-run.

weshinsley commented 4 years ago

Parallelisation actually slowed it down! Not enough work to overcome the fork overheads when using the modulo approach. Reverted to single-threaded UpdateHouseClosure, and retested against full Report 9 results.

weshinsley commented 4 years ago

@igfoo - Thanks for comments - all addressed. @NeilFerguson if you have time for a quick look at this, just to check there is nothing I've missed; I think it's all fine, is replicating all the results perfectly, and still has no warnings in Intel...

weshinsley commented 4 years ago

After discussion, going to do one more test with DCT switched on (which is not among the Report 9 tests), since there is a call to place closure in that part.

weshinsley commented 4 years ago

Results using p_PC7_CI_HQ_SD_DCT.txt were identical when run with the latest commit on this branch, as with commit dfc97f5, which was just before I started the branch...

So I think we've done as much due diligence as possible, and we're good to go.