weshinsley closed 4 years ago
Tested in Intel Inspector with... /c:8 /NR:10 /PP:data\TEST\preGB_R0=2.0new.txt /P:data\TEST\p_PC_CI_HQ_SD.txt /O:Output\PC_CI_HQ_SD_100_91_R0=2.2 /M:Output\TEST.bin /D:data\TEST\GB_pop2018.bin /L:data\TEST\NetworkGB_8T.bin /CLP1:100 /CLP2:91 /CLP3:121 /CLP4:121 /R:1.1 98798150 729101 17389101 4797132
and all clear in terms of threading issues. Checking no reproducibility breakages next...
I have confirmed that all the report 9 runs produced with this branch are byte-identical to runs produced with the current master.
This means two good things: (a) this branch is looking good, and (b) the reported race in Intel Inspector never made any difference to results, as we already expected.
@igfoo and @NeilFerguson - would especially appreciate you looking at my code changes - it's a fairly small change.
Two particular things I coded "naively" -
I may be over-generous on SetupModel.cpp:1457 in the amount of space I am allocating to the buffer.
I may also be calling the flusher (UpdateHostClosure) in more places than I strictly need to; I suspect once per timestep might be sufficient, whereas I am calling it at the end of each parallel section in which changes might have been made. However, calling UpdateHostClosure when the buffers are empty takes a negligible amount of time, so I went for "better safe than sorry".
Repeated all tests for report 9 with these changes, and they are still 100% identical.
Just want to do a final performance test on whether parallelising UpdateHostClosure actually helps - it was hard to see which job was which post-run.
Parallelisation actually slowed it down! There is not enough work to overcome the fork overheads when using the modulo approach. Reverted to a single-threaded UpdateHostClosure, and retested against the full Report 9 results.
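For reference, the modulo approach could look roughly like the sketch below (hypothetical names throughout; in the real attempt each `tn` iteration would run as a parallel section in the codebase's OpenMP threading, whereas here it is run sequentially so the sketch stays portable). Each "thread" `tn` only applies queued entries whose target host index satisfies `host % n_threads == tn`, so no two threads ever write the same host - but every thread must scan every queue, and the per-entry work is just a min and a max, which is why the fork overhead dominated.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical queued-closure entry (illustrative names, not covid-sim's).
struct Queued { int host; int start; int stop; };

// Modulo-partitioned flush: iteration tn only applies entries whose
// target host satisfies host % n_threads == tn, so the partitions are
// disjoint and would be race-free if run concurrently. Run sequentially
// here for portability.
void flush_modulo(std::vector<int>& start, std::vector<int>& stop,
                  const std::vector<std::vector<Queued>>& queues,
                  int n_threads) {
    for (int tn = 0; tn < n_threads; tn++)
        for (const auto& q : queues)        // every "thread" scans every queue
            for (const auto& e : q)
                if (e.host % n_threads == tn) {
                    start[e.host] = std::min(start[e.host], e.start);
                    stop[e.host]  = std::max(stop[e.host],  e.stop);
                }
}
```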
@igfoo - Thanks for comments - all addressed. @NeilFerguson if you have time for a quick look at this, just to check there is nothing I've missed; I think it's all fine, it replicates all the results perfectly, and it still produces no warnings in Intel Inspector...
After discussion, going to do one more test with DCT switched on (which is not among the Report 9 tests), since there is a call to place closure in that part.
Results using p_PC7_CI_HQ_SD_DCT.txt were identical when run with the latest commit on this branch and with commit dfc97f5, which was just before I started the branch...
So I think we've done as much due diligence as possible, and we're good to go.
Intel Inspector reported a data race on the middle two if statements at Update.cpp:1229-1230 on current master, in threaded calls to `DoPlaceClose()` - data races on which thread reaches those particular lines first; no other lines are involved. The cause is a rare case where two schools close, a child from each school belongs to the same household, and each thread wants to update the absentee timings simultaneously. As the two instructions are effectively max/min, the results are not affected by the race, and master is not non-deterministic. Still, it is probably helpful to remove the warning from Intel Inspector and make this thread-safe anyway.
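To illustrate why the race is benign: the two updates behave like a min and a max, which are commutative, so the final timings are the same whichever thread gets there first. A minimal sketch with hypothetical field names (the real fields in Update.cpp differ; this just models the effect):

```cpp
#include <algorithm>

// Hypothetical stand-in for a host's absentee timing fields.
struct Absentee {
    int t_start;
    int t_stop;
};

// Each place closure effectively takes a min on the absence start and a
// max on the absence stop, so applying two closures in either order
// produces identical timings -- the race cannot change the result.
inline void apply_closure(Absentee& a, int start, int stop) {
    a.t_start = std::min(a.t_start, start);
    a.t_stop  = std::max(a.t_stop,  stop);
}
```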
I have buffered these changes in a per-thread array - see `host_closure_queue` and `host_closure_queue_size` in Model.h, which are placed in `StateT[tn]`. I then replace the above lines in `DoPlaceClose` with thread-safe lines, and I call a new function `UpdateHostClosure()` to flush those changes at the end of each surrounding thread-loop in which `DoPlaceClose` might be called. I believe that since t_start and t_stop are a few timesteps in the future compared to the current timestep, the effects of the change do not need to be in place immediately (and I think we would see more warnings from Intel Inspector if that were necessary).