lanl / PENNANT

Unstructured mesh hydrodynamics for advanced architectures

Potential Data Race #7

Closed BradSwain closed 3 years ago

BradSwain commented 3 years ago

We are developing a static race detection tool, and it has detected a potential data race in PENNANT. The issue does not appear to be severe, but we thought it best to report it anyway, just in case.

The data race occurs on the dtrec variable, between the write at Hydro.cc:601 and the read at Hydro.cc:596.

https://github.com/lanl/PENNANT/blob/d1fd8cae611e37c492b5c3bbd18f2885531060b7/src/Hydro.cc#L601-L596

    if (dtchunk < dtrec) {
        #pragma omp critical
        {
            // redundant test needed to avoid race condition
            if (dtchunk < dtrec) {
                dtrec = dtchunk;
                strncpy(msgdtrec, msgdtchunk, 80);
            }
        }
    }

Although the write is guarded by a critical section, the read at line 596 is not guarded, and so there is a data race between these two accesses.
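For reference, here is a minimal sketch of the fully guarded variant, where every access to the shared dtrec/msgdtrec happens inside the critical section (the helper name updateDtRec is made up for illustration; this is not a proposed patch):

    #include <cstring>

    // Sketch only: all reads and writes of the shared minimum happen under
    // the lock, at the cost of acquiring it on every chunk.
    static void updateDtRec(double& dtrec, char* msgdtrec,
                            double dtchunk, const char* msgdtchunk) {
        #pragma omp critical
        {
            if (dtchunk < dtrec) {
                dtrec = dtchunk;
                std::strncpy(msgdtrec, msgdtchunk, 80);
            }
        }
    }

The double-checked form in the snippet above avoids that per-chunk lock acquisition; the sketch is only meant to show what the lock-every-time alternative looks like.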

I have pasted the full report from our tool below.

==== Found a race between: 
line 601, column 17 in src/Hydro.cc AND line 596, column 19 in src/Hydro.cc
Shared variable: 
 at line 68 of src/Driver.cc
 68|    hydro = new Hydro(inp, mesh);
shared field: ->dtrec
Thread 1: 
 599|            // redundant test needed to avoid race condition
 600|            if (dtchunk < dtrec) {
>601|                dtrec = dtchunk;
 602|                strncpy(msgdtrec, msgdtchunk, 80);
 603|            }
>>>Stack Trace:
>>>.omp_outlined._debug__.20 [src/Hydro.cc:292]
>>>  Hydro::calcDtHydro(double const*, double const*, double const*, double, int, int) [src/Hydro.cc:304]
Thread 2: 
 594|    calcDtVolume(zvol, zvol0, dtlast, dtchunk, msgdtchunk,
 595|            zfirst, zlast);
>596|    if (dtchunk < dtrec) {
 597|        #pragma omp critical
 598|        {
>>>Stack Trace:
>>>.omp_outlined._debug__.20 [src/Hydro.cc:292]
>>>  Hydro::calcDtHydro(double const*, double const*, double const*, double, int, int) [src/Hydro.cc:304]
The OpenMP region this bug occurs:
/home/brad/tmp/PENNANT/src/Hydro.cc
>291|    #pragma omp parallel for schedule(static)
 292|    for (int zch = 0; zch < mesh->numzch; ++zch) {
 293|        int zfirst = mesh->zchzfirst[zch];
 294|        int zlast = mesh->zchzlast[zch];
 295|
 296|        // 7a. compute work rate
Gets called from:
>>>main
>>>  Driver::run() [src/main.cc:45]
>>>    Hydro::doCycle(double) [src/Driver.cc:106]
cferenba commented 3 years ago

Yes, there is technically a data race here, but you can reason that it doesn't affect correctness.

Note that the logic inside the critical region can cause dtrec to decrease, but never to increase. So you could have a case where thread A passes the outer if-test, thread B decreases dtrec, and thread A then fails the inner if-test. That still gives correct behavior, since it's the inner test that matters. But the reverse can never happen: if thread A fails the outer if-test, then its dtchunk >= dtrec, and that remains true even if thread B decreases dtrec further. So thread A could never have passed the inner if-test, regardless of what the other threads did.
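A stripped-down model of the pattern (illustrative only, not the PENNANT code) may make this concrete: the shared minimum can only go down, so a stale value on the unguarded outer test can at most send a thread into the critical section unnecessarily, and the guarded inner test always has the final say.

    #include <cstdio>

    int main() {
        double dtrec = 1.0e99;                // shared "best timestep so far"
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < 1000; ++i) {
            double dtchunk = 1.0 / (i + 1);   // this iteration's candidate
            if (dtchunk < dtrec) {            // unguarded fast-path read; may be stale
                #pragma omp critical
                {
                    if (dtchunk < dtrec) {    // authoritative test, under the lock
                        dtrec = dtchunk;
                    }
                }
            }
        }
        std::printf("min dt = %g\n", dtrec);  // 0.001 regardless of interleaving
        return 0;
    }

No matter how the threads interleave, the final value is the true minimum; a stale outer read can only cost an extra trip through the critical section, never a missed update.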

Does that make sense?

BradSwain commented 3 years ago

Sorry for the late reply, and thank you for the explanation. That does make sense to me. We figured we should report this race to be safe, but it does not appear to affect correctness.