sandialabs / qthreads

Lightweight locality-aware user-level threading runtime.
https://www.sandia.gov/qthreads/

Race Conditions and Deadlock in `qarray` Test With Thread Sanitizer #240

Closed: insertinterestingnamehere closed this issue 2 weeks ago

insertinterestingnamehere commented 8 months ago

The `qarray` test hangs under thread sanitizer (x86-64, nemesis scheduler, clang 17, no topology detection).

Before hanging, it also emits several thread sanitizer errors:

Atomic write: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/src/ds/qarray.c#L738
Non-atomic read of the same variable: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/src/ds/qarray.c#L1033

In the test itself:
Non-atomic read: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L110
Atomic write: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L28

Similar:
https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L124
https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L27

Similar:
https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L90
https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L20

Somewhat similar:
Non-atomic write: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L104
Atomic write: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/test/features/qarray.c#L21

Non-atomic write: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/src/ds/qarray.c#L545
Non-atomic read: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/src/ds/qarray.c#L706
Note that the read occurs inside the called function `qarray_elem_nomigrate` at: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/include/qthread/qarray.h#L114

Non-atomic write: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/src/ds/qarray.c#L982
Atomic write to the same address: https://github.com/sandialabs/qthreads/blob/619afe98edce0098f656c32acf09a12789becad0/src/ds/qarray.c#L738
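
For readers unfamiliar with this class of report: most of the pairs above mix an atomic access with a plain access to the same variable, and thread sanitizer flags that mix as a data race. The sketch below illustrates the general pattern and one possible fix using C11 atomics and pthreads. It is not the qthreads code. The names `count`, `worker`, `read_count`, and `run_workers` are hypothetical, and C11 atomics are swapped in here in place of whatever atomic primitives the flagged lines actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared counter standing in for a variable flagged above. */
static _Atomic size_t count = 0;

enum { NUM_WORKERS = 4, INCREMENTS_PER_WORKER = 1000 };

/* Writer side: every update goes through an atomic read-modify-write. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_WORKER; i++) {
        atomic_fetch_add_explicit(&count, 1, memory_order_relaxed);
    }
    return NULL;
}

/*
 * Reader side. The racy version would read the same memory through a plain
 * (non-atomic) lvalue while writers are still running. Using an atomic load
 * makes both sides of the access pair atomic, which is what thread
 * sanitizer expects.
 */
size_t read_count(void) {
    return atomic_load_explicit(&count, memory_order_acquire);
}

/* Spawn the writers, wait for them, and return the final value. */
size_t run_workers(void) {
    pthread_t threads[NUM_WORKERS];
    atomic_store(&count, 0);
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
    }
    return read_count();
}
```

Building something like this with `-fsanitize=thread -pthread` and calling `run_workers()` should produce no race report. Whether the right fix in qthreads is an atomic load, a lock, or a restructuring of the test is a separate question that this sketch does not settle.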

insertinterestingnamehere commented 8 months ago

Debugged some more. It's not actually a deadlock. It's just that that particular test hits the thread sanitizer performance penalty really hard. Adjusting the problem size gets it down to a reasonable runtime.
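
One way to adjust the problem size only for sanitizer builds is to detect thread sanitizer at compile time. The following is a minimal sketch, not what the test actually does: `UNDER_TSAN`, `test_element_count`, and both element counts are hypothetical. It relies on `__has_feature(thread_sanitizer)` (clang) and the `__SANITIZE_THREAD__` macro (GCC).

```c
#include <stddef.h>

/* Clang (and newer GCC) expose the sanitizer through __has_feature. */
#if defined(__has_feature)
#  if __has_feature(thread_sanitizer)
#    define UNDER_TSAN 1
#  endif
#endif

/* GCC defines __SANITIZE_THREAD__ when built with -fsanitize=thread. */
#if !defined(UNDER_TSAN) && defined(__SANITIZE_THREAD__)
#  define UNDER_TSAN 1
#endif

#ifndef UNDER_TSAN
#  define UNDER_TSAN 0
#endif

/* Hypothetical sizes: the real test's problem size is not given here. */
enum { FULL_ELEMENT_COUNT = 100000, TSAN_ELEMENT_COUNT = 1000 };

/* Pick a smaller problem size when the sanitizer's slowdown applies. */
size_t test_element_count(void) {
    return UNDER_TSAN ? TSAN_ELEMENT_COUNT : FULL_ELEMENT_COUNT;
}
```

The test would then size its array from `test_element_count()` instead of a fixed constant, which keeps full coverage in normal builds and a tolerable runtime under the sanitizer.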

insertinterestingnamehere commented 2 weeks ago

Closing in favor of https://github.com/sandialabs/qthreads/issues/303