JDTruj2018 closed 8 months ago
I am trying to think of a scenario where we would not want this to be the default, but maybe @plavin can also weigh in.
My thought is that we would want this enabled by default unless we were explicitly trying to find issues with how a memory system orders writes at the HW level.
Add atomic operations for scatter kernels for the OpenMP and CUDA backends to prevent race conditions and properly order writes.
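For reference, a minimal sketch of what an atomic scatter could look like on the OpenMP side. This is an illustration only, not the code in this PR: the function name `scatter_atomic` and its signature are hypothetical, and the kernel is reduced to `target[index[i]] = source[i]`. Note that `#pragma omp atomic write` makes each store indivisible, so duplicate indices no longer cause a data race, but it does not by itself decide which of the competing writes lands last.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scatter kernel (names are assumptions, not from this PR):
// writes source[i] into target[index[i]] for every i.
void scatter_atomic(std::vector<double>& target,
                    const std::vector<std::size_t>& index,
                    const std::vector<double>& source) {
    // Each index entry needs a matching source value.
    assert(index.size() == source.size());

    const long n = static_cast<long>(index.size());
    double* out = target.data();

    // If the compiler is run without OpenMP support, the pragmas are
    // ignored and the loop runs serially with the same result.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Atomic store: when several iterations hit the same element,
        // the element ends up holding one of the written values intact
        // rather than being subject to a data race.
        #pragma omp atomic write
        out[index[i]] = source[i];
    }
}
```

On the CUDA side, the analogous building block would presumably be one of the built-in atomic functions such as `atomicExch`; whether that matches the approach taken here is an assumption on my part.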
Should this be a command-line option that can be toggled, or should it be the default (or only) behavior?