I believe the OpenCL version of the top_scan kernel in the sort benchmark has a data race. On line 127 of src/opencl/level1/sort/sort.cl, the __local variable s_seed is read by all threads with get_local_id(0) < n, and on line 132 it is written by the thread with get_local_id(0) == n - 1. There is no barrier between the two statements, so that thread can update s_seed while other threads in the work-group have not yet read it.
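To illustrate the pattern, here is a minimal sketch in OpenCL C. It is not a copy of sort.cl: the kernel name scan_with_seed, the sums buffer, the loop count of 16, and the arithmetic are hypothetical, and it assumes n is no larger than the work-group size. It only shows where a barrier could go so that every read of s_seed completes before the single writer updates it.

```c
// Hypothetical, simplified kernel (NOT the code from sort.cl).
// Assumes one work-group and n <= get_local_size(0).
__kernel void scan_with_seed(__global uint *sums, const int n)
{
    __local uint s_seed;
    const int lid = get_local_id(0);

    // One thread initializes the shared seed; the barrier publishes it.
    if (lid == 0) {
        s_seed = 0;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    for (int d = 0; d < 16; d++) {
        uint val = 0;
        if (lid < n) {
            val = sums[n * d + lid];
            sums[n * d + lid] = val + s_seed;  // READ of s_seed by threads < n
        }

        // Without this barrier, thread n - 1 could run ahead and update
        // s_seed while other threads are still reading the old value.
        barrier(CLK_LOCAL_MEM_FENCE);

        if (lid == n - 1) {
            s_seed += val;                     // WRITE of s_seed by one thread
        }

        // Make the update visible before the next iteration's reads.
        barrier(CLK_LOCAL_MEM_FENCE);
    }
}
```

Both barriers sit outside the if statements, so every work-item in the work-group reaches them, which OpenCL requires. Whether the actual kernel needs one extra barrier or a different restructuring depends on the surrounding code in sort.cl.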