web3hulei closed 11 months ago
Thanks! I have to ask: did you spot it by eyeballing, or is there some tool that spots potential race conditions? If the former, I have to praise the keenness, and if the latter, I'd like to know which one it is :-) Thanks again!
I spotted it by eyeballing
In addition, to correct my previous statement: on the Ampere architecture, warps in the same block (1024 threads) are not necessarily scheduled by the same warp scheduler, so the warp scheduler's scheduling strategy does not guarantee that warps (id>0) update the counters only after warp (id=0) has read them.
There is a race condition between lines 267 and 278 in the msm/sort.cuh file. If a warp with a larger warpid executes line 278 before the warp with warpid=0 executes line 267, the calculation result will be wrong. In practice, on the Ampere architecture, the scheduling strategy of the warp scheduler makes it impossible for a warp (id>0) to execute line 278 before warp (id=0) executes line 267, and therefore the test always passes. However, there is indeed a logical error.
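To illustrate the kind of hazard described above, here is a minimal sketch. It is not the code from msm/sort.cuh: the kernel `read_then_update`, the single shared `counter`, and the host helper `barrier_sketch_ok` are all made up for illustration. It only assumes the general pattern from the report, where warp 0 reads a shared counter and the other warps later update it. A `__syncthreads()` between the read and the update is one way to make the ordering explicit instead of relying on the warp scheduler.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, NOT the code from msm/sort.cuh.
// Warp 0 reads a shared counter; warps with id > 0 then update it.
__global__ void read_then_update(unsigned* out)
{
    __shared__ unsigned counter;

    const unsigned warp_id = threadIdx.x / warpSize;
    const unsigned lane_id = threadIdx.x % warpSize;

    if (threadIdx.x == 0)
        counter = 0;
    __syncthreads();

    // Read step (plays the role of "line 267" in the report).
    unsigned snapshot = 0;
    if (warp_id == 0 && lane_id == 0)
        snapshot = counter;

    // Without this barrier, a warp with id > 0 could reach the update
    // below before warp 0 has performed the read above.
    __syncthreads();

    // Update step (plays the role of "line 278" in the report).
    if (warp_id > 0 && lane_id == 0)
        atomicAdd(&counter, 1u);

    __syncthreads();
    if (threadIdx.x == 0) {
        out[0] = snapshot;  // value seen by warp 0 before any update
        out[1] = counter;   // value after all other warps have updated
    }
}

// Hypothetical host helper: launches one block of 1024 threads (32 warps)
// and checks that warp 0 saw 0 and that the 31 other warps each added 1.
bool barrier_sketch_ok()
{
    unsigned* d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(unsigned)) != cudaSuccess)
        return false;

    read_then_update<<<1, 1024>>>(d_out);

    unsigned h_out[2] = {~0u, ~0u};
    cudaError_t err =
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    return err == cudaSuccess && h_out[0] == 0u && h_out[1] == 31u;
}
```

With the barrier in place, the read by warp 0 is ordered before every update no matter which warp scheduler each warp lands on. If the barrier is removed, the kernel may still appear to work on a given GPU, which matches the observation that the test always passes.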