lion03 / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0

investigate radix sort failures on Ocelot backend #213

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago

I was playing around with the new radix sort implementation and had a lot of
problems getting it to run on Ocelot. It looks like the warp size is assumed
to be 32. When the backend's warp size is not 32, this produces race
conditions, which the Ocelot race detector reports and which cause incorrect
execution on the Ocelot CPU backend. See radixsort_reduction_kernel.h:131.

Another possible error occurs in radixsort_spine_kernel.h:88
(kernel_utils.h:175), although I think this one is an nvcc bug. Here the
index [warpscan_idx - 1] may be -1 if warpscan_idx is 0. NVCC treats the index
as an unsigned 32-bit value, 0xffffffff, then casts it to a 64-bit unsigned
value, which becomes 0x00000000ffffffff due to the lack of sign extension.
Finally it adds it to an index (32). The expected result is 31, but it
actually ends up being something like (0x00000000ffffffff + 0x4). This
probably runs on the device because the result is not used and GPUs probably
mask out-of-bounds shared memory accesses. I am currently using nvcc 3.0, so
this might be fixed in 3.1, but I'm not sure.
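The arithmetic described above can be reproduced on the host with plain C++ integer types. This is a minimal sketch, assuming the index math behaves as described; the function names are hypothetical and the code is not taken from kernel_utils.h or from nvcc's generated output. It contrasts zero extension of a wrapped 32-bit index with the sign extension the source code presumably intended.

```cpp
#include <cassert>
#include <cstdint>

// Index math as described: 32-bit unsigned subtract, then zero-extend to
// 64 bits before adding the base index.
std::uint64_t zero_extended_offset(std::uint32_t warpscan_idx,
                                   std::uint64_t base) {
    std::uint32_t idx32 = warpscan_idx - 1u;  // wraps to 0xffffffff when 0
    std::uint64_t idx64 = idx32;              // zero extension, no sign bits
    return base + idx64;
}

// Presumably intended behaviour: signed subtract, then sign-extend.
std::int64_t sign_extended_offset(std::int32_t warpscan_idx,
                                  std::int64_t base) {
    assert(warpscan_idx >= 0);              // indices are non-negative
    std::int32_t idx32 = warpscan_idx - 1;  // -1 when warpscan_idx is 0
    std::int64_t idx64 = idx32;             // sign extension keeps -1
    return base + idx64;
}
```

With `warpscan_idx` equal to 0 and a base of 32, the sign-extended version yields 31, while the zero-extended version yields 0x10000001f, far outside any shared memory array. For any nonzero `warpscan_idx` the two agree, which would explain why the problem only shows up for the first lane.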

Original issue reported on code.google.com by wnbell on 10 Sep 2010 at 12:43

GoogleCodeExporter commented 8 years ago

Original comment by wnbell on 6 Feb 2011 at 6:28

GoogleCodeExporter commented 8 years ago

Original comment by wnbell on 30 Aug 2011 at 3:54

GoogleCodeExporter commented 8 years ago

Original comment by jaredhoberock on 7 May 2012 at 9:54