Open GoogleCodeExporter opened 8 years ago
For details of the algorithms behind these two implementations, see the
references cited in each of the two .cl files.
"GPU scan" is much simpler and performs much better (at least on my GPU
hardware platform). It uses a single kernel invocation instead of the multiple
GPU kernels that "default scan" launches, and it is tuned to the "warp size"
(SIMT width) of the underlying GPU hardware, which lets it avoid explicit
thread synchronization (barriers) among threads within the same warp.
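To illustrate the warp-size idea, here is a minimal sketch in plain C (not OpenCL, and not taken from either .cl file) that simulates a warp-synchronous inclusive scan. It assumes a 32-lane warp; `WARP_SIZE`, `MAX_N`, `warp_scan_inclusive` and `scan_inclusive` are hypothetical names. Each warp-sized tile is scanned with a Hillis-Steele step loop. Every step reads a snapshot before any lane writes, which models lanes executing the same instruction in lockstep, so no barrier is needed between steps. Tile totals are then scanned and added back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical SIMT width: NVIDIA warps have 32 lanes, AMD wavefronts often 64. */
#define WARP_SIZE 32
/* Hypothetical upper bound on input length for this sketch. */
#define MAX_N 4096

/* Simulates one warp doing a Hillis-Steele inclusive scan on len <= WARP_SIZE
 * elements. Copying to a snapshot before each step models all lanes reading
 * before any lane writes (lockstep execution), so no barrier is required. */
static void warp_scan_inclusive(int *data, size_t len)
{
    int snapshot[WARP_SIZE];
    for (size_t offset = 1; offset < len; offset *= 2) {
        memcpy(snapshot, data, len * sizeof(int));
        for (size_t lane = offset; lane < len; lane++)
            data[lane] = snapshot[lane] + snapshot[lane - offset];
    }
}

/* In-place inclusive scan: scan each warp-sized tile, scan the tile totals,
 * then add the running total of all preceding tiles to each tile. */
void scan_inclusive(int *data, size_t n)
{
    int totals[MAX_N / WARP_SIZE + 1];
    size_t tiles = (n + WARP_SIZE - 1) / WARP_SIZE;
    assert(n <= MAX_N);

    for (size_t t = 0; t < tiles; t++) {
        size_t start = t * WARP_SIZE;
        size_t len = (n - start < WARP_SIZE) ? n - start : WARP_SIZE;
        warp_scan_inclusive(data + start, len);
        totals[t] = data[start + len - 1]; /* last element = tile sum */
    }

    /* Scan the tile totals (done sequentially here for simplicity). */
    for (size_t t = 1; t < tiles; t++)
        totals[t] += totals[t - 1];

    /* Add the sum of all earlier tiles to every element of tile t. */
    for (size_t t = 1; t < tiles; t++)
        for (size_t i = t * WARP_SIZE; i < n && i < (t + 1) * WARP_SIZE; i++)
            data[i] += totals[t - 1];
}
```

For example, scanning `{3, 1, 4, 1, 5}` yields `{3, 4, 8, 9, 14}`. Note that relying on implicit lockstep execution depends on hardware behavior rather than on a guarantee from the OpenCL specification. OpenCL 2.0 and later also provide built-ins such as `work_group_scan_inclusive_add` for this pattern.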
Original comment by Edward.K...@gmail.com
on 1 May 2013 at 10:27
Original issue reported on code.google.com by
rongguod...@gmail.com
on 21 Jun 2012 at 9:48