CUDA 3.1 and higher permit the use of non-POD types in statically allocated
__shared__ memory. As a result, cuda::detail::fast_scan can be dispatched in
more cases than before.
This ought to significantly improve the performance of the reduce_by_key and
*_scan_by_key algorithms.
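
A minimal sketch of the dispatch change described above, written as plain C++11 so that it compiles on the host. The names wrapped_int, is_pod_like and can_use_fast_scan are hypothetical and are not Thrust's internal API. The real condition for choosing cuda::detail::fast_scan may depend on other factors, which are omitted here.

```cpp
#include <type_traits>

// Hypothetical non-POD value type: the user-provided (empty) constructor
// makes it non-trivial, hence not a POD.
struct wrapped_int
{
    int value;
    wrapped_int() {}
};

// POD check spelled out as "trivial and standard-layout".
template <typename T>
struct is_pod_like
    : std::integral_constant<bool,
          std::is_trivial<T>::value && std::is_standard_layout<T>::value>
{};

// Hypothetical predicate: may the shared-memory fast path be used for T?
// RequirePod = true models toolkits older than CUDA 3.1 (assumption),
// RequirePod = false models CUDA 3.1 and higher.
template <typename T, bool RequirePod>
struct can_use_fast_scan
    : std::integral_constant<bool, !RequirePod || is_pod_like<T>::value>
{};
```

With these definitions, can_use_fast_scan<wrapped_int, true>::value is false and can_use_fast_scan<wrapped_int, false>::value is true, while int qualifies in both cases. That is the sense in which the fast path can be selected in more cases than before.
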
Original issue reported on code.google.com by wnbell on 3 Sep 2010 at 1:36