lion03 / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0

radix_sort() spends a lot of time in cudaGetDeviceProperties #318

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
I can improve the performance of my algorithm by 10% if I change

bool manualCoalesce = radix_sort_use_manual_coalescing();

to 

static bool manualCoalesce = radix_sort_use_manual_coalescing();

in stable_radix_sort.inl:radix_sort()

While this may be a quick-and-dirty fix, getting device properties is expensive,
and it is worth considering whether they really need to be probed on every call
to the sort function.
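
For illustration, here is a minimal sketch of the caching idea in plain C++, so it runs without a GPU. Everything in it is hypothetical: `query_device_info` stands in for the expensive `cudaGetDeviceProperties` call, the "compute capability below 2.0" rule is an assumed policy rather than the check Thrust actually performs, and the per-device table is one possible way to avoid the weakness of a single `static bool`, which would keep returning the first device's answer after the current device changes.

```cpp
#include <cassert>

// Hypothetical stand-in for the expensive runtime query
// (cudaGetDeviceProperties in the issue above).
struct DeviceInfo { int major; int minor; };

// Counts how often the expensive query runs, to make the cost visible.
static int g_query_count = 0;

DeviceInfo query_device_info(int device_id) {
    ++g_query_count;
    // Assumed values for illustration only: device 0 reports 1.3, others 2.0.
    return device_id == 0 ? DeviceInfo{1, 3} : DeviceInfo{2, 0};
}

// Assumed policy, not Thrust's real one: use manual coalescing
// on devices with compute capability below 2.0.
bool use_manual_coalescing(int device_id) {
    return query_device_info(device_id).major < 2;
}

// Cached variant: the expensive query runs at most once per device.
// Not thread-safe as written; a real version would need synchronization.
bool use_manual_coalescing_cached(int device_id) {
    const int kMaxDevices = 16;  // arbitrary bound chosen for this sketch
    static bool known[kMaxDevices] = {};
    static bool value[kMaxDevices] = {};
    assert(device_id >= 0 && device_id < kMaxDevices);
    if (!known[device_id]) {
        value[device_id] = use_manual_coalescing(device_id);
        known[device_id] = true;
    }
    return value[device_id];
}
```

Calling `use_manual_coalescing_cached(0)` repeatedly triggers the query only on the first call, while a later call for a different device ID still gets its own answer.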

Original issue reported on code.google.com by diman.to...@gmail.com on 4 Mar 2011 at 1:28

GoogleCodeExporter commented 8 years ago
The same applies to radix_sort_by_key(), further down in the same file.

Original comment by diman.to...@gmail.com on 4 Mar 2011 at 2:01

GoogleCodeExporter commented 8 years ago
The radix_sort implementation has changed, but the new one still calls 
cudaGetDeviceProperties.

Original comment by wnbell on 21 Aug 2011 at 10:39

GoogleCodeExporter commented 8 years ago

Original comment by wnbell on 21 Aug 2011 at 10:40

GoogleCodeExporter commented 8 years ago
We'll address this when we refresh the b40c code in v1.7.

Original comment by wnbell on 25 Jan 2012 at 5:01

GoogleCodeExporter commented 8 years ago
Forwarded to https://github.com/thrust/thrust/issues/48

Original comment by jaredhoberock on 7 May 2012 at 8:52