mmizzle9 / cudpp

Automatically exported from code.google.com/p/cudpp

cudppSort error in cudpp 1.1.1 for a large array #51

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. tar -xzvf sort_test.tar.gz
2. cd sort_test
3. make
4. ./testsort 1000000

What is the expected output? What do you see instead?
Expected:
before sort
radix sort : 0.00834246 s 1000000 elements

What I see:
before sort
radix sort : 0.00834246 s 1000000 elements
sort error 1 539193 259683

$./cudpp_testrig -sort -n=1000000
Using device 0: Quadroplex 2200 S4
Quadroplex 2200 S4; global mem: 4294705152B; compute v1.3; clock: 1296000 kHz
Running a sort of 1000000 unsigned int key-value pairs
Unordered key[1]:35632 > key[2]:17645
Incorrectly sorted value[0] (382903) 1001492540 != 2704
GPU test FAILED
Average execution time: 8.087020 ms

1 tests failed
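
Both failure messages above (`sort error 1 539193 259683` and `Unordered key[1]:35632 > key[2]:17645`) report an adjacent pair of keys that is out of order. A minimal host-side check of that kind could be sketched in C++ as follows. The function name `firstUnordered` is hypothetical; it is not part of CUDPP or of the attached test program, and the actual checks in `testsort` and `cudpp_testrig` may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CUDPP): returns the index i of the first
// adjacent pair with keys[i] > keys[i + 1], or -1 if the keys are already in
// non-decreasing order.
long firstUnordered(const std::vector<unsigned int>& keys) {
    for (std::size_t i = 0; i + 1 < keys.size(); ++i) {
        if (keys[i] > keys[i + 1]) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Run on the keys copied back from the device after the sort, a check like this returns -1 for a correct result and the offending index otherwise.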

What version of the product are you using? On what operating system?
device is shown above.
cudpp 1.1.1
cuda sdk 2.3
$uname -a
Linux tesla 2.6.18-128.1.1.el5 #1 SMP Tue Feb 10 11:36:29 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
$cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  190.53  Wed Dec  9 15:29:46 PST 2009
GCC version:  gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)

Please provide any additional information below.

I have included the compiled CUDPP library in the attachment.
If this is a driver mismatch, please specify an appropriate version number.
Thank you.

Original issue reported on code.google.com by Eunjin...@gmail.com on 30 Mar 2010 at 12:31

GoogleCodeExporter commented 8 years ago
Issue 50 has been merged into this issue.

Original comment by harr...@gmail.com on 31 Mar 2010 at 6:58

GoogleCodeExporter commented 8 years ago
Eunjin.Im, can you try the latest CUDPP 1.1.1 and see if it fixes your issue?
It works for me on a Tesla C1060 on a similar Linux system. I don't have a
Quadro S4 available.

Thanks,
Mark

Original comment by harr...@gmail.com on 27 Apr 2010 at 7:21

GoogleCodeExporter commented 8 years ago
I believe this is fixed by r110, which adds __launch_bounds__ (available only
in CUDA 3.0) to the radix sort kernels. I tested on x86_64 RHEL 5 with CUDA 3.0:
the sort was not working before this change and works after it, so I assume
it is fixed.

I will mark it fixed, and if Eunjin.Im reports that it is not fixed on his
system, we can reopen.

Original comment by harr...@gmail.com on 29 Apr 2010 at 6:05