the-nightling / cudpp

Automatically exported from code.google.com/p/cudpp

sorting test failed. #30

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
I just built cudpp and ran cudpp_testrig, which failed with the following output

(all previous tests passed):
Running a sort of 1048581 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 2.586515 ms
Running a sort of 2097152 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 0.000000 ms
Running a sort of 4194304 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 0.000000 ms
Running a sort of 8388608 unsigned int key-value pairs
Unordered key[1048576]:4294966923 > key[1048577]:0
Incorrectly sorted value[1048577] (0) 3530798281 != 0
GPU test FAILED
Average execution time: 0.000000 ms

My GPU is a Tesla C1060.

Original issue reported on code.google.com by korgulec@gmail.com on 19 Jul 2009 at 10:07
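
The two messages in the log above point to two separate checks: adjacent keys must be in non-decreasing order, and each value must match a reference result. The following is a minimal host-side sketch of such a check. It is not the cudpp_testrig source; the name `verifySort` and the use of `std::stable_sort` as the CPU reference are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

typedef std::pair<unsigned int, unsigned int> KeyValue;

// Comparator on the key only, so std::stable_sort keeps the input order
// of pairs with equal keys (as a stable radix sort would).
static bool keyLess(const KeyValue& a, const KeyValue& b)
{
    return a.first < b.first;
}

// Hypothetical helper (not part of CUDPP): returns true if (keys, values)
// is a correct stable sort of the input pairs (inKeys, inValues).
bool verifySort(const std::vector<unsigned int>& inKeys,
                const std::vector<unsigned int>& inValues,
                const std::vector<unsigned int>& keys,
                const std::vector<unsigned int>& values)
{
    const std::size_t n = inKeys.size();
    if (inValues.size() != n || keys.size() != n || values.size() != n)
        return false;

    // Check 1: every key must be <= its right-hand neighbour.
    for (std::size_t i = 0; i + 1 < n; ++i) {
        if (keys[i] > keys[i + 1]) {
            std::printf("Unordered key[%zu]:%u > key[%zu]:%u\n",
                        i, keys[i], i + 1, keys[i + 1]);
            return false;
        }
    }

    // Check 2: compare against a CPU reference sort of the same input.
    std::vector<KeyValue> ref(n);
    for (std::size_t i = 0; i < n; ++i)
        ref[i] = KeyValue(inKeys[i], inValues[i]);
    std::stable_sort(ref.begin(), ref.end(), keyLess);

    for (std::size_t i = 0; i < n; ++i) {
        if (keys[i] != ref[i].first || values[i] != ref[i].second) {
            std::printf("Incorrectly sorted pair at index %zu\n", i);
            return false;
        }
    }
    return true;
}
```

A check of this kind reports the first bad index only, which would be consistent with every failing run in the log naming the same position, but that reading is an inference from the output rather than something the thread confirms.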

GoogleCodeExporter commented 9 years ago
I added a manual error check after the radix sort test code, and it reports an 
unspecified launch failure.

Original comment by korgulec@gmail.com on 19 Jul 2009 at 12:06

GoogleCodeExporter commented 9 years ago
korgulec,

What operating system are you running, and is it 32-bit or 64-bit?  If Linux, 
please mention the specific distro.  Also, what version of the CUDA Toolkit do 
you have installed?

Mark

Original comment by harr...@gmail.com on 21 Jul 2009 at 10:01

GoogleCodeExporter commented 9 years ago
Mark,
openSUSE 11.1 64-bit.
The CUDA Toolkit version is 2.2.

By the way, the error is not deterministic.

Michal.

Original comment by korgulec@gmail.com on 22 Jul 2009 at 12:39

GoogleCodeExporter commented 9 years ago
I compiled cudpp with debug info on and found two places where 
CUT_CHECK_ERRORS prints an error:
Cuda error: scanArray before kernels in file 'src/app/scan_app.cu' in line 104 : unspecified launch failure.
Cuda error: radixSortStepKeysOnly in file 'src/app/radixsort_app.cu' in line 653 : unspecified launch failure.

Original comment by korgulec@gmail.com on 22 Jul 2009 at 8:04

GoogleCodeExporter commented 9 years ago
If I set numElements to a power of two (checked from 2^16 to 2^20), the 
keys-only sort always fails.

Original comment by korgulec@gmail.com on 22 Jul 2009 at 8:36
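
For reference, the element counts covered by that range can be enumerated as in the sketch below. The helper name `powerOfTwoSizes` is hypothetical and does not appear in CUDPP or its test rig; it only makes the tested sizes explicit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the element counts 2^minExp .. 2^maxExp
// (inclusive), e.g. the 2^16 .. 2^20 range mentioned in the comment above.
std::vector<std::size_t> powerOfTwoSizes(unsigned minExp, unsigned maxExp)
{
    std::vector<std::size_t> sizes;
    for (unsigned e = minExp; e <= maxExp; ++e)
        sizes.push_back(static_cast<std::size_t>(1) << e);
    return sizes;
}
```

Calling `powerOfTwoSizes(16, 20)` yields five sizes, from 65536 up to 1048576, each of which could be passed as numElements to a keys-only sort test.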

GoogleCodeExporter commented 9 years ago
I am unable to reproduce this on a GeForce GTX 280 on Ubuntu 8.04 64-bit.  I 
don't have a Tesla, but I will ask at work to see if I can ssh to a machine 
with one.

Original comment by harr...@gmail.com on 24 Jul 2009 at 6:52

GoogleCodeExporter commented 9 years ago
Korgulec,

I am unable to reproduce this problem on a 64-bit Ubuntu 8.10 box with 3 Tesla 
C1060 GPUs.  The sorts all pass on every one of the C1060 GPUs.  I am going to 
ask around to see if anyone has an openSUSE 11.1 box I can test on.

But first, can you please tell me which NVIDIA driver version you have 
installed?  The box I tested on has 185.18.08 and CUDA 2.2.

Thanks,
Mark

Original comment by harr...@gmail.com on 24 Jul 2009 at 8:55

GoogleCodeExporter commented 9 years ago
Mark,
these are the files I've installed:

cudadriver_2.2_linux_64_185.18.08-beta.run
cudasdk_2.2_linux.run
cudatoolkit_2.2_linux_64_suse11.1.run

I've just found that there are CUDA 2.3 and newer drivers. I guess I'll try 
updating.

Michal.

Original comment by korgulec@gmail.com on 24 Jul 2009 at 4:27

GoogleCodeExporter commented 9 years ago
I've tested many examples from the SDK, and running them multiple times causes 
errors. I guess something is wrong with the card or its fan. I should be able 
to check that tomorrow or on Monday.

Michal.

Original comment by korgulec@gmail.com on 24 Jul 2009 at 5:42

GoogleCodeExporter commented 9 years ago
Michal,

The only sure way to determine if it's a hardware problem is to try another 
card.  If you are getting a lot of non-deterministic failures on SDK samples 
as well as CUDPP tests, then it is possible you do have something wrong with 
your GPU.

Mark

Original comment by harr...@gmail.com on 25 Jul 2009 at 6:52
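
One way to make "non-deterministic" concrete is to run the same test repeatedly on identical input and count the failures. The sketch below assumes the test can be wrapped as a callable returning true on success; `repeatTest`, `RepeatResult`, and `classify` are hypothetical names, not part of CUDPP or the CUDA SDK. An intermittent result is only a hint that the cause may lie outside the algorithm (driver, card, or other hardware); as noted above, swapping the card is the sure test.

```cpp
#include <functional>
#include <string>

// Outcome of running one test several times on identical input.
struct RepeatResult {
    int runs;
    int failures;
};

// Hypothetical helper: calls `test` `runs` times and counts how often
// it returns false.
RepeatResult repeatTest(const std::function<bool()>& test, int runs)
{
    RepeatResult r = {runs, 0};
    for (int i = 0; i < runs; ++i) {
        if (!test())
            ++r.failures;
    }
    return r;
}

// Summarises the result: a test that fails on some runs but not others
// is labelled "intermittent".
std::string classify(const RepeatResult& r)
{
    if (r.failures == 0)
        return "always passes";
    if (r.failures == r.runs)
        return "always fails";
    return "intermittent";
}
```

The callable would typically wrap one SDK sample or one CUDPP sort test; keeping the input fixed across runs is what separates a flaky environment from an input-dependent bug.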

GoogleCodeExporter commented 9 years ago
Mark,
A little late, but I know now that my motherboard is broken. I am sorry for any 
inconvenience I have caused with this issue.

Michal.

Original comment by korgulec@gmail.com on 30 Jul 2009 at 1:29

GoogleCodeExporter commented 9 years ago
Hi Michal,

That's OK, thanks for getting back to us about it.  Sorry to hear about your 
hardware problems. :(

Closing this issue.

Original comment by harr...@gmail.com on 30 Jul 2009 at 9:53