lion03 / thrust

Automatically exported from code.google.com/p/thrust
Apache License 2.0

Incorrect Scan Results #305

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Please post a short, self-contained code sample which reproduces the
problem:
   1. The attached file uses thrust::inclusive_scan() with an 
      associative operator, but the results are not correct.
   2. Also in the file is some scaffolding showing that the 
      operator is associative. 
      (It checks all possible combinations for (x+a)+y = x+(a+y).)
   3. The second file is sample input that seems to trigger
      the problem.
   4. Compiled with: nvcc -o scan -arch=sm_13 scan.cu
   5. Run with: ./scan device_number input_file_name
           e.g.: ./scan 0 f_min.in
                will run on device zero with input from file f_min.in

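The associativity scaffolding in item 2 amounts to an exhaustive check over every triple drawn from a small domain. A minimal host-only C++ sketch of that idea follows. `is_associative`, `project1st_op`, and `project2nd_op` are hypothetical names (the last two are modeled on thrust::project1st and thrust::project2nd), and the reporter's actual operator from the attachment is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Exhaustively checks op(op(x, a), y) == op(x, op(a, y)) for every triple
// drawn from a small finite domain (N^3 combinations).
template <typename T, std::size_t N, typename Op>
bool is_associative(const std::array<T, N>& domain, Op op)
{
    for (const T& x : domain)
        for (const T& a : domain)
            for (const T& y : domain)
                if (op(op(x, a), y) != op(x, op(a, y)))
                    return false;  // counterexample found
    return true;
}

// Host-side stand-ins modeled on thrust::project1st / thrust::project2nd.
struct project1st_op { int operator()(int lhs, int) const { return lhs; } };
struct project2nd_op { int operator()(int, int rhs) const { return rhs; } };

// Usage: both projections are associative; subtraction is not.
void check_examples()
{
    const std::array<int, 4> domain{0, 1, 2, 3};
    assert(is_associative(domain, project1st_op{}));
    assert(is_associative(domain, project2nd_op{}));
    auto minus = [](int a, int b) { return a - b; };
    assert(!is_associative(domain, minus));
}
```

Passing the check only shows associativity over the chosen domain, so the domain should cover every value the real input can produce.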
What is the expected output? What do you see instead?
   1. Expected output is the same answer as a sequential 
      scan on the host.
   2. Instead, the device results differ from the host results in
      some positions.
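The "sequential scan on the host" used as the expected output can be computed with std::partial_sum, which applies the operator strictly left to right. A minimal sketch, assuming the device results have already been copied back into a std::vector; `reference_inclusive_scan` and `mismatched_positions` are hypothetical helpers, not part of Thrust:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential inclusive scan on the host: out[0] = in[0],
// out[i] = op(out[i-1], in[i]). std::partial_sum applies op strictly left to
// right, so it is a safe reference even for non-commutative operators.
template <typename T, typename Op>
std::vector<T> reference_inclusive_scan(const std::vector<T>& in, Op op)
{
    std::vector<T> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin(), op);
    return out;
}

// Indices where a scan result (for example, one copied back from the device)
// disagrees with the host reference.
template <typename T>
std::vector<std::size_t> mismatched_positions(const std::vector<T>& expected,
                                              const std::vector<T>& actual)
{
    assert(expected.size() == actual.size());
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < expected.size(); ++i)
        if (expected[i] != actual[i])
            bad.push_back(i);
    return bad;
}
```

Because partial_sum never regroups operands, any position reported by mismatched_positions points either at a divergence on the device side or at an operator that is not actually associative.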

What version of Thrust are you using? Which version of nvcc?  Which host
compiler?  On what operating system?
   1. Thrust version release 1.3
   2. nvcc 3.2, gcc 4.2.2
   3. Linux x64, Redhat 5.0
   4. Tesla S1070 GPU System

Other comments:
The example is still a bit complex, but with simpler associative
operators (project1st, project2nd) I couldn't replicate the problem.
That is why I included the scaffolding to show that this operator
is associative.
The operator is an attempt to have a scan that passes
a flag from each marked start element until the first instance of a
differently marked end element.
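The attached operator itself isn't reproduced in this thread. As a hypothetical illustration of the kind of operator described (every name below is an assumption, not the reporter's code), a "most recent marker wins" operator is associative but not commutative, and an inclusive scan with it carries a flag from each start marker up to the next end marker:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical marker type; the reporter's actual operator is in the
// attachment and is not reproduced here.
enum class Mark { None, Start, End };

// Right operand wins unless it is unmarked. This is associative but not
// commutative, so an inclusive scan yields the most recent marker so far.
struct latest_mark
{
    Mark operator()(Mark lhs, Mark rhs) const
    {
        return rhs == Mark::None ? lhs : rhs;
    }
};

// Flag is set from each Start element up to (not including) the next End.
std::vector<bool> start_to_end_flags(const std::vector<Mark>& marks)
{
    std::vector<Mark> scanned(marks.size());
    std::partial_sum(marks.begin(), marks.end(), scanned.begin(), latest_mark{});
    std::vector<bool> flags(marks.size());
    for (std::size_t i = 0; i < marks.size(); ++i)
        flags[i] = (scanned[i] == Mark::Start);
    return flags;
}

// Exhaustive associativity check over all 27 triples, in the spirit of the
// reporter's scaffolding.
void check_latest_mark_is_associative()
{
    const Mark all[] = {Mark::None, Mark::Start, Mark::End};
    const latest_mark op{};
    for (Mark x : all)
        for (Mark a : all)
            for (Mark y : all)
                assert(op(op(x, a), y) == op(x, op(a, y)));
}
```

Like project1st and project2nd, this operator is not commutative, so a parallel scan must preserve operand order even though it is free to regroup.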

Original issue reported on code.google.com by scott.ro...@gmail.com on 11 Feb 2011 at 4:42


GoogleCodeExporter commented 8 years ago

Original comment by wnbell on 11 Feb 2011 at 9:32

GoogleCodeExporter commented 8 years ago
I haven't been able to figure this out, but I've found some puzzling further
results that I hope will help you diagnose the problem.

1) First, I checked to make sure that the operator is giving the same results 
on the device as the host.  
2) I started commenting out lines in the scan_intervals kernel in
fast_scan.inl. In particular, I commented out the scan_block function and
instead added
  sdata[K][threadIdx.x] = sum;
  __syncthreads();
then had each thread just store the updated value passed to it. 
i.e. Instead of:
  OutputType tmp = sdata[k][threadIdx.x];
  sdata[k][threadIdx.x] = binary_op(sum, tmp);
Just save the sum:
  sdata[k][threadIdx.x] = sum;
3) I ran that code and looked at the numbers after the first call to
scan_intervals, and they were correct for that point in the code. I then undid
all of the commenting, recompiled, and ran the code again on the same GPU. It
now gives the correct results on that input. But if I run the same binary on
one of the other GPUs (it's an S1070), it still gives the same incorrect
results as before.
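The carry step quoted in item 2 combines a value carried in from earlier elements with each element's local result. Assuming `sum` is that carry, binary_op(sum, tmp) keeps the earlier value on the left, which matters whenever the operator is associative but not commutative. The host-only emulation below is a simplified two-phase blocked scan; `blocked_inclusive_scan` and its `carry_on_left` switch are hypothetical and are not Thrust's fast_scan code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host emulation of a two-phase blocked inclusive scan (an illustration,
// not Thrust's actual fast_scan implementation):
//   phase 1: scan each interval of `interval` elements independently;
//   phase 2: combine the carry (the full prefix of all earlier intervals)
//            into every element of the current interval.
// carry_on_left = true uses op(carry, local), mirroring binary_op(sum, tmp)
// if sum is the carry; false swaps the operands.
template <typename T, typename Op>
std::vector<T> blocked_inclusive_scan(const std::vector<T>& in, std::size_t interval,
                                      Op op, bool carry_on_left = true)
{
    assert(interval > 0);
    std::vector<T> out(in.size());
    for (std::size_t first = 0; first < in.size(); first += interval) {
        const std::size_t last = std::min(first + interval, in.size());
        std::partial_sum(in.begin() + first, in.begin() + last, out.begin() + first, op);
        if (first == 0)
            continue;                    // first interval has no carry
        const T carry = out[first - 1];  // already includes earlier carries
        for (std::size_t i = first; i < last; ++i)
            out[i] = carry_on_left ? op(carry, out[i]) : op(out[i], carry);
    }
    return out;
}
```

For the input {5, 6, 7, 8, 9} with an interval of 2, addition gives {5, 11, 18, 26, 35} in either order, while a project1st-style operator gives all 5s with the carry on the left but {5, 5, 7, 7, 9} with the operands swapped. This only illustrates operand order; it is not a diagnosis of the bug reported here.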

Any ideas on what I should be looking for here?  Have you seen a problem like 
this before?

Original comment by scott.ro...@gmail.com on 14 Feb 2011 at 6:53

GoogleCodeExporter commented 8 years ago
Hi Scott,

Thanks for the detailed report!  Could you try compiling your code with the 
latest development version of Thrust [1] instead of v1.3?  I was able to 
reproduce the error using Thrust v1.3, but the current development version 
(which is slated to become v1.4) checks out OK.

If the development version works for you too, I'd wager this was the fix [2].

[1] http://code.google.com/p/thrust/source/checkout
[2] http://code.google.com/p/thrust/source/detail?spec=svn69b983ffe693ed9c1e808c563e9936cea55be418&r=0835daf0904c211737e20a9731e19a9cf3c1fad2

Original comment by wnbell on 14 Feb 2011 at 7:11

GoogleCodeExporter commented 8 years ago
Good to go!  Guess I'll stick with the development code from now on.
Thanks.

Original comment by scott.ro...@gmail.com on 14 Feb 2011 at 7:39

GoogleCodeExporter commented 8 years ago

Original comment by wnbell on 14 Feb 2011 at 7:45