tozhovez / cudpp

Automatically exported from code.google.com/p/cudpp

cudppSegScan assumes all datatypes are 4 bytes #79

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The cudppSegScan kernels use a single shared memory buffer divided up among 
its different uses (data, indices, and flags). This breaks when the data 
values are 8 bytes and the index / flag values are 4 bytes.

To fix, use two separate shared memory buffers.

Original issue reported on code.google.com by harr...@gmail.com on 5 Jul 2011 at 3:16

GoogleCodeExporter commented 8 years ago
Hmm, I think I was mistaken.  I have simplified the code a little bit to make 
it clearer though.

Original comment by harr...@gmail.com on 5 Jul 2011 at 3:28

GoogleCodeExporter commented 8 years ago
Actually, I was right.  Reopening. 

The code was this:

    // Last index in shared memory which contains data
    unsigned int lastIdx = ((blockDim.x << 1)-1);

    // Chop up the shared memory into 4 contiguous spaces - the first 
    // for the data, the second for the indices, the third for the 
    // read-only version of the flags and the last for the read-write
    // version of the flags
    unsigned int* indices = (unsigned int *)(&(temp[lastIdx + 1]));
    unsigned int* flags = (unsigned int *)(&(temp[2*(lastIdx + 1)]));

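As you can see, if the elements of temp are 8 bytes each, then the flags 
pointer is offset too far: the index 2*(lastIdx + 1) is scaled by the size of 
a temp element, but the indices region it skips over holds 4-byte unsigned 
ints. My next checkin will have this simpler code, which is correct: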

    // Chop up the shared memory into 4 contiguous spaces - the first 
    // for the data, the second for the indices, the third for the 
    // read-only version of the flags and the last for the read-write
    // version of the flags
    unsigned int* indices = (unsigned int *)(temp + 2*blockDim.x);
    unsigned int* flags   = (unsigned int *)(indices + 2*blockDim.x);
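
To make the arithmetic concrete, here is a small host-side C++ sketch of the two offset computations. It is not CUDPP code: the helper names oldFlagsOffset and newFlagsOffset are made up for illustration, and it assumes the usual sizes (4-byte unsigned int and float, 8-byte double). It only computes where the flags pointer would land, in bytes from the start of temp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte offset of `flags` as the original code computes it.
// lastIdx = (blockDim.x << 1) - 1, and &temp[2*(lastIdx + 1)] scales the whole
// element index by sizeof(T), including the part that skips over the indices.
template <typename T>
constexpr std::size_t oldFlagsOffset(std::size_t blockDimX) {
    return 2 * (((blockDimX << 1) - 1) + 1) * sizeof(T);
}

// Hypothetical helper: byte offset of `flags` as the revised code computes it.
// temp + 2*blockDim.x steps over the data in units of T; indices + 2*blockDim.x
// then steps over the indices in units of unsigned int.
template <typename T>
constexpr std::size_t newFlagsOffset(std::size_t blockDimX) {
    return 2 * blockDimX * sizeof(T) + 2 * blockDimX * sizeof(unsigned int);
}

// Assumes sizeof(float) == sizeof(unsigned int): both versions agree.
static_assert(oldFlagsOffset<float>(128) == newFlagsOffset<float>(128),
              "4-byte data: old and new layouts match");

// Assumes sizeof(double) > sizeof(unsigned int): the old version overshoots.
static_assert(oldFlagsOffset<double>(128) > newFlagsOffset<double>(128),
              "8-byte data: old flags offset is too large");
```

Under those size assumptions, with blockDim.x = 128 and double data the old expression puts flags at byte 4096, while the data and indices regions together end at byte 3072. So flags starts 1024 bytes (8*blockDim.x) too far along. With float data both versions give 2048, which is why the problem only shows up for 8-byte types.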

Original comment by harr...@gmail.com on 6 Jul 2011 at 2:07

GoogleCodeExporter commented 8 years ago
This is now fixed.

Original comment by harr...@gmail.com on 7 Jul 2011 at 3:41