Open qalshidi opened 5 years ago
I agree that this code looks fine and shouldn't have any race conditions (since xn and xn_im1 are different objects). Which backend are you using? Does the problem show up with the conventional CPU backend? I'm not sure whether CUDA and OpenCL are thread-safe within the same context.
ocl::current_device().info() outputs:
Device Info:
Name: GeForce GTX 970
Vendor: NVIDIA Corporation
Type: GPU
Available: 1
Max Compute Units: 13
Max Work Group Size: 1024
Global Mem Size: 4234018816
Local Mem Size: 49152
Local Mem Type: 1
Host Unified Memory: 0
With the CPU backend, the code still works perfectly when the second OpenMP directive is commented out. With the directive enabled, I get this error:
double free or corruption (top)
ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 pid=3154 comm="comfi" exe="/home/qusai/Documents/Code/comfi/build-comfi-Desktop-Debug/comfi/comfi" sig=11 res=1
11:21:44: The program has unexpectedly finished.
I should also mention that I am using column-major dense matrices. The debugger stops here just before the segfault (SIGSEGV):
/** @brief A tag for column-major storage of a dense matrix. */
struct column_major
{
  typedef column_major_tag orientation_category;
  /** @brief Returns the memory offset for entry (i,j) of a dense matrix.
  *
  * @param i row index
  * @param j column index
  * @param num_rows number of entries per row (including alignment)
  */
  static vcl_size_t mem_index(vcl_size_t i, vcl_size_t j, vcl_size_t num_rows, vcl_size_t /* num_cols */)
  {
    return i + j * num_rows;
  }
};
It stops at the return statement. Changing to row-major does not help. I have not tried the CUDA backend because I have custom OpenCL kernels elsewhere that I need for things like element_max().
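The offset calculation shown above is plain integer arithmetic, so one way to narrow this down might be to check whether the computed offset still falls inside the buffer. The following is only a minimal sketch in standard C++: the helper names column_major_index, row_major_index and checked_read are hypothetical and are not part of ViennaCL, and the buffer is an ordinary std::vector rather than a ViennaCL matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not ViennaCL API): same arithmetic as
// column_major::mem_index, i.e. consecutive rows of one column are adjacent.
inline std::size_t column_major_index(std::size_t i, std::size_t j,
                                      std::size_t num_rows) {
    return i + j * num_rows;
}

// Hypothetical helper (not ViennaCL API): the row-major counterpart,
// i.e. consecutive columns of one row are adjacent.
inline std::size_t row_major_index(std::size_t i, std::size_t j,
                                   std::size_t num_cols) {
    return i * num_cols + j;
}

// Hypothetical bounds-checked read from a column-major buffer.
// The assert fires (in debug builds) if the offset is outside the buffer.
inline double checked_read(const std::vector<double>& buf, std::size_t i,
                           std::size_t j, std::size_t num_rows) {
    const std::size_t offset = column_major_index(i, j, num_rows);
    assert(offset < buf.size());
    return buf[offset];
}
```

For a 4 x 3 matrix, entry (2,1) lands at offset 6 in column-major storage and at offset 7 in row-major storage.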
I'm trying to write an operator that copies the values from cell i-1 to cell i in parallel. I don't think there should be any race conditions, unless I'm missing something. This is basically what the code looks like.
If I remove the second omp directive, it works and passes the tests, but with it I get NaNs in my matrix. Is there no way to do something like this quickly? This is indirectly related to #228.
Actual code can be found here: https://github.com/qalshidi/comfi/blob/master/operators.cpp
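
For readers without access to the linked file, the pattern described above (each cell i receives the value of cell i-1, with source and destination kept as separate objects) can be sketched in plain C++ with OpenMP. This is not the code from operators.cpp and it does not use ViennaCL: the function name shift_from_previous and the boundary parameter are hypothetical, and the sketch only illustrates why separate source and destination buffers avoid a read/write race in the loop itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the operator described in the post:
// dst[i] = src[i - 1] for i >= 1, and dst[0] = boundary.
// src and dst are different buffers, so no iteration reads a
// location that another iteration writes.
std::vector<double> shift_from_previous(const std::vector<double>& src,
                                        double boundary) {
    std::vector<double> dst(src.size());
    if (src.empty()) {
        return dst;
    }
    dst[0] = boundary;

    // Signed loop index for compatibility with older OpenMP versions.
    const long n = static_cast<long>(src.size());

    // Without OpenMP enabled (e.g. no -fopenmp), the pragma is ignored
    // and the loop simply runs serially.
    #pragma omp parallel for
    for (long i = 1; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        dst[k] = src[k - 1];
    }
    return dst;
}
```

Calling shift_from_previous({1, 2, 3, 4}, 0.0) yields {0, 1, 2, 3}. Whether the same reasoning carries over to ViennaCL objects used from several OpenMP threads depends on the backend's thread safety, which is the open question in this thread.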