Copying values in parallel

qalshidi commented 5 years ago

I'm trying to write an operator that copies all the values from the i-1 cells to the i cells in parallel. I don't think this should cause any race conditions, unless I'm missing something. This is basically what the code looks like:

#pragma omp parallel for schedule(dynamic)
for(uint j = 0; j < ctx.nz(); j++) {
  im1[j] = viennacl::range(inds(0, j, ctx), inds(ctx.nx()-2, j, ctx)+1);
  i[j] = viennacl::range(inds(1, j, ctx), inds(ctx.nx()-1, j, ctx)+1);
}
#pragma omp parallel for
for(uint j = 0; j < ctx.nz(); j++) {
  viennacl::project(xn_im1, i[j], eq) = viennacl::project(xn, im1[j], eq);
}

If I remove the second omp directive it works fine and passes the tests, but with it I get NaNs in my matrix. Is there no way to do something like this quickly? This is indirectly related to #228.

Actual code can be found here: https://github.com/qalshidi/comfi/blob/master/operators.cpp
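For reference, the access pattern described above can be reproduced without ViennaCL. The following is only a minimal sketch on plain `std::vector` buffers, not the code from the repository: `flat_index` and `shift_copy` are hypothetical names, and the layout (i varying fastest within a column j) is an assumption about what `inds` does. It shows why the loop over j is free of data races as long as source and destination are different buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for inds(i, j, ctx): maps cell (i, j) of an
// nx-by-nz grid to a flat index, assuming i varies fastest.
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t nx) {
    return i + j * nx;
}

// Copies the value of cell (i-1, j) in src to cell (i, j) in dst for all i >= 1.
// Every j writes only to its own slice of dst and only reads from src, so the
// iterations over j are independent when src and dst are distinct buffers.
inline void shift_copy(const std::vector<double>& src, std::vector<double>& dst,
                       std::size_t nx, std::size_t nz) {
    assert(&src != &dst);
    assert(src.size() == nx * nz && dst.size() == nx * nz);
    // A signed loop variable keeps the loop valid for older OpenMP versions.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long j = 0; j < static_cast<long>(nz); ++j) {
        const std::size_t col = static_cast<std::size_t>(j);
        for (std::size_t i = 1; i < nx; ++i) {
            dst[flat_index(i, col, nx)] = src[flat_index(i - 1, col, nx)];
        }
    }
}
```

If a plain-buffer version like this runs cleanly in parallel while the ViennaCL version does not, that would suggest the problem is in shared state inside the library calls rather than in the indexing.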

karlrupp commented 5 years ago

I agree that this code looks fine and shouldn't have any race conditions (since xn and xn_im1 are different objects). Which backend are you using? Does the problem show up with the conventional CPU backend? I'm not sure whether CUDA and OpenCL are thread-safe within the same context.
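One way to test the thread-safety hypothesis is to keep the loop parallel but let only one thread at a time enter the backend call. The sketch below illustrates that pattern with a hypothetical `FakeBackend` whose `enqueue_copy` touches unsynchronized shared state; it is a stand-in for illustration and does not model ViennaCL's actual internals.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library call that modifies shared,
// unsynchronized state (for example a shared queue or handle bookkeeping).
struct FakeBackend {
    std::vector<std::size_t> log;  // shared state, not thread-safe on its own
    void enqueue_copy(std::size_t j) { log.push_back(j); }
};

// Runs the per-column loop in parallel, but serializes the backend call with
// a named critical section so that only one thread executes it at a time.
inline void copy_all_columns(FakeBackend& backend, std::size_t nz) {
    #pragma omp parallel for
    for (long j = 0; j < static_cast<long>(nz); ++j) {
        // Thread-local preparation (such as building ranges) could go here.
        #pragma omp critical(backend_call)
        {
            backend.enqueue_copy(static_cast<std::size_t>(j));
        }
    }
}
```

If the crash or the NaNs disappear once the call is serialized like this, that would point to shared state in the backend rather than to the copy itself. The critical section removes the parallel speedup for that call, so it is meant as a diagnostic rather than a fix.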

qalshidi commented 5 years ago

ocl::current_device().info() outputs:

Device Info:
Name: GeForce GTX 970
Vendor: NVIDIA Corporation
Type: GPU
Available: 1
Max Compute Units: 13
Max Work Group Size: 1024
Global Mem Size: 4234018816
Local Mem Size: 49152
Local Mem Type: 1
Host Unified Memory: 0

qalshidi commented 5 years ago

With the CPU backend, it still works perfectly when I comment out the second OpenMP directive. With the directive enabled, I get this error:

double free or corruption (top)
ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 pid=3154 comm="comfi" exe="/home/qusai/Documents/Code/comfi/build-comfi-Desktop-Debug/comfi/comfi" sig=11 res=1
11:21:44: The program has unexpectedly finished.

I should probably mention that I am using column-major dense matrices. The debugger stops here just before the segfault (SIGSEGV):

/** @brief A tag for column-major storage of a dense matrix. */
struct column_major
{
  typedef column_major_tag         orientation_category;

  /** @brief Returns the memory offset for entry (i,j) of a dense matrix.
  *
  * @param i   row index
  * @param j   column index
  * @param num_rows  number of entries per row (including alignment)
  */
  static vcl_size_t mem_index(vcl_size_t i, vcl_size_t j, vcl_size_t num_rows, vcl_size_t /* num_cols */)
  {
    return i + j * num_rows;
  }
};

It stops at the return statement. Changing to row major does not help. I have not tried the CUDA backend because I have custom OpenCL kernels elsewhere that I need for things like element_max().
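To make the offset formula in the quoted `mem_index` concrete, here is a small standalone sketch. The free functions `column_major_index` and `row_major_index` are hypothetical names used only for illustration. The first uses the same formula as the quoted code; the second is the usual row-major counterpart and is not copied from the library.

```cpp
#include <cstddef>

// Column-major offset, same formula as the quoted mem_index():
// consecutive rows of one column are adjacent in memory.
constexpr std::size_t column_major_index(std::size_t i, std::size_t j,
                                         std::size_t num_rows) {
    return i + j * num_rows;
}

// Usual row-major counterpart: consecutive columns of one row are adjacent.
constexpr std::size_t row_major_index(std::size_t i, std::size_t j,
                                      std::size_t num_cols) {
    return i * num_cols + j;
}

// Entry (2, 1) of a 4-by-3 matrix lands at different offsets in each layout.
static_assert(column_major_index(2, 1, 4) == 6, "2 + 1 * 4");
static_assert(row_major_index(2, 1, 3) == 7, "2 * 3 + 1");
```

Since the function only does integer arithmetic, a fault reported at this line presumably originates with the caller (for example, corrupted sizes or a freed buffer) rather than with the formula, which would fit the observation that switching layouts makes no difference.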

viennacl / viennacl-dev

Copying values in parallel #277