uwhpsc-2016 / lectures

Notes, slides, and code from the in-class lectures.

OpenMP with Nested Loops, mat_mul() as Example #26

Open alyfarahat opened 8 years ago

alyfarahat commented 8 years ago

How does OpenMP treat nested loops? For example, how would the familiar mat_mul() example below be treated?

#include <omp.h>

void mat_mul(double * out, double * A, double * B, int M, int N, int K)
{
   omp_set_num_threads(M * K); // one thread per output element
   #pragma omp parallel for  \
     schedule(static, N)
   for(int i = 0; i < M; ++i)
      for(int k = 0; k < K; ++k) 
         for(int j = 0; j < N; ++j)   {
            // no need for critical section since we have one thread per output element
            // what is bad here is that we do not make use of space locality   
            out[i*K + k] += A[i*N + j] * B[j*K + k];
         }
}
cswiercz commented 8 years ago

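Work-sharing applies only to the top-level loop, that is, the for loop immediately following the #pragma omp for directive (here, #pragma omp parallel for). The inner loops are not divided among threads: each thread runs them sequentially for the outer iterations it was assigned.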

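So in the mat_mul code above, each thread is responsible for whole rows of the output matrix out (with schedule(static, N), rows are handed out in chunks of N consecutive rows). This is good from a contiguity perspective, since each row of out is stored contiguously. Your comment about not needing a critical section is absolutely correct: each element of out is written by exactly one thread. (There is a bug, though: you need to zero out the elements of out before accumulating into them.)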