OpenMP with Nested Loops, mat_mul() as Example

uwhpsc-2016 / lectures

Notes, slides, and code from the in-class lectures.


How does OpenMP treat nested loops? For example, how would the nested loops in the famous mat_mul() example be treated?

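void mat_mul(double * out, const double * A, const double * B, int M, int N, int K)
{
   // `parallel for` shares out only the loop directly below it (i) among the
   // threads; the k and j loops run sequentially inside the thread that owns i.
   // Plain static scheduling gives each thread a contiguous block of rows
   // (a chunk size of N would hand every row to one thread whenever M <= N).
   #pragma omp parallel for schedule(static)
   for(int i = 0; i < M; ++i)
      for(int k = 0; k < K; ++k) {
         // no critical section is needed: row i of out is written by one thread only
         double sum = 0.0;  // accumulate locally instead of assuming out starts zeroed
         for(int j = 0; j < N; ++j)
            // B is read with stride K here, so spatial locality is poor
            sum += A[i*N + j] * B[j*K + k];
         out[i*K + k] = sum;
      }
}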
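One way to see how OpenMP treats nested loops is to record which thread runs each (i, k) iteration. The minimal sketch below, assuming a compiler with OpenMP 3.0 or later (where `collapse` was introduced), compares a plain `parallel for`, which divides only the outermost loop, with `collapse(2)`, which merges the i and k loops into a single M*K iteration space. The helpers `thread_id`, `owners_outer_only`, `owners_collapsed`, and `print_owners` are hypothetical names for illustration. The code also compiles without `-fopenmp`, in which case every iteration reports thread 0.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Id of the calling thread; always 0 when compiled without OpenMP.
static int thread_id(void)
{
#ifdef _OPENMP
   return omp_get_thread_num();
#else
   return 0;
#endif
}

// owner[i*K + k] receives the id of the thread that ran iteration (i, k).
// Without collapse, `parallel for` divides only the i loop among threads;
// the complete k loop for a given i runs inside a single thread.
void owners_outer_only(int *owner, int M, int K)
{
   assert(owner != NULL && M >= 0 && K >= 0);
   #pragma omp parallel for schedule(static)
   for (int i = 0; i < M; ++i)
      for (int k = 0; k < K; ++k)
         owner[i*K + k] = thread_id();
}

// collapse(2) merges the perfectly nested i and k loops into one iteration
// space of M*K iterations, so iterations of the same row i may be handed
// to different threads.
void owners_collapsed(int *owner, int M, int K)
{
   assert(owner != NULL && M >= 0 && K >= 0);
   #pragma omp parallel for collapse(2) schedule(static)
   for (int i = 0; i < M; ++i)
      for (int k = 0; k < K; ++k)
         owner[i*K + k] = thread_id();
}

// Prints the M x K table of thread ids, one row of the grid per line.
void print_owners(const char *label, const int *owner, int M, int K)
{
   printf("%s\n", label);
   for (int i = 0; i < M; ++i) {
      for (int k = 0; k < K; ++k)
         printf("%d ", owner[i*K + k]);
      printf("\n");
   }
}
```

Filling a small grid with each function and passing it to `print_owners` (for example with `OMP_NUM_THREADS` set to a few threads) should show a single thread id per row for `owners_outer_only`. With `owners_collapsed` the ids are, in general, no longer constant along a row; the exact assignment depends on the runtime, the thread count, and the grid shape.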

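The remark about spatial locality refers to B being read down a column (stride K) in the innermost j loop. A common remedy, sketched here under the same row-major layout assumptions, is to reorder the loops to i-j-k so that the innermost loop walks both B and out with stride 1. `mat_mul_ijk` is a hypothetical name, and whether it is faster in practice depends on the matrix sizes and the cache hierarchy.

```c
#include <assert.h>

// out = A * B with A (M x N), B (N x K), out (M x K), all stored row-major.
// Loop order i-j-k: the innermost k loop reads B[j*K + k] and updates
// out[i*K + k] contiguously, unlike the i-k-j order where B has stride K.
void mat_mul_ijk(double *out, const double *A, const double *B,
                 int M, int N, int K)
{
   assert(M >= 0 && N >= 0 && K >= 0);
   // Only the i loop is shared among threads; each thread owns whole rows
   // of out, so no two threads write the same element.
   #pragma omp parallel for schedule(static)
   for (int i = 0; i < M; ++i) {
      for (int k = 0; k < K; ++k)
         out[i*K + k] = 0.0;            // start each row from zero
      for (int j = 0; j < N; ++j) {
         const double a = A[i*N + j];   // reused across the whole k loop
         for (int k = 0; k < K; ++k)
            out[i*K + k] += a * B[j*K + k];
      }
   }
}
```

Because each row of out is zeroed and accumulated by the same thread, the function needs neither a pre-zeroed output buffer nor any synchronization.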

OpenMP with Nested Loops, mat_mul() as Example #26