Open alyfarahat opened 8 years ago
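Work-sharing occurs only at the top-level loop, that is, the `for` loop immediately following the `#pragma omp for` directive. Loops nested inside it are not divided among threads; each thread runs them in full for the outer iterations it was assigned.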
So in the `matmul` code above, each thread will be responsible for a subset of the rows of the output matrix `out`. This is nice from a contiguity perspective. Your comment above about not needing a critical section is absolutely correct, since no two threads ever write to the same element of `out`. (Though there is a bug: you need to zero out the array elements before accumulating into them.)
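To make this concrete, here is a minimal sketch of what such a routine might look like with the zeroing fix applied. It is not the code from this thread: the signature, the row-major `n` x `n` layout, and the parameter names `a`, `b`, and `n` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: out = a * b for row-major n x n matrices. */
void mat_mul(const double *a, const double *b, double *out, int n)
{
    assert(a != NULL && b != NULL && out != NULL && n > 0);

    /* Only this outer loop is work-shared: its iterations (rows of out)
       are divided among the threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        /* The j and k loops run sequentially inside whichever thread
           owns row i. Both are declared in the loop, so they are private. */
        for (int j = 0; j < n; ++j) {
            /* Zero the element before accumulating into it. */
            out[i * n + j] = 0.0;
            for (int k = 0; k < n; ++k) {
                out[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}
```

Each thread writes only to the rows it owns, so no critical section is needed. With GCC or Clang this needs `-fopenmp` to run in parallel; without the flag the pragma is ignored and the loop simply runs serially.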
How does OpenMP treat nested loops? For example, in the famous `mat_mul()` example, how would it be treated?