lanl / PENNANT

Unstructured mesh hydrodynamics for advanced architectures
Other
20 stars 30 forks source link

improve OpenMP usage #2

Open jeffhammond opened 7 years ago

jeffhammond commented 7 years ago

1) reduced the number of fork-join per iteration

omp parallel for does a fork-join, which can get expensive at large thread-counts. when this construct is used many times in a function, it should be replaced with a single omp parallel around multiple omp for.

The code previously found between parallel regions is assumed to require serialization and uses pragma omp single for protection. single is used instead of master to allow the first encountering thread in the team to do the work, rather than waiting for the master thread.

Technically, but never in practice, single requires MPI_THREAD_SERIALIZED instead of MPI_THREAD_FUNNELED. master only requires MPI_THREAD_FUNNELED.

It is possible that single nowait is sufficient, in which case a few barriers can be eliminated. (aside: master does not imply a barrier).

2) pragma omp simd wherever pragma ivdep is used

The OpenMP standard defines pragma omp simd semantics identical to the convention meaning of the non-standard pragma ivdep.

The Intel compiler treats pragma omp simd as an assertion rather than a hint so if SIMD isn't appropriate, this pragma should be conditionalized using preprocessor (C99/C++11 _Pragma being the O(1) solution here).

jeffhammond commented 7 years ago

I don't know what sort of QA is required to be sure the changes are correct. I'll be happy to run whatever you require.

brobey commented 7 years ago

Jeff, Have you run Intel Thread Inspector on the code to be sure there are no race conditions? Pushing the parallel region up higher makes it more likely that a race condition occurs.

From: Jeff Hammond notifications@github.com Reply-To: lanl/PENNANT reply@reply.github.com Date: Tuesday, February 7, 2017 at 1:01 PM To: lanl/PENNANT PENNANT@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [lanl/PENNANT] improved OpenMP usage (#2)

I don't know what sort of QA is required to be sure the changes are correct. I'll be happy to run whatever you require.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/lanl/PENNANT/pull/2#issuecomment-278122591, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABPgLApGo_en5Moy9qLztiRq9lItmY4tks5raM2PgaJpZM4L59qE.

jeffhammond commented 7 years ago

No, I haven't used that tool before. The side-by-side changes were simple enough that I felt I could reason about the race conditions from the OpenMP semantics.

I'll figure out Intel Thread Checker and try that. I hope it is not making any assumptions about x86 consistency in its analysis...

cferenba commented 7 years ago

Jeff - For QA I typically run the five small test problems (nohsmall, noh, sedovsmall, sedov, leblanc) and verify that the outputs match the gold standard to within roundoff error. If you could do that, that would be great; or I can do it next week (I'm offsite this week and am not set up to run remotely).

brobey commented 7 years ago

You can run a gui version or the command line version of the Intel thread inspector. From our test script:

inspxe-cl -collect=ti2 -result-dir ${TEST_NAME} -- ${executable} ${TEST_NAME}.in >& ${TEST_NAME}_openmp.out || true inspxe-cl -report problems -result-dir ${TEST_NAME}

The gui is launched with inspxe-gui and is a little more intuitive to use.

You need to use the intel compiler and the intel tools (intel-performance-tools/2017.1.024).

It will flag race conditions that might not appear until later.

From: Charles Ferenbaugh notifications@github.com Reply-To: lanl/PENNANT reply@reply.github.com Date: Wednesday, February 8, 2017 at 9:13 PM To: lanl/PENNANT PENNANT@noreply.github.com Cc: Robert Robey brobey@lanl.gov, Comment comment@noreply.github.com Subject: Re: [lanl/PENNANT] improve OpenMP usage (#2)

Jeff - For QA I typically run the five small test problems (nohsmall, noh, sedovsmall, sedov, leblanc) and verify that the outputs match the gold standard to within roundoff error. If you could do that, that would be great; or I can do it next week (I'm offsite this week and am not set up to run remotely).

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/lanl/PENNANT/pull/2#issuecomment-278542565, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABPgLDX091BKHe2QmZdk4aL3pIN7G3m0ks5rapJ1gaJpZM4L59qE.