Open jeffhammond opened 7 years ago
I don't know what sort of QA is required to be sure the changes are correct. I'll be happy to run whatever you require.
Jeff, Have you run Intel Thread Inspector on the code to be sure there are no race conditions? Pushing the parallel region up higher makes it more likely that a race condition occurs.
From: Jeff Hammond notifications@github.com Reply-To: lanl/PENNANT reply@reply.github.com Date: Tuesday, February 7, 2017 at 1:01 PM To: lanl/PENNANT PENNANT@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [lanl/PENNANT] improved OpenMP usage (#2)
I don't know what sort of QA is required to be sure the changes are correct. I'll be happy to run whatever you require.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/lanl/PENNANT/pull/2#issuecomment-278122591, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABPgLApGo_en5Moy9qLztiRq9lItmY4tks5raM2PgaJpZM4L59qE.
No, I haven't used that tool before. The side-by-side changes were simple enough that I felt I could reason about the race conditions from the OpenMP semantics.
I'll figure out Intel Thread Checker and try that. I hope it is not making any assumptions about x86 consistency in its analysis...
Jeff - For QA I typically run the five small test problems (nohsmall, noh, sedovsmall, sedov, leblanc) and verify that the outputs match the gold standard to within roundoff error. If you could do that, that would be great; or I can do it next week (I'm offsite this week and am not set up to run remotely).
You can run a gui version or the command line version of the Intel thread inspector. From our test script:
inspxe-cl -collect=ti2 -result-dir ${TEST_NAME} -- ${executable} ${TEST_NAME}.in >& ${TEST_NAME}_openmp.out || true inspxe-cl -report problems -result-dir ${TEST_NAME}
The gui is launched with inspxe-gui and is a little more intuitive to use.
You need to use the intel compiler and the intel tools (intel-performance-tools/2017.1.024).
It will flag race conditions that might not appear until later.
From: Charles Ferenbaugh notifications@github.com Reply-To: lanl/PENNANT reply@reply.github.com Date: Wednesday, February 8, 2017 at 9:13 PM To: lanl/PENNANT PENNANT@noreply.github.com Cc: Robert Robey brobey@lanl.gov, Comment comment@noreply.github.com Subject: Re: [lanl/PENNANT] improve OpenMP usage (#2)
Jeff - For QA I typically run the five small test problems (nohsmall, noh, sedovsmall, sedov, leblanc) and verify that the outputs match the gold standard to within roundoff error. If you could do that, that would be great; or I can do it next week (I'm offsite this week and am not set up to run remotely).
— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/lanl/PENNANT/pull/2#issuecomment-278542565, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABPgLDX091BKHe2QmZdk4aL3pIN7G3m0ks5rapJ1gaJpZM4L59qE.
1) reduced the number of fork-join per iteration
omp parallel for
does a fork-join, which can get expensive at large thread-counts. when this construct is used many times in a function, it should be replaced with a singleomp parallel
around multipleomp for
.The code previously found between parallel regions is assumed to require serialization and uses
pragma omp single
for protection.single
is used instead ofmaster
to allow the first encountering thread in the team to do the work, rather than waiting for the master thread.Technically, but never in practice,
single
requiresMPI_THREAD_SERIALIZED
instead ofMPI_THREAD_FUNNELED
.master
only requiresMPI_THREAD_FUNNELED
.It is possible that
single nowait
is sufficient, in which case a few barriers can be eliminated. (aside:master
does not imply a barrier).2)
pragma omp simd
whereverpragma ivdep
is usedThe OpenMP standard defines
pragma omp simd
semantics identical to the convention meaning of the non-standardpragma ivdep
.The Intel compiler treats
pragma omp simd
as an assertion rather than a hint so if SIMD isn't appropriate, this pragma should be conditionalized using preprocessor (C99/C++11_Pragma
being the O(1) solution here).