sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Test failures on powerpc, alpha #212

Closed amckinstry closed 7 years ago

amckinstry commented 8 years ago

I'm the Debian maintainer of opencoarrays, and we see test failures on powerpc and alpha architectures.

The important ones are powerpc:

test 6
Start 6: allocate_as_barrier_proc

6: Test command: /usr/bin/mpiexec "-np" "8" "/home/mckinstry/open-coarrays-1.6.2/obj-powerpc-linux-gnu/src/tests/unit/init_register/allocate_as_barrier_proc"
6: Test timeout computed to be: 9.99988e+06
6: Test failed. 0
1/1 Test #6: allocate_as_barrier_proc .........***Failed Required regular expression not found.Regex=[Test passed. ] 1.32 sec

0% tests passed, 1 tests failed out of 1

test 7
Start 7: get_array

7: Test command: /usr/bin/mpiexec "-np" "2" "/home/mckinstry/open-coarrays-1.6.2/obj-powerpc-linux-gnu/src/tests/unit/send-get/get_array"
7: Test timeout computed to be: 9.99988e+06

Test 6 hangs on 8 or more processors. The test system is an 8-way SMP. Test 7 also seems to hang.
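Since the CTest timeout in the log is close to 1e7 seconds, a hang and an ordinary failure are hard to tell apart without waiting. A minimal Python sketch of one way to separate them follows. It is not part of OpenCoarrays: the helper name `run_with_timeout` and the 60-second limit are assumptions, and the pass string is taken from the regex shown in the log.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s=60, pass_text="Test passed."):
    """Classify a test command as 'passed', 'failed', or 'hung'.

    Hypothetical helper: it mirrors the pass-string check from the log,
    but gives up after timeout_s seconds instead of waiting ~1e7 seconds.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception is raised.
        return "hung"
    return "passed" if pass_text in result.stdout else "failed"

# Stand-in command for illustration; replace the list with the mpiexec
# command line from the log to classify a real test run.
status = run_with_timeout([sys.executable, "-c", "print('Test passed.')"])
print(status)
```

A result of "hung" for test 7 with "-np" "2" would confirm that the hang does not depend on oversubscription.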

The test machine:

Linux partch 3.16.0-4-powerpc64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) ppc64 GNU/Linux

It runs Debian sid, which includes openmpi 1.10.3 built with gcc 6.1.

On Alpha, we see:

12/23 Test #12: co_sum ...........................***Failed Required regular expression not found.Regex=[Test passed. ] 0.18 sec
number of images doesn't evenly divide into number of points
number of images doesn't evenly divide into number of points
ERROR STOP
number of images doesn't evenly divide into number of points
ERROR STOP
ERROR STOP
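The co_sum message suggests the test requires the number of points to be an exact multiple of the number of images. A minimal Python sketch of that condition follows. The function names and any point counts used with them are hypothetical, since the test source is not shown in this thread.

```python
def points_per_image(n_points, n_images):
    """Return how many points each image handles, or raise if the split is uneven."""
    if n_images <= 0:
        raise ValueError("number of images must be positive")
    if n_points % n_images != 0:
        # Same condition the error message in the log describes.
        raise ValueError(
            "number of images doesn't evenly divide into number of points"
        )
    return n_points // n_images

def valid_image_counts(n_points, max_images):
    """List the image counts from 1 to max_images that divide n_points evenly."""
    return [n for n in range(1, max_images + 1) if n_points % n == 0]
```

If this reading is right, the failure would depend on how many images the test harness launches on that machine, not on the alpha architecture as such.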

Again this is a regression.

zbeekman commented 8 years ago

The test system is an 8-way SMP.

So, just to confirm, there are 8 physical cores? (As opposed to hyper-threading virtual cores)

amckinstry commented 8 years ago

Correct. partch.debian.org is an LPAR on IBM Power 730 Express. 8 physical cores.

rouson commented 8 years ago

@amckinstry, we just produced a new release. Could you please try it and let us know if you still see problems?

@afanfa, please let us know if you have any thoughts about what might be causing this problem.

amckinstry commented 8 years ago

Thanks, testing now.

rouson commented 8 years ago

@amckinstry, any luck? If not, I can investigate the test 12 failure next week. @afanfa would be the best person to investigate the others.

zbeekman commented 8 years ago

@amckinstry wrote:

Again this is a regression.

So this wasn't happening with a previous release of OpenCoarrays? Or did upgrading to GCC 6.1 or OpenMPI 1.10.3 trigger it? I remember reading about a number of unresolved OpenMPI bugs, but I thought they were only relevant to oversubscribed tests...

If it truly is an OpenCoarrays regression, knowing which versions it worked with would be helpful. If we gain access to similar hardware we can run a git bisection to see which changes introduced the errors.
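For readers unfamiliar with bisection, the idea is a binary search over the commit history between a known-good and a known-bad version. The following Python sketch illustrates the search only; it is not how `git bisect` is implemented, and the names `first_bad`, `commits`, and `is_bad` are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return hi
```

In practice `is_bad` would build the given commit and run the failing test on the affected hardware, so the number of builds grows only logarithmically with the number of commits between the two releases.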

zbeekman commented 7 years ago

Any luck, @amckinstry? Also, I wonder if this is the same as #246. Which MPI implementation are you using? I'm assuming it's obtained from the package manager?

amckinstry commented 7 years ago

Hi,

I've enabled a build of open-coarrays 1.7.4 (against openmpi 2.0, gcc 6.2). In this build I've

https://buildd.debian.org/status/package.php?p=open-coarrays&suite=experimental

It appears to now work on powerpc, though alpha is untested due to build dependencies.

Regards, Alastair


Alastair McKinstry, alastair@sceal.ie, mckinstry@debian.org, https://diaspora.sceal.ie/u/amckinstry Misentropy: doubting that the Universe is becoming more disordered.

zbeekman commented 7 years ago

@amckinstry Alastair: great news! When did you test this (which branch and commit)?

Am I reading the build-dependency issue correctly: alpha can't build/doesn't have an MPI implementation?

Is there anything else that you would like us to troubleshoot? If not, please close this ticket if you feel it is adequately resolved.

We should have a new release out in a few days.

zbeekman commented 7 years ago

@amckinstry I'm closing this issue, but please let me know if you encounter further issues.