Closed amckinstry closed 7 years ago
The test system is an 8-way SMP.
So, just to confirm, there are 8 physical cores? (As opposed to hyper-threading virtual cores)
Correct. partch.debian.org is an LPAR on IBM Power 730 Express. 8 physical cores.
@amckinstry, we just produced a new release. Could you please try it and let us know if you still see problems?
@afanfa, do you have any thoughts about what might be causing this problem?
Thanks, testing now.
@amckinstry, any luck? If not, I can investigate the test 12 failure next week. @afanfa would be the best person to investigate the others.
@amckinstry wrote:
Again this is a regression.
So this wasn't happening with a previous release of OpenCoarrays? Or did upgrading to GCC 6.1 or OpenMPI 1.10.3 trigger it? I remember reading about a number of unresolved OpenMPI bugs, but I thought they were only relevant to oversubscribed tests...
If it truly is an OpenCoarrays regression, knowing which versions it worked with would be helpful. If we gain access to similar hardware we can run a git bisection to see which changes introduced the errors.
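A bisection like this could be driven by a small helper script. The following is only a sketch in Python, not something from the OpenCoarrays repository: the `classify` function, the 60-second timeout, and the idea of treating a hang as a failure are all assumptions. It relies on the exit-code convention of `git bisect run` (0 marks a revision good, 125 skips it, other codes from 1 to 127 mark it bad) and on the "Test passed." text that the test harness looks for.

```python
import re
import subprocess
import sys

# Text the test harness expects in the output of a passing test.
PASS_PATTERN = re.compile(r"Test passed\.")

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(command, timeout_s=60.0):
    """Run one test command and map its outcome to a bisect exit code.

    The timeout value is an assumption; a hang is reported as BAD.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return BAD  # the test hung, so count this revision as failing
    except FileNotFoundError:
        return SKIP  # test binary missing, e.g. the build did not produce it
    return GOOD if PASS_PATTERN.search(result.stdout) else BAD


# Only act as a command-line tool when a test command was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(classify(sys.argv[1:]))
```

Saved under a hypothetical name such as bisect_check.py, it could be passed to `git bisect run` together with the mpiexec command line of the failing test, once known good and bad revisions have been marked. Each bisection step would also need to rebuild the tests first, which this sketch does not do.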
Any luck, @amckinstry? Also, I wonder if this is the same as #246. Which MPI implementation are you using? I'm assuming it's obtained from the package manager?
Hi,
I've enabled a build of open-coarrays 1.7.4 (against openmpi 2.0, gcc 6.2). In this build I've
https://buildd.debian.org/status/package.php?p=open-coarrays&suite=experimental
It appears to now work on powerpc, though alpha is untested due to build dependencies.
Regards, Alastair
Alastair McKinstry, alastair@sceal.ie, mckinstry@debian.org, https://diaspora.sceal.ie/u/amckinstry Misentropy: doubting that the Universe is becoming more disordered.
@amckinstry Alastair: great news! When did you test this (which branch and commit)?
Am I reading the build-dependency issue correctly: alpha can't build/doesn't have an MPI implementation?
Is there anything else that you would like us to troubleshoot? If not, please close this ticket if you feel it is adequately resolved.
We should have a new release out in a few days.
@amckinstry I'm closing this issue, but please let me know if you encounter further issues.
I'm the Debian maintainer of opencoarrays, and we see test failures on powerpc and alpha architectures.
The important ones are powerpc:
test 6
Start 6: allocate_as_barrier_proc
6: Test command: /usr/bin/mpiexec "-np" "8" "/home/mckinstry/open-coarrays-1.6.2/obj-powerpc-linux-gnu/src/tests/unit/init_register/allocate_as_barrier_proc"
6: Test timeout computed to be: 9.99988e+06
6: Test failed. 0
1/1 Test #6: allocate_as_barrier_proc .........***Failed Required regular expression not found.Regex=[Test passed. ] 1.32 sec
0% tests passed, 1 tests failed out of 1
test 7
Start 7: get_array
7: Test command: /usr/bin/mpiexec "-np" "2" "/home/mckinstry/open-coarrays-1.6.2/obj-powerpc-linux-gnu/src/tests/unit/send-get/get_array"
7: Test timeout computed to be: 9.99988e+06
Test 6 hangs on 8 or more processors. The test system is an 8-way SMP. Test 7 seems to hang.
The test machine:
Linux partch 3.16.0-4-powerpc64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 (2016-07-02) ppc64 GNU/Linux
Debian sid, which includes openmpi 1.10.3 built with gcc 6.1.
On Alpha, we see:
12/23 Test #12: co_sum ...........................***Failed Required regular expression not found.Regex=[Test passed. ] 0.18 sec
number of images doesn't evenly divide into number of points
number of images doesn't evenly divide into number of points
ERROR STOP
number of images doesn't evenly divide into number of points
ERROR STOP
ERROR STOP
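The error text suggests that the co_sum test checks whether its problem size splits evenly across the images before doing any work. The Python sketch below only illustrates that kind of check. The real test is Fortran, and the function name and the sizes used here are hypothetical, since the actual number of points and images is not shown in the log.

```python
def points_per_image(num_points: int, num_images: int) -> int:
    """Return the share of points each image gets, or fail if the split is uneven.

    Hypothetical stand-in for the divisibility check implied by the log message.
    """
    if num_images <= 0:
        raise ValueError("number of images must be positive")
    if num_points % num_images != 0:
        raise ValueError(
            "number of images doesn't evenly divide into number of points"
        )
    return num_points // num_images


if __name__ == "__main__":
    # Illustrative sizes only: 12 points split over 4 images works,
    # while 12 points over 5 images triggers the error.
    print(points_per_image(12, 4))
    try:
        points_per_image(12, 5)
    except ValueError as err:
        print(err)
```

If the test does behave this way, the failure on Alpha would depend on how many images the test was launched with there, not on the architecture itself.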
Again this is a regression.