Closed: mexas closed this issue 7 years ago
My coarray test programs built with this version of OCA seem to build and run fine. Therefore I blame bashisms in OCA test suite.
I don't think this is the source of the problems. It might be a CMake/CTest or MPI environment issue. All the tests work for me with Bash 3.x, and Bash 4.1.2 is quite modern, so don't bother upgrading Bash. In fact, it appears that the only test that is passing is the test of one of the Bash installation script libraries/modules.
Also, I'm happy to hear that your manual testing is working well. I'm guessing there may be an issue with the MPI environment when the test suite is run...
By the way, can you launch MPI batch jobs from the node that you're building and running the test suite on? Or do you need SLURM/Torque, etc. job managers to run MPI batch jobs?
If you would like further assistance, could you please include the output of:
uname -a
ctest --output-on-failure (while in the build directory)
/cm/shared/apps/mvapich2-2.3a/bin/mpif90 --version
/cm/shared/apps/mvapich2-2.3a/bin/mpif90 -show
mpiexec --version
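The requested details could be gathered in one pass with something like the sketch below. The names `collect_diagnostics`, `run_if_present`, and `MPI_BIN` are hypothetical; the default path is the MVAPICH location quoted above, and any command that is not found is reported rather than run.

```shell
#!/bin/sh
# Hypothetical helper: gather the details requested above in one report.
# MPI_BIN is an assumed variable; it defaults to the MVAPICH path from this thread.
MPI_BIN="${MPI_BIN:-/cm/shared/apps/mvapich2-2.3a/bin}"

# Print a header line, then run the command if its executable can be found.
run_if_present() {
  printf '== %s\n' "$*"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || printf 'command failed: %s\n' "$1"
  else
    printf 'not found: %s\n' "$1"
  fi
}

collect_diagnostics() {
  run_if_present uname -a
  run_if_present "$MPI_BIN/mpif90" --version
  run_if_present "$MPI_BIN/mpif90" -show
  run_if_present mpiexec --version
  run_if_present cmake --version
}

collect_diagnostics
```

`ctest --output-on-failure` is left out on purpose, since it has to be run from the build directory; its output can be appended to the report separately.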
I haven't tested much with MVAPICH, so there may be an issue with how the test suite/CMake/CTest treats your environment and interacts with MVAPICH.
Thanks
cmake is up to date:
$ cmake --version
cmake version 3.8.1
Other details you asked for:
[mexas@newblue2 tests]$ uname -a
Linux newblue2 2.6.32-642.6.2.el6.x86_64 #1 SMP Tue Oct 25 15:06:33 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[mexas@newblue2 tests]$ /cm/shared/apps/mvapich2-2.3a/bin/mpif90 --version
GNU Fortran (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[mexas@newblue2 tests]$ /cm/shared/apps/mvapich2-2.3a/bin/mpif90 -show
/cm/shared/languages/GCC-7.1.0/bin/gfortran -I/cm/shared/apps/mvapich2-2.3a/include -I/cm/shared/apps/mvapich2-2.3a/include -L/cm/shared/apps/mvapich2-2.3a/lib -lmpifort -Wl,-rpath -Wl,/cm/shared/apps/mvapich2-2.3a/lib -Wl,--enable-new-dtags -lmpi
[mexas@newblue2 tests]$
[mexas@newblue2 build]$ pwd
/panfs/panasas01/mech/mexas/soft/OpenCoarrays-1.8.10/build
[mexas@newblue2 build]$ ctest --output-on-failure
Test project /panfs/panasas01/mech/mexas/soft/OpenCoarrays-1.8.10/build
Start 1: initialize_mpi
1/41 Test #1: initialize_mpi ................... Passed 0.03 sec
Start 2: register
2/41 Test #2: register ......................... Passed 0.04 sec
Start 3: register_vector
3/41 Test #3: register_vector .................. Passed 0.03 sec
Start 4: register_alloc_vector
4/41 Test #4: register_alloc_vector ............ Passed 0.04 sec
Start 5: allocate_as_barrier
5/41 Test #5: allocate_as_barrier .............. Passed 1.04 sec
Start 6: allocate_as_barrier_proc
6/41 Test #6: allocate_as_barrier_proc .........***Failed Required regular expression not founRegex=[Test passed.
] 0.14 sec
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_20]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_24]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_25]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Start 7: register_alloc_comp_1
7/41 Test #7: register_alloc_comp_1 ............ Passed 0.04 sec
Start 8: register_alloc_comp_2
8/41 Test #8: register_alloc_comp_2 ............ Passed 0.03 sec
Start 9: register_alloc_comp_3
9/41 Test #9: register_alloc_comp_3 ............ Passed 0.03 sec
Start 10: get_array
10/41 Test #10: get_array ........................ Passed 0.08 sec
Start 11: get_self
11/41 Test #11: get_self ......................... Passed 0.03 sec
Start 12: send_array
12/41 Test #12: send_array ....................... Passed 0.52 sec
Start 13: get_with_offset_1d
13/41 Test #13: get_with_offset_1d ............... Passed 0.03 sec
Start 14: whole_get_array
14/41 Test #14: whole_get_array .................. Passed 0.03 sec
Start 15: strided_get
15/41 Test #15: strided_get ...................... Passed 0.04 sec
Start 16: strided_sendget
16/41 Test #16: strided_sendget .................. Passed 0.04 sec
Start 17: co_sum
17/41 Test #17: co_sum ........................... Passed 0.04 sec
Start 18: co_broadcast
18/41 Test #18: co_broadcast ..................... Passed 0.04 sec
Start 19: co_min
19/41 Test #19: co_min ........................... Passed 0.04 sec
Start 20: co_max
20/41 Test #20: co_max ........................... Passed 0.04 sec
Start 21: syncall
21/41 Test #21: syncall ..........................***Failed Required regular expression not founRegex=[Test passed.
] 0.14 sec
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_16]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_19]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_20]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_21]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_26]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_30]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_17]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_18]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_25]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_24]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Start 22: syncimages
22/41 Test #22: syncimages .......................***Failed Required regular expression not founRegex=[Test passed.
] 0.14 sec
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_18]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_20]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Start 23: syncimages2
23/41 Test #23: syncimages2 ......................***Failed Required regular expression not founRegex=[Test passed.
] 0.16 sec
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_16]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_24]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Start 24: duplicate_syncimages
24/41 Test #24: duplicate_syncimages ............. Passed 1.54 sec
Start 25: co_reduce
25/41 Test #25: co_reduce ........................ Passed 0.04 sec
Start 26: co_reduce_res_im
26/41 Test #26: co_reduce_res_im ................. Passed 0.04 sec
Start 27: syncimages_status
27/41 Test #27: syncimages_status ................***Failed Required regular expression not founRegex=[Test passed.
] 0.14 sec
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
[cli_18]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_26]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_22]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_25]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_27]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_29]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_20]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_21]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_24]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Start 28: sync_ring_abort_np3
28/41 Test #28: sync_ring_abort_np3 .............. Passed 0.04 sec
Start 29: sync_ring_abort_np7
29/41 Test #29: sync_ring_abort_np7 .............. Passed 0.05 sec
Start 30: simpleatomics
30/41 Test #30: simpleatomics ....................***Failed Required regular expression not founRegex=[Test passed.
] 0.15 sec
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_22]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_19]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
[cli_20]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_24]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_27]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_29]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_30]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_16]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_25]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_28]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Error parsing CPU mapping string
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIDI_CH3I_set_affinity:73
[cli_31]: aborting job:
Fatal error in MPI_Init:
Other MPI error, error stack:
MPIR_Init_thread(490):
MPID_Init(386).......:
Start 31: hello_multiverse
31/41 Test #31: hello_multiverse ................. Passed 0.03 sec
Start 32: coarray_burgers_pde
32/41 Test #32: coarray_burgers_pde .............. Passed 0.06 sec
Start 33: co_heat
33/41 Test #33: co_heat .......................... Passed 0.95 sec
Start 34: asynchronous_hello_world
34/41 Test #34: asynchronous_hello_world ......... Passed 0.04 sec
Start 35: source-alloc-no-sync
35/41 Test #35: source-alloc-no-sync ............. Passed 0.05 sec
Start 36: event-post
36/41 Test #36: event-post ....................... Passed 0.04 sec
Start 37: co_reduce-factorial
37/41 Test #37: co_reduce-factorial .............. Passed 0.04 sec
Start 38: co_reduce-factorial-int8
38/41 Test #38: co_reduce-factorial-int8 ......... Passed 0.04 sec
[mexas@newblue2 build]$
Thank you
This appears to be an issue with your MPI implementation. I'm not entirely sure what's going on, but please note that "Error parsing CPU mapping string" is not a GFortran or OpenCoarrays error as far as I know; it is an MVAPICH error. I'll try to confirm this, but on first inspection it looks like a problem with MVAPICH's support for the MPI-3 features we use. Further googling suggests it may be related to oversubscription.
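One quick way to test the oversubscription theory is to compare the number of images a test launches with the number of cores on the node. The sketch below is only an illustration: `is_oversubscribed` is a hypothetical helper, the core count comes from `getconf _NPROCESSORS_ONLN`, and the value 32 is a guess inferred from the highest rank label (`cli_31`) in the log above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when more MPI processes are
# requested than there are cores available.
is_oversubscribed() {
  requested=$1
  available=$2
  [ "$requested" -gt "$available" ]
}

# Online core count; falls back to 1 if getconf cannot report it.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Assumption: 32 images, inferred from rank labels cli_16 .. cli_31 in the log.
images=32

if is_oversubscribed "$images" "$cores"; then
  echo "oversubscribed: $images images on $cores cores"
else
  echo "not oversubscribed: $images images on $cores cores"
fi
```

Since the failing call is `MPIDI_CH3I_set_affinity`, it may also be worth rerunning the suite with MVAPICH2's CPU affinity disabled (`MV2_ENABLE_AFFINITY=0` in the environment). That is a guess at a workaround, not a confirmed fix.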
Just out of curiosity, what does cat /proc/sys/kernel/pid_max report?
I'm going to attach this to #267 because it appears to be related.
$ cat /proc/sys/kernel/pid_max
32768
@mexas I really am not sure how to debug this further... It's odd that so many tests are failing, and I don't think that oversubscription is the issue.
If you don't have any other thoughts, I think I might close this as "won't fix" since it sounds like you're having more success with other avenues.
I'm having trouble keeping all of your bug reports straight. How many systems are you running on? Which MPI implementations are they using? Which architectures? I'm closing this for now as "won't fix". Please open a new issue and clearly include: the MPI version(s), the compiler wrapped by MPI (i.e., the output of mpicc --version and mpifort --version), the version of OpenCoarrays, the C compiler used to build OpenCoarrays, the CMake version, the GFortran version, the architecture (including the number of physical cores, or total core count), the hostname (to keep all of them straight), and which tests are failing. Ideally this would be in tabular form, with all systems and toolchains in one issue, one row per host/toolchain.
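A row for such a table could be produced on each host with something along these lines. This is a rough sketch: `first_line` and `toolchain_row` are hypothetical helper names, the column set is only a subset of what is asked for above, and anything that cannot be detected is printed as "unknown".

```shell
#!/bin/sh
# Hypothetical helpers: print one pipe-separated row per host/toolchain.

# First line of a command's output, or "unknown" if it is missing or prints nothing.
first_line() {
  if command -v "$1" >/dev/null 2>&1; then
    out=$("$@" 2>/dev/null | head -n 1)
  else
    out=""
  fi
  printf '%s' "${out:-unknown}"
}

toolchain_row() {
  printf '%s | %s | %s | %s | %s | %s | %s\n' \
    "$(uname -n)" \
    "$(uname -m)" \
    "$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)" \
    "$(first_line mpicc --version)" \
    "$(first_line mpifort --version)" \
    "$(first_line cmake --version)" \
    "$(first_line gfortran --version)"
}

echo "host | arch | cores | mpicc | mpifort | cmake | gfortran"
toolchain_row
```

The OpenCoarrays version and the list of failing tests are not detected here and would need to be added to each row by hand.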
It sounds like you've had some success on at least one configuration with 1.8.12 which is encouraging.
My coarray test programs built with this version of OCA seem to build and run fine. Therefore I blame bashisms in OCA test suite. I have:
I'll try to update bash, but please consider making OCA more portable by accepting other shells. The world is bigger than Linux, etc.
Thanks
Anton