sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Defect: cmake does not (always) correctly detect MPI features #509

Status: Closed (aetx closed 5 years ago)

aetx commented 6 years ago

Defect/Bug Report

OpenCoarrays compiled using

CC=gcc FC=gfortran cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/.modules/libopencoarrays/1.9.2_gcc7.2.0

Observed Behavior

When running cmake, the configure step reports that it found MPIX_Comm_failure_get_acked (along with the other MPIX fault-tolerance symbols):

-- Looking for MPIX_ERR_PROC_FAILED
-- Looking for MPIX_ERR_PROC_FAILED - found
-- Looking for MPIX_ERR_REVOKED
-- Looking for MPIX_ERR_REVOKED - found
-- Looking for MPIX_Comm_failure_ack
-- Looking for MPIX_Comm_failure_ack - found
-- Looking for MPIX_Comm_failure_get_acked
-- Looking for MPIX_Comm_failure_get_acked - found
-- Looking for MPIX_Comm_shrink
-- Looking for MPIX_Comm_shrink - found
-- Looking for MPIX_Comm_agree
-- Looking for MPIX_Comm_agree - found
-- Looking for include file mpi.h
-- Looking for include file mpi.h - found
-- Looking for I_MPI_VERSION
-- Looking for I_MPI_VERSION - not found

However, MPIX_Comm_failure_get_acked is not actually implemented in ParaStationMPI 5.2.0, so running the code produces the following warnings and fatal error:

Warning: MPID_Comm_failure_get_acked() not implemented
Warning: MPID_Comm_failure_get_acked() not implemented
Warning: MPID_Comm_failure_get_acked() not implemented
Warning: MPID_Comm_failure_get_acked() not implemented
Fatal error in PMPIX_Comm_agree: Unsupported file operation , error stack:
PMPIX_Comm_agree(183): MPIX_Comm_agree(MPI_COMM_WORLD) failed
PMPIX_Comm_agree(169): 
MPIR_Comm_agree(49)..: 
(unknown)(): Unsupported file operation
Fatal error in PMPIX_Comm_agree: Unsupported file operation , error stack:
PMPIX_Comm_agree(183): MPIX_Comm_agree(MPI_COMM_WORLD) failed
PMPIX_Comm_agree(169): 
MPIR_Comm_agree(49)..: 
(unknown)(): Unsupported file operation
Fatal error in PMPIX_Comm_agree: Unsupported file operation , error stack:
PMPIX_Comm_agree(183): MPIX_Comm_agree(MPI_COMM_WORLD) failed
PMPIX_Comm_agree(169): 
MPIR_Comm_agree(49)..: 
(unknown)(): Unsupported file operation
Fatal error in PMPIX_Comm_agree: Unsupported file operation , error stack:
PMPIX_Comm_agree(183): MPIX_Comm_agree(MPI_COMM_WORLD) failed
PMPIX_Comm_agree(169): 
MPIR_Comm_agree(49)..: 
(unknown)(): Unsupported file operation
srun: error: jrc0538: tasks 0-3: Exited with exit code 44

When compiling with -DCAF_ENABLE_FAILED_IMAGES=FALSE the functions are still found, but the code runs without aborting.

Expected Behavior

CMake should correctly identify whether the functions may be used or not.

This may be harder to fix than it sounds in this case, though: CMake only checks whether the symbols appear in the MPI library, and here they do, even though the functions behind them are not implemented.

Steps to Reproduce

Compile OpenCoarrays with failed image support, then run a program using coarrays.

zbeekman commented 6 years ago

Hmmm, interesting. It does seem that we need to be a little more rigorous in determining whether the detected features actually work.

For the time being, a workaround is to pass -DCAF_ENABLE_FAILED_IMAGES=FALSE when configuring with CMake.

zbeekman commented 6 years ago

The solution to this will be to use introspection to compile and run a test program using the MPIX features. Fun times. This is not a high priority but should be easy to do.
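
For illustration only, such a compile-and-run check might look roughly like the sketch below. It is not the check OpenCoarrays actually uses. It assumes a CMake recent enough to provide the MPI::MPI_C imported target, an MPI whose mpi.h declares the MPIX_ functions (as MPICH-derived implementations do), and a machine where an MPI binary can be started directly as a single process during configuration, which may not hold on every cluster. The result variable ULFM_AGREE_WORKS is a made-up name.

```cmake
# Hypothetical sketch: probe ULFM support by running code, not just linking it.
include(CheckCSourceRuns)
find_package(MPI REQUIRED COMPONENTS C)

# Build the probe against the detected MPI (headers and libraries).
set(CMAKE_REQUIRED_LIBRARIES MPI::MPI_C)

# The probe exits with 0 only if MPIX_Comm_agree actually succeeds at run time.
# A stub that aborts or returns an error code makes the check fail.
check_c_source_runs("
#include <mpi.h>
int main(int argc, char **argv)
{
  int flag = 1;
  int rc;
  MPI_Init(&argc, &argv);
  /* Ask MPI to return error codes instead of aborting the process. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  rc = MPIX_Comm_agree(MPI_COMM_WORLD, &flag);
  MPI_Finalize();
  return rc == MPI_SUCCESS ? 0 : 1;
}
" ULFM_AGREE_WORKS)  # ULFM_AGREE_WORKS is an illustrative name

unset(CMAKE_REQUIRED_LIBRARIES)

if(NOT ULFM_AGREE_WORKS)
  message(STATUS "MPIX symbols found, but MPIX_Comm_agree does not work; "
                 "consider configuring with -DCAF_ENABLE_FAILED_IMAGES=FALSE")
endif()
```

Unlike a symbol lookup, this probe would fail on an MPI like the one reported above, where the symbols link but the call ends in a fatal error. The cost is that the configure step now depends on being able to execute MPI programs on the build host.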

zbeekman commented 5 years ago

The experimental/proposed ULFM features did not make it into MPI 4. Furthermore, it was determined that a big rework is required. As such, we have turned off ULFM support by default, even when the build system detects it. While this does, in principle, still need fixing, I'm going to close this as "won't fix" until the ULFM implementation stabilizes a bit and we have something reliable to test against.