sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.
http://www.opencoarrays.org
BSD 3-Clause "New" or "Revised" License

Issue 762 #763

Closed vehre closed 1 year ago

vehre commented 1 year ago

Summary of changes

Fix a crash during finalization on certain platforms.

Rationale for changes

On finalize, OpenCoarrays was calling MPI_Win_detach on a management structure instead of on the token that had previously been attached with MPI_Win_attach. Detaching the token fixes the crash with Open MPI. The test case provided in the first commit does not, and cannot, really verify the fix, because the test suite does not check for crashing tests.
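The mismatch described above can be illustrated with a toy model. This is a sketch in Python, not OpenCoarrays code and not an MPI binding: `ToyWindow`, `finalize`, and the integer addresses are hypothetical stand-ins. It only mirrors the MPI rule that the address passed to MPI_Win_detach must match one previously passed to MPI_Win_attach on the same window. A real MPI library may crash where the toy raises an exception.

```python
class ToyWindow:
    """Toy stand-in for a dynamic MPI window (hypothetical, not real MPI)."""

    def __init__(self):
        self._attached = set()  # base addresses currently attached

    def attach(self, base):
        self._attached.add(base)

    def detach(self, base):
        # Detaching an address that was never attached is erroneous.
        # A real MPI library may crash here; the toy raises instead.
        if base not in self._attached:
            raise ValueError(f"address {base:#x} was never attached")
        self._attached.remove(base)


def finalize(window, management_addr, token_addr, detach_token):
    """Detach the attached token (the fix) or the management structure (the bug)."""
    window.detach(token_addr if detach_token else management_addr)


def demo():
    """Run the buggy and the fixed finalize path and record each outcome."""
    results = {}
    for label, detach_token in (("buggy", False), ("fixed", True)):
        win = ToyWindow()
        management_addr, token_addr = 0x1000, 0x2000  # made-up addresses
        win.attach(token_addr)  # only the token is ever attached
        try:
            finalize(win, management_addr, token_addr, detach_token)
            results[label] = "ok"
        except ValueError as exc:
            results[label] = f"error: {exc}"
    return results


if __name__ == "__main__":
    for label, outcome in demo().items():
        print(label, "->", outcome)
```

In this model the "buggy" path fails because the management structure's address was never attached, while the "fixed" path detaches cleanly.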

vehre commented 1 year ago

The crash with Intel MPI should be fixed by commit https://github.com/sourceryinstitute/OpenCoarrays/pull/763/commits/9d4afcb44a3414e4165c88b6e6d1b936216b6ac1. At least it is on my Fedora 35 Linux system with Intel MPI 2021.6.

everythingfunctional commented 1 year ago

I can confirm that this solved the crash with Intel MPI on Linux and improved the situation on Windows. Unfortunately, it still crashes on Windows, though seemingly less severely. The output from Windows is:

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\caf --show

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\caf" --show
C:/Users/brad/gcc/bin/gfortran.exe -I/c/Users/brad/Repositories/GitHub/sourceryinstitute/opencoarrays-install/include/OpenCoarrays-2.10.0-14-g9d4afcb_GNU-12.1.0 -fcoarray=lib ${@} /c/Users/brad/Repositories/GitHub/sourceryinstitute/opencoarrays-install/lib/libcaf_mpi.a -pthread C:/Program Files (x86)/Intel/oneAPI/mpi/latest/lib/release/impi.lib

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\cafrun --show

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\cafrun" --show
C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -n <number_of_images> /path/to/coarray_Fortran_program [arg4 [arg5 [...]]]

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\caf hello_coarrays.f90 -o hello_coarrays

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\caf" hello_coarrays.f90 -o hello_coarrays

C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\cafrun -n 4 .\hello_coarrays.exe

C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\cafrun" -n 4 .\hello_coarrays.exe
           1           1
           3           3
           4           4
           2           2

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 6520 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1073740940 (c0000374)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 444 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1 (ffffffff)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 4796 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1 (ffffffff)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 3 PID 8996 RUNNING AT BRADRICHARD5FC1
=   EXIT STATUS: -1 (ffffffff)
===================================================================================
Error: Command:
   `C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -n 4 .\hello_coarrays.exe`
failed to run.

I'm still of the opinion that if solving this last issue is much more effort, we can go ahead and merge this and solve the last thing as a separate PR. Up to you though.
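
The exit statuses in the log above are printed as signed 32-bit integers, with the hexadecimal form in parentheses. The conversion can be sketched in Python as below. The helper names and the small lookup table are illustrative assumptions, not part of OpenCoarrays or Intel MPI. On Windows, 0xC0000374 is the NTSTATUS value STATUS_HEAP_CORRUPTION, which would be consistent with rank 0 hitting heap corruption; the table is deliberately incomplete and says nothing about the -1 statuses of the other ranks.

```python
def to_unsigned32(status):
    """Reinterpret a signed 32-bit exit status as an unsigned value."""
    return status & 0xFFFFFFFF


# Deliberately tiny lookup table (an assumption of this sketch, not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def describe(status):
    """Format an exit status as decimal, hex, and a name if the table has one."""
    code = to_unsigned32(status)
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"{status} -> 0x{code:08x} ({name})"


if __name__ == "__main__":
    # The two distinct exit statuses that appear in the log.
    for status in (-1073740940, -1):
        print(describe(status))
```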

vehre commented 1 year ago

I tried to debug this under Windows, but I see a different error:

f:\dd\vctools\crt\crtw32\misc\dbgheap.c(1322) : Assertion failed: _CrtIsValidHeapPointer(pUserData)

This seems to be runtime-related, and I have no clue how to debug it on Windows. So let's merge the existing fixes and, if this is important, do another round.