vehre closed 1 year ago
The crash with Intel MPI should be fixed by commit https://github.com/sourceryinstitute/OpenCoarrays/pull/763/commits/9d4afcb44a3414e4165c88b6e6d1b936216b6ac1. At least it is on my Fedora 35 Linux with Intel MPI 2021.6.
I can confirm that this solved the crash with Intel MPI on Linux and improved the situation on Windows. Unfortunately, it still crashes on Windows, though less severely, it seems. The output from Windows is:
C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\caf --show
C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\caf" --show
C:/Users/brad/gcc/bin/gfortran.exe -I/c/Users/brad/Repositories/GitHub/sourceryinstitute/opencoarrays-install/include/OpenCoarrays-2.10.0-14-g9d4afcb_GNU-12.1.0 -fcoarray=lib ${@} /c/Users/brad/Repositories/GitHub/sourceryinstitute/opencoarrays-install/lib/libcaf_mpi.a -pthread C:/Program Files (x86)/Intel/oneAPI/mpi/latest/lib/release/impi.lib
C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\cafrun --show
C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\cafrun" --show
C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -n <number_of_images> /path/to/coarray_Fortran_program [arg4 [arg5 [...]]]
C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\caf hello_coarrays.f90 -o hello_coarrays
C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\caf" hello_coarrays.f90 -o hello_coarrays
C:\Users\brad\Repositories\GitHub\sourceryinstitute>opencoarrays-install\bin\cafrun -n 4 .\hello_coarrays.exe
C:\Users\brad\Repositories\GitHub\sourceryinstitute>"C:/Program Files/Git/usr/bin/bash.exe" "C:\Users\brad\Repositories\GitHub\sourceryinstitute\opencoarrays-install\bin\cafrun" -n 4 .\hello_coarrays.exe
1 1
3 3
4 4
2 2
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 6520 RUNNING AT BRADRICHARD5FC1
= EXIT STATUS: -1073740940 (c0000374)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 444 RUNNING AT BRADRICHARD5FC1
= EXIT STATUS: -1 (ffffffff)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 4796 RUNNING AT BRADRICHARD5FC1
= EXIT STATUS: -1 (ffffffff)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 8996 RUNNING AT BRADRICHARD5FC1
= EXIT STATUS: -1 (ffffffff)
===================================================================================
Error: Command:
`C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -n 4 .\hello_coarrays.exe`
failed to run.
I'm still of the opinion that if solving this last issue is much more effort, we can go ahead and merge this and solve the last thing as a separate PR. Up to you though.
I tried to debug this under Windows, but I see a different error:
f:\dd\vctools\crt\crtw32\misc\dbgheap.c(1322) : Assertion failed: _CrtIsValidHeapPointer(pUserData)
This seems to be runtime related, and I have no clue how to debug it on Windows. So let's merge the existing fixes and, if this is important, do another round.
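For reference, the rank 0 exit status in the Windows output above, -1073740940 (c0000374), is the signed 32-bit reading of a Windows status code. The sketch below only shows that conversion; the helper name `to_ntstatus` is made up for illustration. As far as I know, 0xC0000374 is the NTSTATUS value STATUS_HEAP_CORRUPTION, which would be consistent with the `_CrtIsValidHeapPointer` assertion, but it does not identify the faulty call.

```python
def to_ntstatus(exit_status: int) -> str:
    """Return the unsigned 32-bit hex form of a signed process exit status."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"{exit_status & 0xFFFFFFFF:08x}"


print(to_ntstatus(-1073740940))  # c0000374 (rank 0)
print(to_ntstatus(-1))           # ffffffff (ranks 1 to 3)
```

The -1 (ffffffff) status on the other ranks does not map to a specific status code in the same way, so rank 0 is probably the one to look at first.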
Summary of changes
Fix crash on certain platforms on finalize.
Rationale for changes
On finalize, OpenCoarrays was calling MPI_Win_detach on a management structure instead of on the token that had previously been passed to MPI_Win_attach. Detaching the token instead fixes the crash with openmpi. The test case provided in the first commit cannot really verify the fix, because the test suite does not detect tests that crash.
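To illustrate the last point, here is a minimal sketch of what "checking for a crash" could look like in a test driver. It is not how the OpenCoarrays test suite works; `run_and_check` is a hypothetical helper, and the child process is a stand-in for a test program that terminates abnormally at finalize after printing correct output.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd and return (ok, returncode); ok is False on any non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.returncode


# Stand-in for a crashing test: prints its output, then exits non-zero.
crashing = [sys.executable, "-c", "import sys; print('1 1'); sys.exit(3)"]
ok, code = run_and_check(crashing)
print(ok, code)  # False 3
```

A driver that compares only the printed output would pass this run, which is the gap described above; checking the return code as well would catch it.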