Closed: naughtont3 closed this issue 3 years ago.
All OpenMPI versions I tried, including 3.x, 4.x, and the latest GitHub master, fail, although I am not sure the problem is exactly the same in all cases. I only debugged the latest master and 4.0.2, where I found that the error occurs in ompi/mca/osc/rdma/osc_rdma_dynamic.c in the function ompi_osc_rdma_detach, because ompi_osc_rdma_find_region_containing() does not find the dynamic memory region, even though it should still exist (I checked the application code as best I could).
Can you make a smaller reproducer, perchance?
I will use the one provided. Will quickly delete gfortran once I am done :p. Keep in mind that in some implementations attach/detach are essentially no-ops so success with another implementation does not necessarily mean there is a bug in Open MPI.
But given that attach/detach test coverage is incomplete it would not be surprising if there is a bug.
Making a smaller reproducer would be extremely hard due to the code architecture, but I can assist with code navigation and debugging. In particular, there is only one call site of MPI_Win_attach() in DDSS/distributed.F90 (subroutine DataWinAttach) and only one call site of MPI_Win_detach() in DDSS/distributed.F90 (subroutine DataWinDetach). Similarly, all other MPI-3 RMA one-sided functionality is located in DDSS/distributed.F90. MPI dynamic windows are used with MPI_Rget() and MPI_Raccumulate() performing communication from within an OpenMP PARALLEL region, synchronized via MPI_Test() (inside an MPI_Win_lock/MPI_Win_unlock epoch).
I am also trying to double-check that my use of MPI-3 is valid, which I believe it is, but there is still a chance I have missed something.
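To make the access pattern concrete, here is a minimal sketch using the MPI C bindings (the real code is the Fortran in DDSS/distributed.F90; names, sizes, and the self-targeting shortcut below are illustrative only):

/* Minimal sketch (MPI C bindings; illustrative names) of the pattern
 * described above: a dynamic window, one attach, a request-based get
 * completed with MPI_Test inside a lock/unlock epoch, then a detach. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank, flag = 0;
    MPI_Win win;
    MPI_Request req;
    MPI_Aint base_addr;
    double *buf, result = 0.0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Expose a locally allocated region through the dynamic window. */
    buf = malloc(1024 * sizeof(double));
    buf[0] = (double) rank;
    MPI_Win_attach(win, buf, 1024 * sizeof(double));

    /* In the real application the remote base address is exchanged between
     * ranks; here each rank simply targets itself to keep the sketch short. */
    MPI_Get_address(buf, &base_addr);

    MPI_Win_lock(MPI_LOCK_SHARED, rank, 0, win);
    MPI_Rget(&result, 1, MPI_DOUBLE, rank, base_addr, 1, MPI_DOUBLE, win, &req);
    while (!flag)                        /* progress via MPI_Test, as in DDSS */
        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    MPI_Win_unlock(rank, win);

    MPI_Win_detach(win, buf);            /* the call that crashes in osc/rdma */
    free(buf);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}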
f951: internal compiler error: in generate_finalization_wrapper, at fortran/class.c:1993
Please submit a full bug report,
with preprocessed source if appropriate.
You are likely not using gcc/8.x.
Only gfortran/8.x works, other versions have compiler bugs, including gfortran/9.x.
I have just checked again and can confirm that the application is trying to detach a valid (previously attached) region which has not been detached before. Moreover, this is neither the first Detach call nor the last, if that helps. Also, no error code is returned from MPI_Win_detach() because of the crash, so I would assume Open MPI would have returned an error code instead of crashing if this issue were on the application side, right?
The problem occurs because the program attaches multiple regions that overlap by at least one page (the minimum hardware registration unit). Taking a look at one crash:
Both of these regions contain the 4k page at 0x151090000000. Ideally osc/rdma should have returned an error, as the implementation treats page overlap as region overlap. Overlapping regions are not allowed by the standard. The standard does not, however, give guidance on what overlapping means, so it is fair to assume page-level overlap is allowed. This may be an error in the standard, as the implementation is usually free to set restrictions based on hardware characteristics. I would very strongly recommend against 1) page-overlapped regions, and 2) small regions (an 8-byte attach is incredibly wasteful). I will see about either allowing this kind of overlap or at least returning the proper error from MPI_Win_attach.
Both of these are trivial to implement; I just want to make sure we follow the intended behavior (the standard may be wrong here).
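For illustration (this is not the application's code), here is the kind of pattern that triggers the page-level overlap: two byte-wise disjoint attachments that land on the same 4 KiB page.

/* Illustration only: two attachments that do not overlap byte-wise but do
 * share the same 4 KiB page, which osc/rdma's page-granular bookkeeping
 * treated as overlapping regions. */
#include <mpi.h>

void attach_two_small_regions(MPI_Win dyn_win)   /* hypothetical helper */
{
    static double block[1024];          /* one contiguous allocation   */
    double *a = &block[0];              /* 8-byte region at offset 0   */
    double *b = &block[1];              /* 8-byte region at offset 8   */

    /* [a, a+8) and [b, b+8) are disjoint byte ranges, but both live on
     * the same 4096-byte page once rounded to registration granularity. */
    MPI_Win_attach(dyn_win, a, sizeof(double));
    MPI_Win_attach(dyn_win, b, sizeof(double));

    MPI_Win_detach(dyn_win, a);
    MPI_Win_detach(dyn_win, b);
}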
Ha, this is subtle, but it makes total sense from the implementation point of view. Thanks for such a quick investigation! I have always interpreted the standard as requiring non-overlapping virtual address ranges. On the other hand, can we safely assume that a 4 KB page is always the minimum hardware registration unit on all systems? Otherwise the MPI standard would be introducing a non-portable restriction. In any case, the error code and message from MPI_Win_attach would definitely help here. Thanks.
I have just built the branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap from https://github.com/hjelmn/ompi, but it results in exactly the same problem as before, even if I specify --mca osc_rdma_max_attach 128 in mpirun. Am I testing the wrong branch/commit? The commit I have is:
commit ec331c7998f84924713b7d1a97422ed7561cee7b
Author: Nathan Hjelm hjelmn@google.com
Date: Tue Feb 11 21:57:24 2020 -0800
@hjelmn @DmitryLyakh I did a test with cherry-pick of #7383 and #7387 to point just prior to current PRRTE (no-orte) changes due to unrelated problems. For clarity, I pushed the branch I tested here: https://github.com/naughtont3/ompi/tree/pre-NoRTE-plus-oscrdma-fix
Things fail later, but there is still a failure during the detach with SEGV while inside ompi_osc_rdma_remove_attachment(). A debug log with osc_base_verbose and osc_rdma_verbose enabled is attached.
I noticed that it had OMP threads=8, so I re-ran with OMP_NUM_THREADS=1 for a slightly simpler case. It fails to find a memory attachment and throws the error during detach.
It seems like the rdma_region_handle in ompi_osc_rdma_detach() is NULL. I am not sure why. When that happens and you see "could not find dynamic memory attachment", it returns OMPI_ERR_BASE, which triggers the MPI win error handler for errors-are-fatal. So the question is why/where does the rdma_region_handle get off?
Think I found the issue. Please try it again.
This seems better for my test. But I will have to check with Dmitry to ensure it behaves as expected for him.
I am unable to test this as I cannot find which branch/commit I am supposed to try. The last commit I see in Nathan's repository on branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap is dated Feb 11. Where is the latest commit with today's fix?
This is what I see as the latest commit (Feb 11):
commit 96ed63026033271a1d253776e3fbc8a573f91d10
Author: Nathan Hjelm hjelmn@google.com
Date: Tue Feb 11 21:57:24 2020 -0800
osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:
1) It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
2) Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookkeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References #7384
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Force-pushed the branch. Just re-clone it. You will probably need to set the max attach to 256 or higher for your app.
I re-cloned https://github.com/hjelmn/ompi.git, checked out branch osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap, commit 96ed63026033271a1d253776e3fbc8a573f91d10. And it produces exactly the same crash in MPI_Win_detach as before (below) on my Ubuntu 16.04 laptop. Any ideas?
Printing scalar etens ... etens()[] 0.10668566847503D+15 Ok: 0.1063 sec
Retrieving directly scalar etens ... Ok: Value = ( 0.10668566847503D+15 0.00000000000000D+00): 0.0560 sec
Retrieving directly tensor dtens ... Ok: Norm = 0.10668566847502D+15: 0.3408 sec
Destroying tensor rtens ...
[Incredible:00000] An error occurred in MPI_Win_detach
[Incredible:00000] reported by process [3374317570,1]
[Incredible:00000] on win rdma window 5
[Incredible:00000] MPI_ERR_UNKNOWN: unknown error
[Incredible:00000] MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[Incredible:00000] and potentially your MPI job)
[Incredible:00000] An error occurred in MPI_Win_detach
[Incredible:00000] reported by process [3374317570,0]
[Incredible:00000] on win rdma window 5
[Incredible:00000] MPI_ERR_UNKNOWN: unknown error
[Incredible:00000] MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[Incredible:00000] and potentially your MPI job)
[Incredible:23436] PRUN: EVHANDLER WITH STATUS PMIX_ERR_JOB_TERMINATED(-145)
[Incredible:23436] JOB [51488,2] COMPLETED WITH STATUS -1
[Incredible:23436] PRUN: INFOCB
[Incredible:23436] PRUN: EVHANDLER WITH STATUS LOST_CONNECTION_TO_SERVER(-101)
I tried both 128 and 256 for the max limit, no difference ... However, Thomas somehow made it work on his desktop. I will try on Summit as well later, but on my laptop I observe no difference; it always crashes the same way as originally reported ...
This is the mpiexec command I used on my laptop: /usr/local/mpi/openmpi/openmpi-hjelmn/bin/mpiexec -np 4 -npernode 4 --hostfile hostfile --verbose --mca mpi_abort_print_stack 1 --mca osc_rdma_max_attach 256 ./exec.sh
hostfile: localhost slots=4
Yes, I think that is the same scenario b/c I see MPI_ERR_UNKNOWN, whereas on my desktop, if I exclude the --mca osc_rdma_max_attach 128, it fails during MPI_Win_attach with error MPI_ERR_RMA_ATTACH.
Well, I can no longer reproduce the issue. I committed the changes to master so you can go ahead and try that and see what you get.
Do you mean the test code I provided runs to completion without a crash in MPI_Win_detach() in your case? I built and tested the latest commit (below) from github.com/hjelmn/ompi and still getting the same crash on my Ubuntu 16.04 machine ...
commit 54c8233f4f670ee43e59d95316b8dc68f8258ba0
Author: Nathan Hjelm hjelmn@google.com
Date: Sun Feb 16 17:09:20 2020 -0800
osc/rdma: bump the default max dynamic attachments to 64
This commit increases the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachment value can be set by the user on other
systems.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Thomas, does the latest commit pass the test on your desktop?
Yes on my desktop (not tested on Summit yet).
I pulled OMPI master with Nathan's changes merged and rebuilt on my desktop.
https://github.com/open-mpi/ompi.git
My configure line...
beaker:$ gcc --version | head -1
gcc (Spack GCC) 8.1.0
beaker:$ ../configure \
--enable-mpirun-prefix-by-default \
--enable-debug \
--prefix=$PWD/_install \
&& make \
&& make install
beaker:$ git remote -v
origin https://gitlab.com/DmitryLyakh/ExaTensor.git (fetch)
origin https://gitlab.com/DmitryLyakh/ExaTensor.git (push)
beaker:$ git br
master
* openmpi_fail
beaker:$ git log --oneline | head -2
bf8a46e Prepared the Makefile for reproducing the OpenMPI crash in MPI_Win_detach().
68e2a37 Added ddss_flush_all() in DDSS.
mpirun \
--np 4 \
--mca osc rdma \
--mca mpi_abort_print_stack 1 \
--mca osc_rdma_max_attach 128 \
-x OMP_NUM_THREADS=8 \
-x QF_NUM_PROCS=4 \
-x QF_PROCS_PER_NODE=4 \
-x QF_CORES_PER_PROCESS=1 \
-x QF_MEM_PER_PROCESS=1024 \
-x QF_NVMEM_PER_PROCESS=0 \
-x QF_HOST_BUFFER_SIZE=1024 \
-x QF_GPUS_PER_PROCESS=0 \
-x QF_MICS_PER_PROCESS=0 \
-x QF_AMDS_PER_PROCESS=0 \
-x QF_NUM_THREADS=1 \
./Qforce.x
Looks to me like it runs to completion. I am running with Open MPI master with only setting the osc_rdma_max_attach MCA variable to 1024 (just to be safe).
Just in case, did you build OpenMPI in Debug or Release mode?
debug mode. shouldn't make a difference but I can try again in optimized mode.
shouldn't but is. huh
ok, I see the issue. had the call inside an assert and we optimize assert out in non-debug builds. fixing.
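For context, a generic illustration (not the actual osc/rdma source) of why a call hidden inside assert() disappears in an optimized build:

/* Generic illustration: a side-effecting call placed inside assert() is
 * compiled out when NDEBUG is defined, which optimized/release builds do. */
#include <assert.h>

static int counter = 0;

static int do_work(void)        /* stand-in for the real call */
{
    ++counter;
    return 0;
}

static void broken(void)
{
    assert(do_work() == 0);     /* the whole call vanishes under NDEBUG */
}

static void fixed(void)
{
    int rc = do_work();         /* always executed */
    assert(rc == 0);            /* only the check is debug-only */
    (void) rc;
}

int main(void)
{
    broken();
    fixed();
    return counter;             /* 1 under NDEBUG, 2 in a debug build */
}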
See also PR #7421
@DmitryLyakh I think I worked through most of the issues I was hitting on Summit. I am now able to run your reproducer w/o error on Summit using the "DEVELOP" openmpi/master build. This is using the gcc/8.1.1 module and ompi master at 960c5f7. Please, give this a shot and see how things are working for you.
Confirmed on my desktop: The OpenMPI master branch works fine in the Release mode after PR #7421
Thanks for fixing this! I guess this issue can be closed now.
@naughtont3 could you boil this down to a small reproducer we can put into the ibm test suite?
I can add one. Just need to attach and detach a bunch.
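A rough sketch of what such a unit test could look like (illustrative only, not the test actually added to the ibm suite): attach and detach a batch of regions on a dynamic window, in both reverse and forward order.

/* Illustrative attach/detach stress test for dynamic windows. NREGIONS is
 * kept at the (new) default osc_rdma_max_attach of 64; raise the MCA
 * parameter if you increase it. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NREGIONS 64
#define REGION_ELEMS 512

int main(int argc, char **argv)
{
    MPI_Win win;
    double *regions[NREGIONS];
    int i, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    for (i = 0; i < NREGIONS; ++i) {
        regions[i] = malloc(REGION_ELEMS * sizeof(double));
        MPI_Win_attach(win, regions[i], REGION_ELEMS * sizeof(double));
    }

    /* Detach in reverse order, then re-attach and detach in forward order,
     * to exercise the bookkeeping paths that failed in this issue. */
    for (i = NREGIONS - 1; i >= 0; --i)
        MPI_Win_detach(win, regions[i]);
    for (i = 0; i < NREGIONS; ++i)
        MPI_Win_attach(win, regions[i], REGION_ELEMS * sizeof(double));
    for (i = 0; i < NREGIONS; ++i) {
        MPI_Win_detach(win, regions[i]);
        free(regions[i]);
    }

    MPI_Win_free(&win);
    if (rank == 0)
        printf("attach/detach test completed\n");
    MPI_Finalize();
    return 0;
}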
@hppritcha OK, I'll sync with @DmitryLyakh and get it into a test batch.
@hjelmn if you can write simple unit test, that would be easier than having full application case. i'll try to get a version of @DmitryLyakh code into a test somewhere, but your unit test would be an easier case for most folks to quickly test. Thx
@naughtont3 @hjelmn Where are we on this issue? Did @cniethammer's cherry pick fix the issue on v4.0.x and/or v4.1.x?
@naughtont3 could you check if this is still a problem master and release branches?
closing. reopen if this is still a problem.
Background information
Application failure with Open MPI one-sided communication (OSC).
Reporting on behalf of user to help track problem.
The test works fine with MPICH/3.x, Spectrum MPI, and Intel MPI.
What version of Open MPI are you using?
Describe how Open MPI was installed
Standard tarball build can reproduce, using the gcc/8.x compiler suite (gcc-8, g++-8, gfortran-8). gcc/8.x is needed to avoid gfortran bugs present in other compiler versions.
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Reproducible on any Linux workstation (gcc/8.x is needed to avoid gfortran bugs present in other compiler versions).
Details of the problem
You should be able to reproduce this failure on any Linux workstation. The only thing you need to make sure of is to use the gcc/8.x compiler suite (gcc-8, g++-8, gfortran-8), since other versions are buggy in their gfortran part.
I added the .txt extension to attach the files to the GitHub ticket. Normally run.exatensor.sh invokes mpiexec with the binary directly, but for some reason the mpiexec from the latest git master branch fails to load some dynamic libraries (libgfortran), so I introduced a workaround where run.exatensor.sh invokes mpiexec with exec.sh, which in turn executes the binary Qforce.x. Previous OpenMPI versions did not have this issue, by the way. But all of them fail in MPI_Win_detach, as you can see below: