Closed: harrysonj97 closed this issue 6 years ago
Looks like there are two problems here. You can
export TMPDIR=/tmp
to avoid this problem.
[JSQUYRES-M-26UT:87388] [[3244,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/show_help.c at line 507
[JSQUYRES-M-26UT:87388] [[3244,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space in file util/show_help.c at line 507
mpirun(87388,0x7fff9df4a380) malloc: *** mach_vm_map(size=18446744073392484352) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
[JSQUYRES-M-26UT:87388] [[3244,0],0] ORTE_ERROR_LOG: Out of resource in file util/show_help.c at line 507
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: JSQUYRES-M-26UT
System call: unlink(2) /tmp/ompi.JSQUYRES-M-26UT.504/pid.87388/1/vader_segment.JSQUYRES-M-26UT.cac0001.5
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
mpirun(87388,0x70000226f000) malloc: *** mach_vm_map(size=1125899906846720) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
[JSQUYRES-M-26UT:87388] *** Process received signal ***
[JSQUYRES-M-26UT:87388] Signal: Segmentation fault: 11 (11)
[JSQUYRES-M-26UT:87388] Signal code: Address not mapped (1)
[JSQUYRES-M-26UT:87388] Failing at address: 0x0
[JSQUYRES-M-26UT:87388] [ 0] 0 libsystem_platform.dylib 0x00007fff65744f5a _sigtramp + 26
[JSQUYRES-M-26UT:87388] [ 1] 0 ??? 0x0000000005d608a8 0x0 + 97913000
[JSQUYRES-M-26UT:87388] [ 2] 0 mca_rml_oob.so 0x000000010949edac orte_rml_oob_send_buffer_nb + 988
[JSQUYRES-M-26UT:87388] [ 3] 0 libopen-rte.40.dylib 0x000000010911ef08 pmix_server_log_fn + 472
[JSQUYRES-M-26UT:87388] [ 4] 0 mca_pmix_pmix2x.so 0x00000001092eb75d server_log + 925
[JSQUYRES-M-26UT:87388] [ 5] 0 mca_pmix_pmix2x.so 0x00000001093266c6 pmix_server_log + 1302
[JSQUYRES-M-26UT:87388] [ 6] 0 mca_pmix_pmix2x.so 0x0000000109315aff server_message_handler + 4959
[JSQUYRES-M-26UT:87388] [ 7] 0 mca_pmix_pmix2x.so 0x0000000109355066 pmix_ptl_base_process_msg + 774
[JSQUYRES-M-26UT:87388] [ 8] 0 libopen-pal.40.dylib 0x00000001091ef89a opal_libevent2022_event_base_loop + 1706
[JSQUYRES-M-26UT:87388] [ 9] 0 mca_pmix_pmix2x.so 0x000000010932ce6e progress_engine + 30
[JSQUYRES-M-26UT:87388] [10] 0 libsystem_pthread.dylib 0x00007fff6574e661 _pthread_body + 340
[JSQUYRES-M-26UT:87388] [11] 0 libsystem_pthread.dylib 0x00007fff6574e50d _pthread_body + 0
[JSQUYRES-M-26UT:87388] [12] 0 libsystem_pthread.dylib 0x00007fff6574dbf9 thread_start + 13
[JSQUYRES-M-26UT:87388] *** End of error message ***
[1] 87388 segmentation fault (core dumped) mpirun --oversubscribe -np 16 hello_c
No, I haven't seen that anywhere before - do you know at what point in the program this happens?
Strange indeed. export TMPDIR=/tmp allowed me to mpirun --oversubscribe -np 10 hello, but if I increase it to 20 I get the same error.
@ggouaillardet's post may be relevant here: https://www.mail-archive.com/devel@lists.open-mpi.org/msg20760.html
From my analysis, here is what happens:
- each rank is supposed to have its own vader_segment unlinked by btl/vader in vader_finalize().
- but this file might have already been destroyed by another task in orte_ess_base_app_finalize():
if (NULL == opal_pmix.register_cleanup) { orte_session_dir_finalize(ORTE_PROC_MY_NAME); }
so all the tasks end up removing the session directory via opal_os_dirpath_destroy("/tmp/ompi.c7.1000/pid.23941/1").
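(To make the failure mode concrete, here is a stand-alone sketch, not OMPI code, of what the losing side of that race sees; the segment path is a hypothetical stand-in.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* hypothetical stand-in for a vader backing file */
    const char *seg = "/tmp/vader_segment.demo";

    close(open(seg, O_CREAT | O_WRONLY, 0600));   /* create the backing file */

    unlink(seg);             /* the session directory cleanup got there first */
    if (0 != unlink(seg)) {  /* vader_finalize() then tries again ... */
        /* ... and fails exactly like the help message above */
        fprintf(stderr, "unlink(%s): %s (errno %d)\n", seg, strerror(errno), errno);
    }
    return 0;
}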
I am not really sure about the best way to fix this.
- one option is to perform an intra-node barrier in vader_finalize()
- another option would be to implement an opal_pmix.register_cleanup
Any thoughts?
I thought we had fixed this by implementing the register_cleanup option, but maybe it didn't get to the v3.x release branches?
I reproduced the issue with the latest master and embedded PMIx.
A workaround could be to mpirun --mca btl_vader_backing_directory /tmp ...
Hmmm...let me check master and ensure that the OPAL wrapper function didn't get lost somewhere. There should be no way that the orte_session_dir_finalize call got executed on master.
Confirmed - that function pointer is definitely not NULL, so that function is never called.
I am pretty sure it was NULL for me. I will double check that tomorrow.
You checked that on the MPI app side (i.e., not mpirun nor orted), right?
I just looked at the opal/pmix code and confirmed that (a) there is a function entry in the opal_pmix module and (b) the required "glue" code is present. Thus, if you are using the internal PMIx code, that function pointer cannot be NULL.
You might check to verify you didn't configure for an external (older) version of PMIx, just to be safe?
@rhc54 well, we are both right, kind of ...
from orte_ess_base_app_finalize():
    if (NULL != opal_pmix.finalize) {
        opal_pmix.finalize();
        (void) mca_base_framework_close(&opal_pmix_base_framework);
    }

    (void) mca_base_framework_close(&orte_oob_base_framework);
    (void) mca_base_framework_close(&orte_state_base_framework);

    if (NULL == opal_pmix.register_cleanup) {
        orte_session_dir_finalize(ORTE_PROC_MY_NAME);
    }
Closing the PMIx framework (re)sets opal_pmix.register_cleanup to NULL (it used to be pmix4x_register_cleanup), which is why orte_session_dir_finalize() is always invoked.
The inline patch below fixes this; can you please review it?
diff --git a/orte/mca/ess/base/ess_base_std_app.c b/orte/mca/ess/base/ess_base_std_app.c
index a02711f..52eaee0 100644
--- a/orte/mca/ess/base/ess_base_std_app.c
+++ b/orte/mca/ess/base/ess_base_std_app.c
@@ -13,7 +13,7 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All rights
  *                         reserved.
  * Copyright (c) 2013-2018 Intel, Inc. All rights reserved.
- * Copyright (c) 2014-2016 Research Organization for Information Science
+ * Copyright (c) 2014-2018 Research Organization for Information Science
  *                         and Technology (RIST). All rights reserved.
  * Copyright (c) 2015      Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2018      Mellanox Technologies, Inc.
@@ -320,6 +320,7 @@ int orte_ess_base_app_setup(bool db_restrict_local)
 
 int orte_ess_base_app_finalize(void)
 {
+    bool orte_cleanup = (NULL == opal_pmix.register_cleanup);
     /* release the conduits */
     orte_rml.close_conduit(orte_mgmt_conduit);
     orte_rml.close_conduit(orte_coll_conduit);
@@ -341,7 +342,7 @@ int orte_ess_base_app_finalize(void)
 
     (void) mca_base_framework_close(&orte_oob_base_framework);
     (void) mca_base_framework_close(&orte_state_base_framework);
-    if (NULL == opal_pmix.register_cleanup) {
+    if (orte_cleanup) {
         orte_session_dir_finalize(ORTE_PROC_MY_NAME);
     }
     /* cleanup the process info */
Why not just close the pmix framework a little later? It shouldn't be closed until after all of ORTE has finalized.
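(A rough sketch of that ordering, rearranging only the lines quoted above; not an actual patch, just the shape of the suggestion.)

    (void) mca_base_framework_close(&orte_oob_base_framework);
    (void) mca_base_framework_close(&orte_state_base_framework);

    /* register_cleanup is still valid here because pmix has not been closed yet */
    if (NULL == opal_pmix.register_cleanup) {
        orte_session_dir_finalize(ORTE_PROC_MY_NAME);
    }

    /* ... remainder of ORTE finalization ... */

    /* close the pmix framework last, after all of ORTE has finalized */
    if (NULL != opal_pmix.finalize) {
        opal_pmix.finalize();
        (void) mca_base_framework_close(&opal_pmix_base_framework);
    }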
I think you know better than me :-) My concern is that if there is no register_cleanup(), orte_session_dir_finalize() might delete some files used by PMIx. If you tell me no such thing can ever occur, then yes, simply close the PMIx framework after all of orte has finalized.
I'll be happy to issue a PR based on your directions.
Not sure what it is that I should "know better", but I think this is pretty simple to resolve. I'll ponder it a little after I finish the current work. I'm not wild about this proposed fix as I think the issue might well persist.
I apologize, I chose my words poorly.
I did not mean anything malicious; I only meant that I do not know, and you know better than me, where to close the pmix framework, so I leave it up to you.
Thanks
I didn't interpret it as anything mean - I just didn't understand it, that's all. Let me try to capture the scenarios here so perhaps you can move forward before I have time to address it. PMIx and HWLOC both have shared memory files in the session directory, but they are at the daemon's level and shouldn't be impacted by the apps. Cleanup in general has two major use-cases to consider:
1. where the apps are launched by mpirun (aka "indirect" launch). In this case, the orteds will take care of cleaning up the session directory and the app procs themselves should do nothing. Vader files placed outside the session directory may not get cleaned up if Vader itself fails to do so (see below).
2. where the apps are directly launched against the resource manager (e.g., via "srun"), which we call "direct" launch. In this case, the RM daemon may assign a session directory and clean it up on our behalf. We detect that scenario in the ess/pmi module. However, this case also has the issue of Vader files placed outside the session directory, and we have to deal with the case where the RM doesn't assign or clean up the session directory.
Dealing with the session directory itself in the direct launch case where the RM doesn't provide cleanup requires that the app procs call orte_session_dir_finalize. This is the only time the apps should do so. The function already checks for RM cleanup and so it is safe to call in either case. Thus, the correct fix here is to (a) check for direct launch (ORTE_SCHIZO_DIRECT_LAUNCHED == orte_schizo.check_launch_environment()) and, if true, then (b) call orte_session_dir_finalize. You can finalize PMIx first or not - shouldn't matter.
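(A rough sketch of that check, using only the names mentioned above; exactly where it sits in the finalize path is left open.)

    /* only clean up ourselves when directly launched (e.g. via srun);
     * under mpirun the daemons own the session directory */
    if (ORTE_SCHIZO_DIRECT_LAUNCHED == orte_schizo.check_launch_environment()) {
        /* safe even if the RM cleans up for us -- orte_session_dir_finalize()
         * already checks for that case */
        orte_session_dir_finalize(ORTE_PROC_MY_NAME);
    }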
This still leaves the issue of the Vader files placed outside the session dir. Unfortunately, checking to see if opal_pmix.register_cleanup is NULL isn't sufficient in itself - the PMIx client library simply relays any cleanup registration to the local PMIx server, which may or may not support that feature. For example, the current Slurm PMIx plugin does not support it, so even though the OMPI function may be non-NULL, cleanup registration will fail. This will return an error code (in opal/mca/pmix/pmix4x/pmix4x.c), but we don't currently save it.
The reason we don't bother to save it is, quite simply, that the app can't do anything about it. Vader will already try to remove its files - knowing that registration failed doesn't tell the app anything new. Registration only provides a bit of backup for those cases where the app fails to remove the file due to some internal issue.
The only solution I can think of would be to have opal/pmix return the registration status code. If Vader sees that registration fails, then perhaps it should fall back to placing the backing file in the session directory to ensure it gets cleaned up by other local app procs when they call orte_session_dir_finalize.
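(Sketch of that fallback only, not btl/vader source; cleanup_status, requested_dir and session_dir are hypothetical stand-ins for whatever the component would actually track.)

    /* cleanup_status: hypothetical value propagated up from opal/pmix after
     * the registration attempt; requested_dir / session_dir are stand-ins */
    const char *backing_dir = requested_dir;   /* e.g. btl_vader_backing_directory */
    if (OPAL_SUCCESS != cleanup_status) {
        /* the local PMIx server will not clean up for us, so keep the segment
         * under the session directory, which local procs remove through
         * orte_session_dir_finalize() */
        backing_dir = session_dir;
    }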
HTH
@rhc54 The issue still exists in 4.0.1 on illumos even if I apply this patchset. Any idea how I can diagnose and provide relevant feedback to you?
I'm afraid you'll have to tell me more - precisely what issue are you talking about? What patch did you apply?
Sorry for the lack of details. I was referring to issues such as:
narval> mpirun -n 10 ./a.out
Hello world from processor rank 3 of 10
Hello world from processor rank 2 of 10
Hello world from processor rank 8 of 10
Hello world from processor rank 0 of 10
Hello world from processor rank 9 of 10
Hello world from processor rank 7 of 10
Hello world from processor rank 4 of 10
Hello world from processor rank 5 of 10
Hello world from processor rank 6 of 10
Hello world from processor rank 1 of 10
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have. It is likely that your MPI job will now either abort or
experience performance degradation.
Local host: narval
System call: unlink(2) /tmp/ompi.narval.101/pid.28219/1/vader_segment.narval.84960001.1
Error: No such file or directory (errno 2)
--------------------------------------------------------------------------
[narval:28219] 2 more processes have sent help message help-opal-shmem-mmap.txt / sys call fail
[narval:28219] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
which is at least similar to the one reported above.
I applied the patch at https://github.com/open-mpi/ompi/commit/c076be52afc19b1a8c1884ff7e66b04122c7ab23
but also tried the workaround suggested by @ggouaillardet
Let me know what kind of output would be useful (truss, dtrace, ...).
Kind regards,
Aurélien
I'd suggest trying the latest nightly tarball of the 4.0.x branch, as the fix may have already been committed there.
The issue is still present in openmpi-v4.0.x-201908090241-6d62fb0. Should I follow up in another ticket?
Probably best to do so - I'm out of ideas.
Hit this issue in v4.0.3 (https://github.com/conda-forge/openmpi-feedstock/pull/58).
I wrote a simple Hello World program in C and I couldn't seem to execute it using Open MPI with more than 5 processes.
I'm using the latest version of Open MPI, 3.1.2, and I installed it on my Mac by following this tutorial: https://intothewave.wordpress.com/2011/12/27/install-open-mpi-on-mac-os-x/
The problem is, even with --oversubscribe, I get an error message at the end.
Here's my C code:
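(A minimal MPI hello world of this shape matches the behaviour described; it is a reconstruction, not necessarily the reporter's exact code.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world from processor rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}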
and I run it in my terminal:
Output:
Would really appreciate some help on this.
Update: If I run the command:
it works without the error, but I'm still wondering if there's a possible fix so the usual command executes without any errors.