Open nspark opened 3 years ago
In my testing, an application that holds to the following clause works as expected for both Cray SHMEM on Aries and OSHMEM/UCX on InfiniBand:
When fork is invoked before the OpenSHMEM library is initialized, only one of either the parent or child processes may initialize the OpenSHMEM library.
Yes, makes sense.
For the other clause, if fork is invoked within the OpenSHMEM region, then the behavior is defined and both the parent and child can invoke SHMEM operations until shmem_finalize(). If the child process is killed before shmem_finalize, are both the child and the parent still allowed to invoke SHMEM calls?
No, the second clause says (emphasis added):
When fork is invoked within the OpenSHMEM portion of the program or after the OpenSHMEM library has been finalized, the newly created child process shall not call any OpenSHMEM routines; otherwise, the behavior is undefined.
So, no; the child process could not invoke SHMEM operations.
Thanks @nspark. The text mentions system-level interfaces, so I assume user-level threads/processes do not fall under this restriction.
After finalize, a call to any OpenSHMEM API from either the child or the parent process would lead to undefined behavior. I don't see any such statement in the current spec, other than that finalize must be the last OpenSHMEM call.
@nspark I realize we have deprecated atexit finalization of OpenSHMEM libraries, but I am wondering about the interactions. When a child process is forked, it inherits the atexit handlers of the parent process. The OpenSHMEM library can record the PID of the process that calls shmem_init and only finalize the library when the process with this PID invokes the atexit handler. However, this could break code that assumes the child process can continue to use OpenSHMEM until it exits. One solution is to remove start_pes and implicit finalization. However, I think some OpenSHMEM libraries may still use implicit finalization even though we have added an explicit finalize call.
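The PID-recording idea above could be sketched roughly as follows. This is only an illustration using POSIX calls, not code from any OpenSHMEM implementation: `lib_init`, `lib_atexit_handler`, `lib_finalize_internal`, `should_finalize`, and `child_would_finalize` are hypothetical stand-ins, and the "finalization" here just prints a message.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state: PID of the process that initialized the library. */
static pid_t init_pid = -1;

/* Hypothetical stand-in for the library's internal finalization. */
static void lib_finalize_internal(void) {
    fprintf(stderr, "finalizing in pid %ld\n", (long)getpid());
}

/* True only in the process that called lib_init(). */
static int should_finalize(void) {
    return getpid() == init_pid;
}

/* atexit handler: inherited by forked children, but a no-op there. */
static void lib_atexit_handler(void) {
    if (should_finalize())
        lib_finalize_internal();
}

/* Hypothetical stand-in for shmem_init with implicit finalization. */
static int lib_init(void) {
    init_pid = getpid();
    return atexit(lib_atexit_handler);
}

/* Demo helper: fork a child and report whether it would run finalization.
 * Returns 1 if the child would finalize, 0 if not, -1 on error. */
static int child_would_finalize(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(should_finalize() ? 1 : 0);  /* report the decision via exit status */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After `lib_init()`, `should_finalize()` is true in the initializing process, while `child_would_finalize()` reports 0 because `getpid()` in the child no longer matches the recorded PID. As noted above, this guard also silently skips finalization for code that expected the child to keep using the library until exit.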
@jdinan I wonder whether pthread_atfork could be relevant here as a way for a library to safely disable its atexit handler in the child process. Of course, it can't dequeue the handler itself, but it could atomically set a flag that gets queried in the handler before invoking the internal library finalization.
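A minimal sketch of that pthread_atfork approach, assuming C11 atomics are available. The names `finalize_disabled`, `atfork_child`, `lib_init`, `lib_atexit_handler`, and `child_sees_flag` are hypothetical and only for illustration; a real library would call its own internal finalization where this sketch prints a message.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: set in any forked child, queried by the exit handler. */
static atomic_int finalize_disabled;

/* Runs in the child right after fork(), before fork() returns there. */
static void atfork_child(void) {
    atomic_store(&finalize_disabled, 1);
}

/* atexit handler: still queued in the child, but checks the flag first. */
static void lib_atexit_handler(void) {
    if (atomic_load(&finalize_disabled))
        return;  /* forked child: skip implicit finalization */
    fprintf(stderr, "finalizing in pid %ld\n", (long)getpid());
}

/* Hypothetical stand-in for library initialization. */
static int lib_init(void) {
    if (pthread_atfork(NULL, NULL, atfork_child) != 0)
        return -1;
    return atexit(lib_atexit_handler);
}

/* Demo helper: fork a child and report whether it observed the flag.
 * Returns 1 if the flag was set in the child, 0 if not, -1 on error. */
static int child_sees_flag(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(atomic_load(&finalize_disabled) ? 1 : 0);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The flag stays 0 in the parent, so its exit handler still finalizes; only the child's copy is set. On older toolchains this may need to be built with `-pthread`.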
@jdinan Re: implicit finalization, what about adding something like the following under "Notes to implementers"?
OpenSHMEM implementations that support implicit library finalization for compatibility with start_pes should ensure that child processes created after library initialization do not implicitly call OpenSHMEM operations as part of exit handlers invoked during normal process termination.
@nspark This is a useful note to implementors. The start_pes function and implicit finalization are still required for implementations to be conformant with the spec. Can we also remove these functions (in addition to adding this note about backward compatibility) so that implementors don't need to deal with this case?
@jdinan Just to clarify, by "remove these functions," you mean totally gone from the API and not just deprecated, correct?
Note that this would require OpenSHMEM libraries using IB Verbs to call ibv_fork_init: https://linux.die.net/man/3/ibv_fork_init
This PR specifies interoperability with fork and subprocess creation via posix_spawn.
Feedback is very much requested.
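For readers less familiar with posix_spawn, here is a rough sketch of the kind of subprocess creation in question. It is not taken from the PR text: `spawn_and_wait` is a hypothetical helper, and the sketch assumes the spawned program does not itself use OpenSHMEM.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up on PATH) as a new program
 * image and return its exit status, or -1 on spawn/wait failure or if the
 * child did not exit normally. */
static int spawn_and_wait(char *const argv[]) {
    pid_t pid;
    /* NULL file actions and attributes: the child inherits the defaults. */
    int rc = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0)
        return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with `{"sh", "-c", "exit 3", NULL}` returns 3. Because the child starts a fresh program image, it does not run the parent's atexit handlers, which is what distinguishes this case from a plain fork.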
Closes #223