openshmem-org / specification

OpenSHMEM Application Programming Interface
http://www.openshmem.org

Specify interoperability with fork and subprocess creation #474

Open nspark opened 3 years ago

nspark commented 3 years ago

This PR specifies how OpenSHMEM interoperates with fork and with subprocess creation via posix_spawn.

Feedback is very much requested.

Closes #223

nspark commented 3 years ago

In my testing, an application that holds to the following clause works as expected for both Cray SHMEM on Aries and OSHMEM/UCX on InfiniBand:

When fork is invoked before the OpenSHMEM library is initialized, only one of either the parent or child processes may initialize the OpenSHMEM library.
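A minimal sketch of a program that follows this clause, assuming an OpenSHMEM 1.2-or-later implementation and a launcher such as oshrun: each launched process forks before calling shmem_init, only the child initializes the library, and the parent makes no OpenSHMEM calls at all and simply waits. Picking the child is arbitrary; the clause allows either process, but not both. Whether the child inherits a usable runtime environment from the launcher is implementation-specific.

```c
#include <shmem.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* fork() happens before shmem_init(), so the library is not yet active. */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        /* Child: the only process that initializes OpenSHMEM. */
        shmem_init();
        printf("child: PE %d of %d\n", shmem_my_pe(), shmem_n_pes());
        shmem_finalize();
        return 0;
    }

    /* Parent: never calls an OpenSHMEM routine; it only reaps the child. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return 1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```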

wrrobin commented 3 years ago

Yes, makes sense. For the other clause, if fork is invoked within the OpenSHMEM region, then the behavior is defined and both the parent and the child can invoke SHMEM operations until shmem_finalize(). If the child process is killed before shmem_finalize, will both the child and the parent be allowed to invoke SHMEM calls?

nspark commented 3 years ago

No, the second clause says (emphasis added):

When fork is invoked within the OpenSHMEM portion of the program or after the OpenSHMEM library has been finalized, the newly created child process shall not call any OpenSHMEM routines; otherwise, the behavior is undefined.

So, no; the child process could not invoke SHMEM operations.
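A minimal sketch of the second clause, assuming an OpenSHMEM 1.2-or-later implementation and that /bin/echo exists: the PE forks after shmem_init, the child makes no OpenSHMEM calls (not even shmem_finalize) and replaces itself with execl, calling _exit if that fails so that inherited exit handlers do not run. The parent remains a PE and keeps using OpenSHMEM until shmem_finalize. Depending on the network layer, the implementation may need fork support enabled for this to work.

```c
#include <shmem.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    shmem_init();
    int me = shmem_my_pe();

    /* fork() inside the OpenSHMEM portion of the program. */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        shmem_global_exit(1);
    }

    if (pid == 0) {
        /* Child: must not call any OpenSHMEM routine. execl replaces the
         * process image; _exit skips atexit handlers inherited from the PE. */
        execl("/bin/echo", "echo", "hello from a child process", (char *)NULL);
        _exit(127); /* reached only if execl fails */
    }

    /* Parent: still a PE, so it may keep calling OpenSHMEM routines. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        perror("waitpid");
    printf("PE %d: child exit status %d\n", me,
           WIFEXITED(status) ? WEXITSTATUS(status) : -1);

    shmem_barrier_all();
    shmem_finalize();
    return 0;
}
```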

wrrobin commented 3 years ago

Thanks @nspark. The text mentions system-level interfaces, so I assume user-level threads and processes do not fall under this restriction.

After finalization, a call to any OpenSHMEM API from either the child or the parent process would lead to undefined behavior. I don't see such a statement in the current spec, other than that finalize must be the last OpenSHMEM call.

jdinan commented 3 years ago

@nspark I realize we have deprecated atexit finalization of OpenSHMEM libraries, but I am wondering about the interactions. When a child process is forked, it inherits the atexit handlers of the parent process. The OpenSHMEM library can record the PID of the process that calls shmem_init and only finalize the library when the process with this PID invokes the atexit handler. However, this could break code that assumes the child process can continue to use OpenSHMEM until it exits. One solution is to remove start_pes and implicit finalization. However, I think some OpenSHMEM libraries may still use implicit finalization even though we have added an explicit finalize call.
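The PID-recording approach described above might look roughly like this sketch. All names here (hypothetical_lib_init, internal_finalize, implicit_finalize_handler) are hypothetical library internals, not OpenSHMEM API; only getpid and atexit are real interfaces.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical library-internal state (not part of the OpenSHMEM API). */
static pid_t init_pid = -1;
static bool finalized = false;

/* Hypothetical stand-in for the library's real teardown. */
static void internal_finalize(void) {
    /* ... release network resources, symmetric heap, etc. ... */
    finalized = true;
}

/* Exit handler registered at initialization. A forked child inherits it,
 * but the PID check makes it a no-op in any process other than the one
 * that initialized the library. */
static void implicit_finalize_handler(void) {
    if (!finalized && getpid() == init_pid)
        internal_finalize();
}

/* Hypothetical initialization path, e.g. what start_pes might do. */
int hypothetical_lib_init(void) {
    init_pid = getpid();
    return atexit(implicit_finalize_handler);
}
```

This keeps a forked child from tearing down state it shares with the parent at exit, but the child then never finalizes anything, which is exactly the conflict with code that expects the child to keep using OpenSHMEM until it exits.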

nspark commented 3 years ago

@jdinan I wonder whether pthread_atfork could be relevant here as a way for a library to safely disable its atexit handler in the child process. Of course, it can't dequeue the handler itself, but it could atomically set a flag that gets queried in the handler before invoking the internal library finalization.
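A sketch of the pthread_atfork idea, again with hypothetical library-internal names; only pthread_atfork, atexit, and the C11 stdatomic.h operations are real interfaces. The child-side fork handler sets a flag, and the inherited exit handler checks it before finalizing.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag, set in a forked child to disable implicit finalization. */
static atomic_bool in_forked_child = false;
static int finalize_count = 0; /* counts teardowns, for illustration only */

/* Hypothetical stand-in for the library's real teardown. */
static void internal_finalize(void) {
    finalize_count++;
}

/* pthread_atfork "child" handler: runs in the child right after fork(). */
static void atfork_child(void) {
    atomic_store(&in_forked_child, true);
}

/* Exit handler: the child still inherits it (it cannot be unregistered),
 * but it checks the flag before tearing anything down. */
static void implicit_finalize_handler(void) {
    if (!atomic_load(&in_forked_child))
        internal_finalize();
}

int hypothetical_lib_init(void) {
    if (pthread_atfork(NULL, NULL, atfork_child) != 0)
        return -1;
    return atexit(implicit_finalize_handler) == 0 ? 0 : -1;
}
```

POSIX runs fork handlers for fork(), but leaves it implementation-defined whether posix_spawn runs them; a successful exec in the spawned process discards the inherited exit handlers anyway.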

nspark commented 3 years ago

@jdinan Re: implicit finalization, what about adding something like the following under "Notes to implementers"?

OpenSHMEM implementations that support implicit library finalization for compatibility with start_pes should ensure that child processes created after library initialization do not implicitly call OpenSHMEM operations as part of exit handlers invoked during normal process termination.

jdinan commented 3 years ago

@nspark This is a useful note to implementors. The start_pes function and implicit finalization are still required for implementations to be conformant with the spec. Can we also remove these functions (in addition to adding this note about backward compatibility) so that implementors don't need to deal with this case?

nspark commented 2 years ago

@jdinan Just to clarify, by "remove these functions," you mean totally gone from the API and not just deprecated, correct?

jdinan commented 2 years ago

Note that this would require OpenSHMEM libraries using IB Verbs to call ibv_fork_init: https://linux.die.net/man/3/ibv_fork_init