Closed by shanedsnyder 3 years ago
In GitLab by @mdorier on Dec 18, 2019, 04:19
I believe I found the issue: `ssg_group_destroy` calls `ssg_group_destroy_internal`, which calls `swim_finalize`, which ends up blocking on a `margo_thread_sleep`. This is because `margo_thread_sleep` requires the Mercury progress loop to be running. But finalization callbacks in Margo are called after the progress loop has been terminated, so those callbacks cannot make calls that require the loop to be running (e.g. `margo_forward` or `margo_thread_sleep`).
I think the fix is in Margo rather than SSG: we should have some `margo_push_prefinalize_callback` functions to push callbacks that are intended to run before the progress loop is terminated. Those callbacks could still issue RPCs or call `margo_thread_sleep`, but there would be no guarantee that the process won't receive RPCs in the meantime. Finalize callbacks, by contrast, guarantee that no more RPCs will be received, at the cost that no RPC or timer can be posted anymore.
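To make the ordering concrete, here is a minimal toy model in plain C of the two callback phases. It is only a sketch: every `toy_*` name is hypothetical and none of it is Margo code. `toy_loop_dependent_call` stands in for something like `margo_thread_sleep` and returns -1 where the real call would block forever.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: every name below is hypothetical and is NOT Margo's API. */
#define TOY_MAX_CALLBACKS 8

typedef struct toy_instance toy_instance;
typedef void (*toy_callback)(toy_instance *inst, void *arg);

typedef struct {
    toy_callback fn[TOY_MAX_CALLBACKS];
    void *arg[TOY_MAX_CALLBACKS];
    int count;
} toy_callback_stack;

struct toy_instance {
    int progress_loop_running; /* 1 while RPCs and timers can make progress */
    toy_callback_stack prefinalize;
    toy_callback_stack finalize;
};

void toy_init(toy_instance *inst) {
    memset(inst, 0, sizeof(*inst));
    inst->progress_loop_running = 1;
}

/* Returns 0 on success, -1 if the stack is full. */
static int toy_push(toy_callback_stack *s, toy_callback fn, void *arg) {
    if (s->count == TOY_MAX_CALLBACKS) return -1;
    s->fn[s->count] = fn;
    s->arg[s->count] = arg;
    s->count++;
    return 0;
}

int toy_push_prefinalize(toy_instance *inst, toy_callback fn, void *arg) {
    return toy_push(&inst->prefinalize, fn, arg);
}

int toy_push_finalize(toy_instance *inst, toy_callback fn, void *arg) {
    return toy_push(&inst->finalize, fn, arg);
}

/* Runs callbacks most-recently-pushed first (a choice made for this sketch). */
static void toy_run(toy_callback_stack *s, toy_instance *inst) {
    while (s->count > 0) {
        s->count--;
        s->fn[s->count](inst, s->arg[s->count]);
    }
}

/* Stand-in for a call that needs the progress loop (a sleep, an RPC forward).
 * Returns 0 if it could complete, -1 where the real call would block. */
int toy_loop_dependent_call(const toy_instance *inst) {
    return inst->progress_loop_running ? 0 : -1;
}

void toy_finalize(toy_instance *inst) {
    toy_run(&inst->prefinalize, inst); /* phase 1: loop still running */
    inst->progress_loop_running = 0;   /* the progress loop stops here */
    toy_run(&inst->finalize, inst);    /* phase 2: loop already stopped */
}

/* Example callback: stores the outcome of a loop-dependent call in *arg. */
void toy_record_outcome(toy_instance *inst, void *arg) {
    *(int *)arg = toy_loop_dependent_call(inst);
}

/* Registers the same callback in both phases and reports what each one saw. */
void toy_demo(int *seen_in_prefinalize, int *seen_in_finalize) {
    toy_instance inst;
    toy_init(&inst);
    toy_push_prefinalize(&inst, toy_record_outcome, seen_in_prefinalize);
    toy_push_finalize(&inst, toy_record_outcome, seen_in_finalize);
    toy_finalize(&inst);
}
```

Calling `toy_demo` stores 0 for the prefinalize phase and -1 for the finalize phase, which mirrors the hang described above: the same cleanup code works before the loop stops and cannot complete afterwards.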
I'll add those functions in Margo, retry the SSG test program, and close the issue.
In GitLab by @mdorier on Dec 18, 2019, 07:54
OK, the problem is fixed when using the new `margo_push_prefinalize_callback` feature I just added to Margo.
I opened a PR on SSG (https://xgitlab.cels.anl.gov/sds/ssg/merge_requests/6) that adds a test of this feature (this PR doesn't include code for shutting down remotely, though).
I'm closing the issue.
In GitLab by @mdorier on Dec 18, 2019, 07:54
closed
In GitLab by @mdorier on Dec 17, 2019, 13:16
I modified the `ssg-launch-group.c` test in a new branch here: https://xgitlab.cels.anl.gov/sds/ssg/commits/test-finalize-callback. In this branch, SSG finalization is done through a Margo finalization callback. If you run it with a single process as follows:
The process is going to correctly shut down after 10 seconds.
However, if the shutdown is requested by another process using `margo_shutdown_remote_instance`, the call to `ssg_group_destroy` in the callback will hang. I didn't include the shutdown program, but it's easy enough to write a small C program that takes the address of the process to shut down, initializes Margo, does a lookup of the address, calls `margo_shutdown_remote_instance`, then finalizes.