gramineproject / graphene

Graphene / Graphene-SGX - a library OS for Linux multi-process applications, with Intel SGX support
https://grapheneproject.io
GNU Lesser General Public License v3.0
767 stars 261 forks source link

try_process_exit() should wait for the async helper thread to exit #440

Closed yamahata closed 1 month ago

yamahata commented 5 years ago

This issue is to track the issue found by https://github.com/oscarlab/graphene/pull/321 try_process_exit() should wait for async helper thread to exit and then release their reference count for final resource release. Now there is TODO comment to do so because it requires new PAL interface to wait other thread to exit.

donporter commented 5 years ago

+1

mkow commented 5 years ago

FYI: #321 was just merged to master.

dimakuv commented 5 years ago

Some additional info on a related issue in code.

The termination of IPC Helper Thread checks for the state HELPER_HANDEDOVER (which would be better called HELPER_PENDING_SHUTDOWN). This state is set only via try_process_exit() -> exit_with_ipc_helper() when the process is supposed to exit/shutdown/terminate (think SIGKILL or end of execve).

The benign issue is that termination functionality is unnecessarily duplicated:

int try_process_exit (int error_code, int term_signal) {
... /* code below simplified */
    int ret = exit_with_ipc_helper(true, &ipc_thread);
    shim_clean(ret);
}

static void shim_ipc_helper (void * arg) {  /* in newer commits may be called shim_ipc_helper_end */
...
end:
    COMPILER_BARRIER();
    if (ipc_helper_state == HELPER_HANDEDOVER) {
        debug("ipc helper thread is the last thread, process exiting\n");
        shim_terminate(0); // Same as shim_clean(), but this is the official termination function
    }
...
}

So, try_process_exit() forces the state of IPC helper thread to be HELPER_HANDEDOVER which results in the thread performing shim_terminate(). But try_process_exit() itself is supposed to perform shim_clean() -- so two functions race on who will exit the process.

It's a benign issue but must be fixed to only have one point of process termination. (See also PR #648)

boryspoplawski commented 3 years ago

Update of the issue wrt the current state of Graphene: the issue is still present, but can be fixed with the same manner as #2313 fixed it for IPC worker thread. The problem is that it requires rewriting async thread, which needs to be done, but is not a big priority.