llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.59k stars 11.82k forks source link

[BOLT] bolt_rt -instrumentation-sleep-time hangs when run under lit #78419

Open peterwaller-arm opened 9 months ago

peterwaller-arm commented 9 months ago

If using llvm-bolt -instrument -instrumentation-sleep-time=N for bolt instrumentation, llvm-lit hangs during profile collection.

The reason is that lit uses subprocess.Popen.communicate, which blocks until stdio is closed, and bolt_rt is forking watchProcess without closing file descriptors. watchProcess() then probes for when the PID of the parent process becomes invalid (indicating the workload has finished and been waited on), which it does not, because the process remains as a zombie until it is wait()-ed on by the parent process, which would ordinarily follow the Popen.communicate. This leads to a deadlock.

Quick and dirty reproducer to show the issue:

clang -x c - <<<$'#include <unistd.h>\nint main(int argc, char *argv[]) { int x = fork(); if (x == 0) { /* close(1); */ sleep(2); } return 0; }'
time ./a.out
// instant
time python3 -c 'from subprocess import Popen, PIPE; import sys; p = Popen(sys.argv[1], stdout=PIPE); p.communicate()' ./a.out
// blocks for 2 seconds because stdout is still open; removing comments around close(1) causes it to not block.

Relevant code fragments:

Forking and leaving stdio file descriptors open:

https://github.com/llvm/llvm-project/blob/94da2b21ee3f2baed729333ec8bbf96f92c1fa84/bolt/runtime/instr.cpp#L1655-L1657

lit using communicate()

https://github.com/llvm/llvm-project/blob/94da2b21ee3f2baed729333ec8bbf96f92c1fa84/llvm/utils/lit/lit/TestRunner.py#L917-L919

watchProcess() querying the parent process

https://github.com/llvm/llvm-project/blob/94da2b21ee3f2baed729333ec8bbf96f92c1fa84/bolt/runtime/instr.cpp#L1589

The quick fix to prevent the deadlock would be to close file descriptors 0,1,2 in the fork. However this brings to mind a second issue: there is nothing synchronizing the outer program with the completion of profile writing. This means that a build system invoking lit to gather and then use the profile data could continue onward to using the data before it has been written out. So closing the file descriptors would replace a hang with a race condition. Not ideal.

The way to avoid this would be to have watchProcess() be the parent process which waits for its fork to complete before writing the final profile and exiting itself; however from the documentation of instrumentation-sleep-time I see it's intended to allow the profile to be written out even if the parent process is killed (in which case the instrumentation would not be able to react to a signal), so it might be written this way intentionally. If the parent is being killed for example because of OOM, it would still work with the watchProcess as the parent, but if it's being killed because it is the named process, the profile would be lost.

As an aside, I'm using instrumentation-sleep-time because the binary in question is statically linked and therefore so far as I know, it was unable to use the usual mechanism for writing out the profile on exit (via DT_FINI? I'm hazy on the details here). If I missed something and there is a way to make it work with static binaries then I can use that without needing instrumentation-sleep-time.

/cc @aaupov and potential authors of watchProcess() @rafaelauler @yota9 @maksfb

llvmbot commented 9 months ago

@llvm/issue-subscribers-bolt

Author: Peter Waller (peterwaller-arm)

If using `llvm-bolt -instrument -instrumentation-sleep-time=N` for bolt instrumentation, `llvm-lit` hangs during profile collection. The reason is that lit uses `subprocess.Popen.communicate`, which blocks until stdio is closed, and bolt_rt is forking `watchProcess` without closing file descriptors. `watchProcess()` then probes for when the PID of the parent process becomes invalid (indicating the workload has finished and been waited on), which it does not, because the process remains as a zombie until it is `wait()`-ed on by the parent process, which would ordinarily follow the `Popen.communicate`. This leads to a deadlock. Quick and dirty reproducer to show the issue: ``` clang -x c - <<<$'#include <unistd.h>\nint main(int argc, char *argv[]) { int x = fork(); if (x == 0) { /* close(1); */ sleep(2); } return 0; }' time ./a.out // instant time python3 -c 'from subprocess import Popen, PIPE; import sys; p = Popen(sys.argv[1], stdout=PIPE); p.communicate()' ./a.out // blocks for 2 seconds because stdout is still open; removing comments around close(1) causes it to not block. ``` Relevant code fragments: #### Forking and leaving stdio file descriptors open: https://github.com/llvm/llvm-project/blob/94da2b21ee3f2baed729333ec8bbf96f92c1fa84/bolt/runtime/instr.cpp#L1655-L1657 #### lit using communicate() https://github.com/llvm/llvm-project/blob/94da2b21ee3f2baed729333ec8bbf96f92c1fa84/llvm/utils/lit/lit/TestRunner.py#L917-L919 #### watchProcess() querying the parent process https://github.com/llvm/llvm-project/blob/94da2b21ee3f2baed729333ec8bbf96f92c1fa84/bolt/runtime/instr.cpp#L1589 The quick fix to prevent the deadlock would be to close file descriptors 0,1,2 in the fork. However this brings to mind a second issue: there is nothing synchronizing the outer program with the completion of profile writing. This means that a build system invoking lit to gather and then use the profile data could continue onward to using the data before it has been written out. So closing the file descriptors would replace a hang with a race condition. Not ideal. The way to avoid this would be to have `watchProcess()` be the parent process which waits for its fork to complete before writing the final profile and exiting itself; however from the documentation of `instrumentation-sleep-time` I see it's intended to allow the profile to be written out even if the parent process is killed (in which case the instrumentation would not be able to react to a signal), so it might be written this way intentionally. If the parent is being killed for example because of OOM, it would still work with the `watchProcess` as the parent, but if it's being killed because it is the named process, the profile would be lost. As an aside, I'm using `instrumentation-sleep-time` because the binary in question is statically linked and therefore so far as I know, it was unable to use the usual mechanism for writing out the profile on exit (via DT_FINI? I'm hazy on the details here). If I missed something and there is a way to make it work with static binaries then I can use that without needing `instrumentation-sleep-time`. /cc @aaupov and potential authors of watchProcess() @rafaelauler @yota9 @maksfb