marxin / cvise

Super-parallel Python port of the C-Reduce
Other
219 stars 25 forks source link

cvise doesn't reliably kill things upon timeout. #145

Closed vext01 closed 1 week ago

vext01 commented 2 months ago

Hi,

I'm reducing the lua interpreter using cvise. The interestingness oracle compiles and runs the interpreter.

I noticed that cvise would frequently get stuck, and an inspection of the process table shows that many lua processes were running well beyond the allocated timeout. It appears that you need to be more forceful than SIGTERM to reliably kill lua. SIGKILL seems to be sufficient (but see the caveat below).

So I wonder if it would be nice to add the ability to specify the signal that is sent to the process upon timeout (or maybe you could even allow an arbitrary shell command, in case you want to do something that isn't sending a signal?).

For now I've hacked the cvise source code to have it send a sigkill:

diff --git a/cvise/utils/testing.py b/cvise/utils/testing.py
index 2300c3c..15da7fa 100644
--- a/cvise/utils/testing.py
+++ b/cvise/utils/testing.py
@@ -362,7 +362,7 @@ class TestManager:
                 children.append(process)
                 for child in children:
                     try:
-                        child.terminate()
+                        child.kill()
                     except psutil.NoSuchProcess:
                         pass
             except psutil.NoSuchProcess:

A colleague mentioned that he had this same issue and wrapped the oracle in timeout(1) in order to SIGKILL the interpreter.

But here's a hitch, with the above patch applied, cvise gets stuck much, much less frequently, but on occasion still does:

image

For example, above the timeout is 60 seconds, yet a lua process has been running for over an hour. When this happens, so far I've just been manually kill -9ing the PID to allow cvise to continue. The processes have realiably died when I've manually killed them in this way. SIGKILL is unmaskable, so that's what I'd expect.

But I wonder how this can happen. Could there be a bug in cvise's killing logic?

I'm using cvise master branch (commit 16a34b273966e32843b3f193590198e2a769b1b0) with the above patch on Debian/amd64.

marxin commented 2 months ago

Hello!

Good to hear from somebody using the tool to reduce a code in Lua!

Yeah, I must confess I've also encountered similar issues in the part. The most reliable approach for me was something like:

cvise -c "timeout 5 gcc -O2 x.c 2>&1 | grep "internal compiler error"

(one can potentially specify a signal for the timeout command). Can you please give it a try if it's a more reliable approach?

Sure, I would be happy to accept such a PR that will add the possibility to specify the signal, but I'm a bit sad one can still face the unpleasant situation where the process is blocking C-Vise :/

vext01 commented 2 weeks ago

timeout is a reasonable workaround. I wonder if we should document this somewhere.

marxin commented 1 week ago

I wonder if we should document this somewhere.

Yep. I just changed the default to Process::kill and any documentation improvement is welcomed.