ndmitchell / shake

Shake build system
http://shakebuild.com
Other
771 stars 118 forks source link

Processess spawned with 'cmd' not getting terminated when your Ctrl-C #169

Open dbp opened 10 years ago

dbp commented 10 years ago

So I have a phony task that starts a process. I expect that if I Ctrl-C, that the signal will be sent to the process, and then shake will continue / terminate (and this is how make works) . But, instead shake terminates (with 'user interrupt'), and the process is still running in the background.

I looked at the various options to run commands with, but didn't see anything obvious. Is the current behavior expected? It seems wrong...

dbp commented 10 years ago

Just as a followup, I've replaced calls to cmd Shell with system from System.Process and that does have the right semantics.

ndmitchell commented 10 years ago

I haven't thought about the issue much, so thanks for your insight. However, when running Shake in staunch mode (which is where the approaches differ most) I would expect ctrl-c to exit Shake, and Shake to clean up the processes (if possible) not just abort the one process. That seems not to match make? I need to do some experimentation.

dbp commented 10 years ago

I think it really depends on what the processes are intended to do. I've been using Shake to run tasks and start up (sometimes interactive) processes (something I've used make for in the past). In that context, when there is an interactive process (which is not really something that's currently defined), I think I would want input to go to it. In particular, if there was a sequence of commands, it's possible that I would want to do some cleanup in case a command failed, and I think aborting it manually counts as a failure case.

But when the processes aren't interactive, I think what you are describing makes a lot more sense. Because one really common annoyance with, for example, shell scripts that run various processes, is when you have to Ctrl-C repeatedly until you can get the signal to get delivered when the script is what's running, so it actually exits (or just look up it's PID and deliver the signal with kill).

And of course you could expect many processes to be running at once in the non-interactive case, at which point there is no sensible answer to who should recieve a signal (unless it's all of them).

So perhaps there should be some distinction between interactive commands and non? Or, maybe the documentation should just point out that using cmd for interactive stuff isn't really recommended, and just point to System.Process (which I haven't run into a problem with yet).

ndmitchell commented 10 years ago

You can certainly always call System.Process, but I'd like it to be possible with cmd too. I agree we probably want interactive vs non-interactive. I'll use this ticket to add Interactive as a setting, which gives the command stdin and responds to Ctrl-C as you describe. I suspect the code will be easy to write, I'm just not sure how to go about automatically testing it since it's all about what happens when you hit the keyboard.

dbp commented 10 years ago

Well, you can test at the very least that cmd Interactive still runs. But yeah, seems hard to test that the special behavior is working. At least, though, it will be very obvious to users of Shake if it stops working (so it shouldn't lead to subtle bugs - famous last words).

ndmitchell commented 10 years ago

One of the reasons I'm really keen on having automated tests is that I develop on Windows, and my continuous integration tests run on Linux, so if I have tests I can be sure it works consistently. It's not fatal if I can't do that, but it is annoying.

Testing it out, I found that stdin in both is inherited from the process, so it seems cmd already works just fine with stdin, and any changes just concern Ctrl-C.

For Ctrl-C handling the docs (and my experiments) suggest that even using System.Process you still get a user exception, and the thread still dies. If you want to clean up after a process then using actionFinally and actionOnException might be sufficient. Can you give me a small example where process works but cmd fails? Are you catching exceptions yourself?

ndmitchell commented 5 years ago

See also https://github.com/snowleopard/hadrian/issues/696#issuecomment-459562639 which describes issues around this problem

AndreasPK commented 5 years ago

So I tried this using the following code:

import Development.Shake
import Development.Shake.Command
import Development.Shake.FilePath
import Development.Shake.Util
import Control.Concurrent
import System.Process

main :: IO ()
main = shakeArgs shakeOptions{shakeFiles="_build"} $ do
    want ["_build/run" <.> exe]

    "_build/run" <.> exe %> \out -> do
        -- cmd_ "Sleep.exe" "" :: Action ()
        liftIO $ system "Sleep.exe"
        liftIO $ threadDelay $ 1000*1000*100

And at least for me under mintty the behaviour of cmd, system and process was the same.

ndmitchell commented 5 years ago

@AndreasPK and what was that behaviour? Did it work and clean up properly?

ndmitchell commented 5 years ago

FWIW I've pushed a lot of patches to this part of the code since the issue was first raised 5 years ago.

AndreasPK commented 5 years ago

@AndreasPK and what was that behaviour? Did it work and clean up properly?

No, it fails to terminate the subprocess and stays active in the background (when run in in the msys shell).

hasufell commented 4 years ago

This is still a problem. We have a jenkins that runs shake, which runs docker. If we cancel a job, docker will keep running, because SIGTERM is not propagated.

ndmitchell commented 4 years ago

@hasufell - is this pure Linux, or involving msys/mingw or similar?

hasufell commented 4 years ago

@ndmitchell linux

ndmitchell commented 4 years ago

I note that Docker seems to behave funny in #731 too, CC @symbiont-arnaud-bailly. I've no real idea what is happening here (and completely out of my depth). Maybe the way Docker and System.Process interact is different to normal?

hasufell commented 4 years ago

I don't think this has anything to do with docker. Sending SIGTERM to our docker containers manually works fine. We are not using actionFinally. If you look at the behavior of make, then it sends a SIGTERM to the subprocesses it has spawned. Shake doesn't.

#!/bin/sh

trap 'echo kill && kill -KILL $child 2>/dev/null' TERM
sleep 500000 &
child=$!
wait $child
all:
    ./run.sh
#!/usr/bin/env -S cabal new-run
{- cabal:
build-depends: base
            , shake
-}

module Main (main) where

import Development.Shake
import Development.Shake.Classes
import Development.Shake.Command
import Development.Shake.FilePath
import Development.Shake.Rule
import Prelude hiding ((*>))

main :: IO ()
main = do
  shakeArgs shakeOptions { shakeFiles = "_build/"
                         , shakeVerbosity = Loud
                         , shakeThreads = 0
                         , shakeVersion = "4"} $ do

    phony "foo" $ do
      cmd_ "./run.sh"
      pure ()

sending SIGTERM to the shake process will not propagate to the script.

ndmitchell commented 4 years ago

The actionFinally ticket is probably not anything to do with actionFinally, but it has Docker as the example command that fails to terminate, making it likely to be the same as your example.

In Shake, when an exception happens, I call interruptProcessGroupOf which is documented as sending SIGINT. Then three seconds after that returns I call terminateProcess which sends SIGKILL. Is that sufficient to kill Docker? Does the SIGINT mess things up? Does the function abort in Shake get called?

(I'm on Windows not Linux, so happy to take any advice on the right thing to do here.)

hasufell commented 4 years ago

Then three seconds after that returns I call terminateProcess which sends SIGKILL.

Afais it sends SIGTERM.

Is that sufficient to kill Docker?

Yes, as described in the minimal example, this has nothing to do with docker.

Does the SIGINT mess things up?

SIGINT propagates properly when I press ctrl-c and abort gets called.

Does the function abort in Shake get called?

When you send SIGTERM to the shake process: no. But this is the expected behavior, which is convention in make too.

hasufell commented 4 years ago

Afais, the problem is

https://github.com/ndmitchell/shake/blob/c3f08f906f41ec62e813a0898897f7f58d804e17/src/General/Process.hs#L159-L160

https://github.com/ndmitchell/shake/blob/c3f08f906f41ec62e813a0898897f7f58d804e17/src/General/Process.hs#L108-L116

This is not signal handling code at all. This is just catching asynchronous exceptions. SIGINT seems to be one, but SIGTERM is not.

Also see:

ndmitchell commented 4 years ago

Should I be installing a signal handler that turns SIGTERM into an exception so everything terminates properly? Should all Haskell processes be doing that?

hasufell commented 4 years ago

To me it seems async exceptions are messy. Taken from the gnu manual

Macro: int SIGTERM The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate.

The shell command kill generates SIGTERM by default.

So, in my opinion, a program that receives SIGTERM and decides it is OK to terminate, should also clean up after itself. That is... send SIGTERM to sub-processes, wait for a grace period, then send SIGKILL.

I don't think the question can be answered for every process, though.

ndmitchell commented 4 years ago

It seems that if your advice is right (and I have no reason to suspect its not!), then it should be the default of every Haskell program to turn SIGTERM into an async exception and let that take out the program. Why isn't that done for everyone by default? I don't think anything makes Shake special in this regard. Just trying to understand the context before doing something different for Shake.

hasufell commented 4 years ago

I think the problem here is that as the gnu manual says: SIGTERM may be ignored!

If we turn it into an async exception and people don't catch it... oops. I think this isn't meant to be an exception. So in my opinion it's really up to the program implementor on whether and how to define it. For everything else, there is SIGKILL.

symbiont-arnaud-bailly commented 4 years ago

FWIW, I have written some tool last year that spawned processes and managed them, and I baked in signal handling (SIGINT and SIGTERM essentially) because that's the expected behaviour in Unix-land: When a user hits Ctrl-C it expects the foreground process to terminate and cleanup properly.

hasufell commented 4 years ago

When a user hits Ctrl-C it expects the foreground process to terminate and cleanup properly.

That's true. When you run the following C program without explicit signal handling and hit Ctrl-C, then it terminates the child processes as well:

#include <unistd.h>

int main() {
  int mypid = fork();

  if (0 == mypid) {
    sleep(500000);
  } else {
    while (1) {
      sleep(2);
    }
  }

  return (0);
}

However, if you send SIGTERM to the parent, the child will keep running. This is why I think the default behavior for haskell programs should be left unchanged, but SIGTERM be implemented specifically for shake to match make behavior.

symbiont-arnaud-bailly commented 4 years ago

When you run the following C program without explicit signal handling and hit Ctrl-C, then it terminates the child processes as well:

That's true but it's because the shell itself handles SIGINT properly, propagating the signal to the foreground process group. If you properly detach the subprocess, creating its own subgroup, then it won't receive SIGINT.

If you manually send SIGTERM using a process's PID, then the signal is not by default propagated to children which explains why the child process does not terminate in this case.

IMHO, shake should handle SIGINT and SIGTERM in such a way that all its subprocesses receive the signal, using standard POSIX signal handling which, obviously, only make sense on an Unix system. And it should probably use the Windows specific mechanism to provide this behaviour on Win32 platform.

hasufell commented 4 years ago

That's true but it's because the shell itself handles SIGINT properly

Nice catch. Now I'm starting to wonder what happens if that shell behavior might not in fact be problematic if the process has its own implementation of reacting to SIGINT?

symbiont-arnaud-bailly commented 4 years ago

If a subprocess of a shell handles SIGINT, I guess it can do whatever it please, including refusing to quit :) That's why there is SIGKILL and SIGSTOP which cannot have handlers attached.

hasufell commented 4 years ago

I meant that the shell will apparently send SIGINT to the sub processes without letting the parent process decide what to do?

symbiont-arnaud-bailly commented 4 years ago

If they are part of the same process group, which is the case by default, yes. But then the parent could handle SIGCHLD :)

symbiont-arnaud-bailly commented 4 years ago

The semantics of signals is fascinating, and the number of potential pitfalls staggering!

hasufell commented 4 years ago

@ndmitchell do you have a clearer picture about the problem now? Do you need further information?

ndmitchell commented 4 years ago

Thanks for all your discussion on this topic, but my current state is I'm totally confused. There was a long discussion about signals at #748 and StackOverflow info at https://stackoverflow.com/questions/61856063/spawning-a-process-with-create-group-true-set-pgid-hangs-when-starting-docke/61881005#61881005. It would be great to get a 5-year-old style answer to:

mpickering commented 3 years ago

I don't think this has anything to do with docker. Sending SIGTERM to our docker containers manually works fine. We are not using actionFinally. If you look at the behavior of make, then it sends a SIGTERM to the subprocesses it has spawned. Shake doesn't.

#!/bin/sh

trap 'echo kill && kill -KILL $child 2>/dev/null' TERM
sleep 500000 &
child=$!
wait $child
all:
  ./run.sh
#!/usr/bin/env -S cabal new-run
{- cabal:
build-depends: base
            , shake
-}

module Main (main) where

import Development.Shake
import Development.Shake.Classes
import Development.Shake.Command
import Development.Shake.FilePath
import Development.Shake.Rule
import Prelude hiding ((*>))

main :: IO ()
main = do
  shakeArgs shakeOptions { shakeFiles = "_build/"
                         , shakeVerbosity = Loud
                         , shakeThreads = 0
                         , shakeVersion = "4"} $ do

    phony "foo" $ do
      cmd_ "./run.sh"
      pure ()

sending SIGTERM to the shake process will not propagate to the script.

  1. This reply is about SIGTERM, not SIGINT. Ctrl-C sends SIGINT to the process and so SIGTERM is irrelevant. For SIGTERM handling you need to install a custom handler as there's no special handling in the RTS as there is for SIGINT.

  2. If you modify your program to catch SIGINT then shake cleans up correctly.

So I'm not sure there is even a problem with SIGINT.