[turborepo] regression on handling SIGINT

zanona commented 1 year ago

What version of Turborepo are you using?

1.7.4

What package manager are you using / does the bug impact?

npm

What operating system are you using?

Linux

Describe the Bug

Referring to #444 and its implementation in #663, it looks like a behavioural regression was introduced in 1.7.0: SIGINT is no longer handled properly. The last working version was 1.6.3.

The bug is that when a program is interrupted (^C), turbo doesn't wait for the program to finish handling SIGINT; it exits immediately.

It is worth noting that the program's SIGINT handler continues to run in the background while the terminal becomes interactive again. This poses a problem when running turbo from a Docker container, for example: Docker waits for the SIGINT to be handled and then kills the container process. Since turbo doesn't wait for the SIGINT handling to complete, Docker simply kills the process prematurely. The expected behaviour here would be that of 1.6.3.

Expected Behavior

Turbo should wait for the program to finish handling SIGINT and only then exit.
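The expected behaviour can be sketched outside turbo. The Python snippet below is a minimal illustration, not turbo's implementation (turbo is not written in Python). `run_and_wait`, `CHILD_SOURCE`, and `demo` are hypothetical names, and the child is a shortened stand-in for the start script in the reproduction. It assumes a POSIX system and starts the child in its own session, a design choice made here so that the child only sees the SIGINT the wrapper forwards.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the "start" script: on SIGINT it prints,
# does a short cleanup, then exits with code 0.
CHILD_SOURCE = """
import signal, sys, time

def on_sigint(signum, frame):
    print("wait...", flush=True)
    time.sleep(0.3)          # simulated cleanup work
    print("done!", flush=True)
    sys.exit(0)

signal.signal(signal.SIGINT, on_sigint)
time.sleep(10)
"""


def run_and_wait(argv):
    """Run argv, forward SIGINT to it, and return only after it has exited."""
    # Assumption: a separate session keeps the terminal's ^C away from the
    # child, so it receives exactly one SIGINT (the forwarded one).
    child = subprocess.Popen(argv, start_new_session=True)

    def forward(signum, frame):
        # Pass the interrupt on instead of exiting right away.
        child.send_signal(signum)

    signal.signal(signal.SIGINT, forward)
    # Blocks until the child is gone, i.e. until its handler has finished.
    return child.wait()


def demo():
    """Run the stand-in child; press ^C while it sleeps to see the handler run."""
    return run_and_wait([sys.executable, "-c", CHILD_SOURCE])
```

Calling `demo()` and pressing ^C should print the child's two messages before the call returns with the child's exit status, which matches the 1.6.3 transcript rather than the 1.7.0 one.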

To Reproduce

./apps/bar/package.json

{
   "name": "@foo/bar",
   "scripts": {
       "start": "trap 'echo wait…; sleep 3; echo done!; exit 0' SIGINT; sleep 10"
   }
}

turbo@1.6.3 (expected behaviour)


$ npx turbo run start --filter=@foo/bar --only
• Packages in scope: @foo/bar
• Running start in 1 packages
• Remote caching disabled
@foo/bar:start: cache miss, executing a0c8f8ecc5ad4cad
@foo/bar:start:
@foo/bar:start: > start
@foo/bar:start: > trap 'echo wait…; sleep 3; echo done!; exit 0' SIGINT; sleep 10
@foo/bar:start:
^C@foo/bar:start: wait…  # <<< SIGINT
@foo/bar:start: done!
$ echo interactive again....

turbo@1.7.0 (unexpected behaviour)

$ npx turbo run start --filter=@foo/bar --only
• Packages in scope: @foo/bar
• Running start in 1 packages
• Remote caching disabled
@foo/bar:start: cache miss, executing 52f6ea6626d6482f
@foo/bar:start:
@foo/bar:start: > start
@foo/bar:start: > trap 'echo wait…; sleep 3; echo done!; exit 0' SIGINT; sleep 10
@foo/bar:start:
^C@foo/bar:start: wait… # <<< SIGINT
$ echo interactive again
interactive again
$ @foo/bar:start: done! # <<< bg process

Reproduction Repo

No response

notaphplover commented 1 year ago

I am actually facing the same issue with version 1.8.3. @mehulkar, I just uploaded a reproduction repo in case it's helpful.

gajus commented 1 year ago

Tagging @jaredpalmer for visibility

mehulkar commented 1 year ago

May be fixed by https://github.com/vercel/turbo/pull/4276, could you verify against v1.8.5?

notaphplover commented 1 year ago

Hey @mehulkar, I might be wrong, but I would say the issue persists. I updated the reproduction repo to use turbo@1.8.5, and the problem is still there.

mehulkar commented 1 year ago

Thanks for trying! cc @arlyon @chris-olszewski

notaphplover commented 1 year ago

@chris-olszewski I just saw that a new turbo@1.9.0 version was released. I tried that version in the reproduction repo and the issue persists.

Could you please reopen the issue?

notaphplover commented 1 year ago

Update: Same with turbo@1.9.1.

chris-olszewski commented 1 year ago

@notaphplover, can you confirm the expected outcomes of sending a SIGINT to pnpm run foo:pnpm? On my machine it exits with a 0, same as pnpm run foo:pnpm:trap. I've also checked, and it looks like turborepo has had this behavior (exit with code 1 regardless of trap) since 1.5.0. Can you confirm this on your machine?

Also, if you need an exit code 0, could you change the trap command to trap 'exit 0' INT TERM; turbo run foo? That seems to work on my machine.

notaphplover commented 1 year ago

Hi @chris-olszewski :smiley:.

> @notaphplover, can you confirm the expected outcomes of sending a SIGINT to pnpm run foo:pnpm? On my machine it exits with a 0 same as pnpm run foo:pnpm:trap. I've also checked and it looks like turborepo has had this behavior (exit with code 1 regardless of trap) since 1.5.0, can you confirm this on your machine?

On my machine pnpm run foo:pnpm exits with a non-zero code and pnpm run foo:pnpm:trap with a zero code. If this is a problem, I think I could make an effort and set up a GitHub Action reproducing the issue. I could even open a debugging port and allow you to connect through ssh with some magic tricks, but I was expecting any Linux machine to behave similarly in this case :(. Unlucky us, I guess.

Regarding which version introduced this (I think) unexpected behavior, I tried with 1.5.6 and it had the error. It is a bit hard to test other 1.5 and 1.4 versions because the daemon does not manage to start (probably related to #2034). I managed to recreate the right behavior on 1.4.6, so I would hazard to say you are right.

> Also, if you need an exit code 0, could you change the trap command to trap 'exit 0' INT TERM; turbo run foo and that seems to work on my machine.

Yeah, I know. The thing is, that's not what I want. I want to exit with code zero if and only if the process is able to exit gracefully with no issues. For that reason I want to use trap '' instead, propagating the exit code.

Hope all of this helps. I would prefer not to go through the pain of setting up the remote debugging session, but if you really need it I can go for it in a couple of days.

Edit: I just realized the debugging session wouldn't be of any help :sweat_smile:, but the ssh connection would allow you to connect to the GitHub runner and recreate the issue. Probably overkill, since Docker seems a much simpler way to go.

chris-olszewski commented 1 year ago

> If this is a problem I think I could make an effort and set a gh action reproducing the issue

No need to. I just wanted to make sure that the description in the repro was correct and that it was my machine doing something weird.

> I managed to recreate the right behavior on 1.4.6 so I would hazard to say you are right.

Thanks for confirming, this narrows the code changes to check quite a lot.

> I want to exit with code zero if and only if the process is able to exit gracefully with no issues.

Understood, I just wanted to check whether that would provide any intermediate relief.

Sorry again for the drop in communication.

notaphplover commented 1 year ago

> Sorry again for the drop in communication.

All good. This is an open source project after all. Love the beautiful work you're doing. Sometimes these issues happen; I simply opened the other issue to avoid losing track of this one.

chris-olszewski commented 1 year ago

@notaphplover I had some time to delve into this, and it is a larger feature request. We currently always exit with exit code 1 if we receive a signal. In order to return the highest exit code, we need to start handling signals gracefully: the first SIGINT we receive gets forwarded to the child processes, and if we then receive another SIGINT, we send a SIGKILL.
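The two-stage handling described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not turbo's code: `run_tasks` and `exit_code` are hypothetical names, children are started in their own sessions so that they only receive forwarded signals, and a child killed by signal N is reported as 128 + N (the usual shell convention, chosen here for illustration).

```python
import signal
import subprocess


def exit_code(returncode):
    """Map Popen.returncode to a shell-style code (128 + N if killed by signal N)."""
    return 128 - returncode if returncode < 0 else returncode


def run_tasks(commands):
    """Run all commands; forward the first SIGINT, SIGKILL on any later one.

    Returns the highest exit code among the children (0 if there are none).
    """
    children = [subprocess.Popen(cmd, start_new_session=True) for cmd in commands]
    interrupts = 0

    def on_sigint(signum, frame):
        nonlocal interrupts
        interrupts += 1
        # First interrupt: give children a chance to clean up.
        # Any further interrupt: stop waiting and kill them.
        sig = signal.SIGINT if interrupts == 1 else signal.SIGKILL
        for child in children:
            child.send_signal(sig)  # no-op for children already reaped

    signal.signal(signal.SIGINT, on_sigint)
    codes = [exit_code(child.wait()) for child in children]
    return max(codes, default=0)
```

With this shape, a task that traps SIGINT and exits 0 contributes 0 to the result, while a task that has to be killed after the second ^C contributes 137, so the wrapper exits 0 only when every child shut down cleanly.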

I don't expect this work will get done until we finish porting the codebase to Rust. Hopefully as we port this signal code we can set the groundwork for being more graceful with our signal handling.

EloB commented 2 months ago

Is this working, or will it be fixed anytime soon? I'm having issues where Docker doesn't shut down when I press Ctrl + C. I'm using pnpm, and graceful teardown works using pnpm --filter=mypackage run dev, but npx turbo run dev --filter=mypackage fails to tear down all child processes.

giorgiogross commented 1 month ago

Experiencing the same issue with turbo version 1.13.3: when I hit ctrl+c I get an ERROR run failed: command exited (1). That doesn't happen when I run a command with a single task from the root directory and use interactive mode, but when I disable interactive mode or when there is more than one task, I get that error. Oh, and I'm using npm! :)
