infinisil / nixus

Experimental deployment tool supporting multi-host abstractions
GNU General Public License v3.0
195 stars 15 forks source link

Make the deploy script kill child processes upon exit #36

Open bjornfor opened 3 years ago

bjornfor commented 3 years ago

This allows aborting the deploy with Ctrl-C.

Fixes #5.

cole-h commented 3 years ago

Last time I attempted this, @Infinisil pointed out that this wouldn't kill remote deploy processes. So, you prevent the local deploy from finishing, but not the remote ones.

https://logs.nix.samueldr.com/nixus/2020-08-26#1598402208-1598402243

bjornfor commented 3 years ago

Ok, hehe, I wondered why nobody had done it if it was this easy.

infinisil commented 3 years ago

I guess this is better than nothing though. Most of the time when I need to Ctrl-C, it's during copying-closure time, which this would handle just fine. It's only a problem once it gets to https://github.com/Infinisil/nixus/blob/2127516f68ce6f900b38289985377559e69aab88/modules/deploy.nix#L106, which is the line that starts the remote process

bjornfor commented 3 years ago

I tested a bit more and found that ssh -tt actually fixes terminating remote processes. (I did this in a separate test script, not with nixus.) Does anyone know of a reason not to make nixus use ssh -tt?

Here is the test script:

#!/usr/bin/env bash
set -x

#trap "echo Received signal, exiting; exit" INT TERM
trap "exit" INT TERM
trap "kill 0" EXIT

# Remove the ssh -tt flags and see that when this script gets aborted, the "sleep
# 12m" command gets reparented to PID 1 on srv1 and lives on. With -tt it gets
# properly terminated.
ssh -tt srv1 sleep 12m &
sleep 12m &
wait
bjornfor commented 3 years ago

AFAICS, there is only one place where we'd need ssh -tt, thats where we switch to the new system:

diff --git a/modules/deploy.nix b/modules/deploy.nix
index 42f7f0f..5f1a9af 100644
--- a/modules/deploy.nix
+++ b/modules/deploy.nix
@@ -103,7 +103,9 @@ let
           privilegeEscalation = builtins.concatStringsSep " " config.privilegeEscalationCommand;
         in lib.dag.entryAnywhere ''
         echo "Triggering system switcher..." >&2
-        id=$(ssh -o BatchMode=yes "$HOST" exec "${switch}/bin/switch" start "${system}")
+        # Use ssh -tt flags to also interrupt/stop the command on the remote
+        # side when the user presses Ctrl-C on the local side.
+        id=$(ssh -tt -o BatchMode=yes "$HOST" exec "${switch}/bin/switch" start "${system}")

         echo "Trying to confirm success..." >&2
         active=1

The other ssh calls are short-lived (like running mkdir or readlink).

infinisil commented 3 years ago

I can't find any docs for -tt, there's only -t? Also, Nixus' switch start command itself forks another process (a switch run) into the background, so I don't expect that to work anyways. See https://github.com/Infinisil/nixus/blob/2127516f68ce6f900b38289985377559e69aab88/scripts/switch#L61

bjornfor commented 3 years ago

I can't find any docs for -tt, there's only -t?

Right, in the docs for -t it says Multiple -t options force tty allocation, even if ssh has no local tty..

Nixus' switch start command itself forks another process (a switch run) into the background, so I don't expect that to work anyways.

Ok. I'm not familiar with that code. I guess at some point you don't want the script to be aborted? Some kind of "atomic section"?