Closed sasa1977 closed 3 years ago
The design of erlexec was implying that if the beam is killed, then the exec-port process would detect the pipe closing, and would kill all of its jobs. The last time I tested this on Linux a few years back it worked without issues. If it doesn't work with docker for some reason you'd need to troubleshoot (setting the debug and verbose modes, and trying to see where the issue is). I don't think it's related to the custom kill command.
The design of erlexec was implying that if the beam is killed, then the exec-port process would detect the pipe closing, and would kill all of its jobs.
This definitely works. I relied on it in the past, and I also double checked that it still works today.
I don't think it's related to the custom kill command.
My impression is that custom kill command is not executed if beam is killed. Here's a simple demo in Elixir:
{:ok, pid, _ospid} = :exec.run(
  """
  function cleanup {
    echo "got exit signal" >> debug.txt
  }
  trap cleanup 0
  sleep infinity
  """,
  kill: ~s/echo "custom kill command" >> debug.txt/
)
If I invoke this in iex shell, and then invoke :exec.stop(pid), debug.txt will contain "custom kill command". However, if I start the program and kill the shell by hitting ctrl+c twice, then debug.txt will contain only "got exit signal" (make sure to remove the file before restarting the experiment).
This indicates that erlexec ignores the custom kill command when cleaning up after beam is killed, and instead always sends an exit signal.
A custom kill command is executed only when the child process does not terminate after SIGTERM; only in that case will exec-port run it. In your example above the log contains "got exit signal", which indicates that the child process was terminated and there was no need to execute the custom kill command.
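To make the described order of operations concrete, here is a rough shell model of "SIGTERM first, custom kill command only as a fallback". It is a sketch based solely on the description in this thread, not erlexec's actual code; the function name stop_with_fallback, its arguments, and the one-second polling interval are all made up for illustration.

```shell
#!/bin/sh
# Rough model only; not erlexec's code. All names here are hypothetical.
outcome=""

stop_with_fallback() {
    pid=$1; timeout=$2; kill_cmd=$3
    kill -TERM "$pid" 2>/dev/null || true
    waited=0
    # Poll once per second until the process is gone or the timeout expires.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        sh -c "$kill_cmd"          # fallback: SIGTERM did not stop the process
        outcome="kill_cmd"
    else
        outcome="sigterm"          # SIGTERM was enough; kill command is skipped
    fi
}

log=$(mktemp)

# Case 1: an ordinary child that dies on SIGTERM.
sleep 30 > /dev/null &
stop_with_fallback "$!" 3 "echo 'custom kill command' >> $log"
first=$outcome

# Case 2: a child that ignores SIGTERM, so the fallback runs.
sh -c 'trap "" TERM; exec sleep 30' > /dev/null &
stubborn=$!
sleep 1                              # give the child time to install its trap
stop_with_fallback "$stubborn" 2 "echo 'custom kill command' >> $log"
second=$outcome
kill -KILL "$stubborn" 2>/dev/null   # demo cleanup only

echo "$first $second"
```

In this model the custom command never runs for a child that exits on SIGTERM, which matches the log in the demo above containing only the exit-trap line.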
I see. My question is then how can I ensure that custom kill command is immediately executed in the following situations:
stop (from what I can tell this already works today)
I tried playing with kill_timeout, but that doesn't seem to do the trick. I basically need some way of instructing erlexec to immediately move on to the custom command without even trying with SIGTERM. Or perhaps, alternatively, I'd like to provide a custom terminate (not kill) command which would be used instead of SIGTERM. Is any of that possible today?
In the current implementation the kill command is the fallback for the failed SIGTERM. It seems to me that you can solve your problem simply by writing a wrapper script for your child task, which traps SIGTERM, and executes the custom command you need.
You are also welcome to submit a patch that adds an option to force the kill command without SIGTERM.
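A minimal sketch of such a wrapper, assuming a POSIX shell. The file names, the "custom cleanup" echo, and the sleep standing in for the real task are placeholders; in the docker case the cleanup line would be whatever command removes the container. The second half only demonstrates the wrapper by sending it SIGTERM directly, which is what the thread says exec-port does to its jobs.

```shell
#!/bin/sh
# Sketch only: paths, the cleanup command and the sleep task are placeholders.
workdir=$(mktemp -d)
log="$workdir/debug.txt"

# The wrapper: start the real task in the background, and on SIGTERM run
# the custom command before stopping the task.
cat > "$workdir/wrapper.sh" <<'EOF'
#!/bin/sh
log=$1
on_term() {
    echo "custom cleanup" >> "$log"   # stand-in for the real cleanup command
    kill "$child" 2>/dev/null         # then stop the real task
    wait "$child" 2>/dev/null
    exit 0
}
trap on_term TERM
sleep 30 &                            # stand-in for the real task
child=$!
: > "$log.ready"                      # marker: trap installed, task started
wait "$child"
EOF

# Demo: run the wrapper, wait until it is ready, then send it SIGTERM.
sh "$workdir/wrapper.sh" "$log" > /dev/null &
wrapper_pid=$!
while [ ! -e "$log.ready" ]; do sleep 1; done
kill -TERM "$wrapper_pid"
wait "$wrapper_pid"
wrapper_status=$?
cat "$log"
```

The task runs in the background and the wrapper blocks in wait because a shell typically defers trap handlers until the current foreground command finishes; with wait, the handler runs as soon as the signal arrives.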
It seems to me that you can solve your problem simply by writing a wrapper script for your child task, which traps SIGTERM, and executes the custom command you need.
Yeah this is the approach I'm currently using, although it feels slightly hacky.
I'll see if I can submit a patch, but I need to dust off my C++ first, so I can't commit to any deadline :-)
I believe that the latest commit fixed this issue.
Cool! I forgot about this issue, b/c I went down a different path and rolled my own small port command wrapper. So regardless of whether this is solved or not, feel free to close this issue, b/c I personally won't find the time to work on it.
I'm trying to start a docker container via erlexec, and I'd like to ensure proper cleanup of the container. To make this work, I'm playing with the following approach:
Here's an Elixir example:
If I stop such a process with :exec.stop/1 the container is correctly removed. However, if BEAM is terminated forcefully (e.g. by hitting ctrl-c twice), the container remains alive. This leads me to the conclusion that the custom kill command is ignored when erlexec performs post-beam cleanup. Would this assumption be correct?