Closed Zabrane closed 3 years ago
This is likely a bug that was introduced not too long ago. I think the reason this is happening is that it fails to obtain the status of the killed child so that the OS would release the child pid from being defunct.
Here's the way to illustrate this issue:
$ erl -boot start_sasl -pa _build/default/lib/erlexec/ebin
1> application:set_env(erlexec, debug, 2).
ok
2> application:start(erlexec).
ok
Initializing: uid=1000, userid=0
ok
3> exec:run("sleep 120", [monitor]).
Parsed cmd 'sleep 120' options
(stdin=null, stdout=null, stderr=null)
Redirecting [stdin -> null]
Redirecting [stdout -> null]
Redirecting [stderr -> null]
Starting child: 'sleep 120' (euid=2147483647)
child = (stdin=null(fd:3), stdout=null(fd:3), stderr=null(fd:3))
parent = (stdin=none, stdout=none, stderr=none)
Args[0]: /bin/bash
Args[1]: -c
Args[2]: sleep 120
Spawned child pid 211444
{ok,<0.1821.0>,211444}
Now in another terminal run:
$ while true; do ps -axo cmd | egrep '(s\leep|exec\-port)'; sleep 1; done
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
...
Now in the first window do:
4> application:stop(erlexec).
Setting alarm to 12 seconds
Got signal: 15
Called kill(pid=211444, sig=15) -> 0
Sent SIGTERM to pid 211444 (timeout=5s)
Child process 211444 exited
Called kill(pid=211444, sig=9) -> 0
Exiting (0)
=SUPERVISOR REPORT==== 20-Nov-2020::20:54:33.816055 ===
supervisor: {local,exec_app}
errorContext: shutdown_error
reason: killed
offender: [{pid,<0.1819.0>},
{id,exec},
{mfargs,{exec,start_link,[[{debug,2}]]}},
{restart_type,permanent},
{shutdown,10000},
{child_type,worker}]
=INFO REPORT==== 20-Nov-2020::20:54:33.819396 ===
application: erlexec
exited: stopped
type: temporary
ok
The supervisor report only gets called after exec-port
is terminated by SIGALARM
, while the 2nd window has:
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
[sleep] <defunct>
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
[sleep] <defunct>
...
So, obviously it is a bug related to child pid's cleanup. However I won't be able to look at it in the next couple of months as I am busy with another project. If you have time to review the C++ code, and submit a patch it would be very helpful.
Could you check this issue? I believe it's fixed in the last commit.
Hi @saleyn and Happy New Year
The fix seems to work as expected on Linux
. Thank you.
However, I wasn't able to compile and test this new release on my old MacOS
(High Sierra 10.13.6):
$ make
WARN: Missing plugins: [pc]
==> erlexec (compile)
Compiling c_src/exec.cpp
In file included from c_src/exec.cpp:58:
c_src/exec.hpp:409:5: error: expected member name or ';' after declaration specifiers
{
^
c_src/exec.hpp:407:22: error: expected '('
, poll_fd_idx{-1,-1,-1}
^
c_src/exec.hpp:407:23: warning: expression result unused [-Wunused-value]
, poll_fd_idx{-1,-1,-1}
^~
c_src/exec.hpp:407:26: warning: expression result unused [-Wunused-value]
, poll_fd_idx{-1,-1,-1}
^~
c_src/exec.hpp:407:31: error: expected ';' after expression
, poll_fd_idx{-1,-1,-1}
^
;
c_src/exec.hpp:407:29: warning: expression result unused [-Wunused-value]
, poll_fd_idx{-1,-1,-1}
^~
c_src/exec.cpp:222:24: error: expected '(' for function-style cast or type construction
fds[0] = pollfd{eis.read_handle(), POLLIN, 0}; // Erlang communication pipe
~~~~~~^
c_src/exec.cpp:223:24: error: expected '(' for function-style cast or type construction
fds[1] = pollfd{sigchld_pipe[0], POLLIN, 0}; // pipe for delivering SIGCHLD signals
~~~~~~^
c_src/exec.cpp:225:13: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
for(auto it=children.begin(), end=children.end(); it != end; ++it) {
^
c_src/exec.cpp:226:24: error: no member named 'include_stream_fd' in 'ei::CmdInfo'
it->second.include_stream_fd(fds);
~~~~~~~~~~ ^
c_src/exec.cpp:320:17: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
for(auto it=children.begin(), end=children.end(); it != end; ++it)
^
c_src/exec.cpp:322:28: error: no member named 'process_stream_data' in 'ei::CmdInfo'
it->second.process_stream_data(fds);
~~~~~~~~~~ ^
c_src/exec.cpp:340:5: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
auto rc = eis.read();
^
c_src/exec.cpp:708:29: error: expected '(' for function-style cast or type construction
fds[0] = pollfd{sigchld_pipe[0], POLLIN|POLLERR, 0};
~~~~~~^
c_src/exec.cpp:709:13: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
auto ts = deadline - timeout;
^
7 warnings and 8 errors generated.
ERROR: compile failed while processing /private/tmp/erlexec: rebar_abort
make: *** [all] Error 1
This is my config:
$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
$ erl
Erlang/OTP 23 [erts-11.1.5] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe] [lock-counting] [dtrace] [sharing-preserving]
Eshell V11.1.5 (abort with ^G)
Hmm, it seems that clang 10 uses c++98 by default. You should be able to compile it with:
$ CXXFLAGS="-std=c++11" make
Maybe -std=c++11
should be included in https://github.com/saleyn/erlexec/blob/master/rebar.config.script#L63 again.
@chrta better but erl_interface
is deprecated on 23
.
$ CXXFLAGS="-std=c++11" make
WARN: Missing plugins: [pc]
==> erlexec (compile)
Compiling c_src/exec.cpp
Compiling c_src/exec_impl.cpp
ld: library not found for -lerl_interface
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: sh(clang c_src/ei++.o c_src/exec.o c_src/exec_impl.o -m64 -L"/Users/zab/erlang/23.2.1/lib/erl_interface-4.0.1/lib" -lerl_interface -lei -o priv/x86_64-apple-darwin17.7.0/exec-port)
failed with return code 1 and the following output:
ld: library not found for -lerl_interface
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: compile failed while processing /private/tmp/erlexec: rebar_abort
make: *** [all] Error 1
I think it's good idea to add -std=c++11
to rebar.config.script
Added -std=c++11
in the latest commit.
@chrta better but erl_interface is deprecated on 23.
As of OTP 23, the registry functionality part of erl_interface has been deprecated
So fortunately only the registry part of erl_interface (which is not used in this project) is deprecated. erl_interface itself is not.
@saleyn what can make the
exec_app
to be killed when a node is stopped:The external processes are battle tested (NodeJS) and
shutdown cleanly
when they recieve aSIGTERM
signal:I noticed the same behaviour if i call: