saleyn / erlexec

Execute and control OS processes from Erlang/OTP
https://hexdocs.pm/erlexec/readme.html
Other
525 stars 139 forks source link

Application exec_app killed #139

Closed Zabrane closed 3 years ago

Zabrane commented 3 years ago

@saleyn what can make the exec_app to be killed when a node is stopped:

> init:stop().
=SUPERVISOR REPORT==== 20-Nov-2020::14:07:15.054924 ===
    supervisor: {local,exec_app}
    errorContext: shutdown_error
    reason: killed
    offender: [{pid,<0.147.0>},
               {id,exec},
               {mfargs,{exec,start_link,[[]]}},
               {restart_type,permanent},
               {shutdown,10000},
               {child_type,worker}]

The external processes are battle tested (NodeJS) and shutdown cleanly when they recieve a SIGTERM signal:

{ok, Pid, _} = exec:run(NodeJsExec, Opts),

I noticed the same behaviour if i call:

> application:stop(erlexec).
%% huge crash
saleyn commented 3 years ago

This is likely a bug that was introduced not too long ago. I think the reason this is happening is that it fails to obtain the status of the killed child so that the OS would release the child pid from being defunct.

Here's the way to illustrate this issue:

$ erl -boot start_sasl -pa _build/default/lib/erlexec/ebin
1> application:set_env(erlexec, debug, 2).
ok
2> application:start(erlexec).
ok
Initializing: uid=1000, userid=0
ok
3> exec:run("sleep 120", [monitor]).
Parsed cmd 'sleep 120' options
  (stdin=null, stdout=null, stderr=null)
  Redirecting [stdin -> null]
  Redirecting [stdout -> null]
  Redirecting [stderr -> null]
Starting child: 'sleep 120' (euid=2147483647)
  child  = (stdin=null(fd:3), stdout=null(fd:3), stderr=null(fd:3))
  parent = (stdin=none, stdout=none, stderr=none)
  Args[0]: /bin/bash
  Args[1]: -c
  Args[2]: sleep 120
Spawned child pid 211444
{ok,<0.1821.0>,211444}

Now in another terminal run:

$ while true; do ps -axo cmd | egrep '(s\leep|exec\-port)'; sleep 1; done
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
...

Now in the first window do:

4> application:stop(erlexec).       
Setting alarm to 12 seconds
Got signal: 15
Called kill(pid=211444, sig=15) -> 0
Sent SIGTERM to pid 211444 (timeout=5s)
Child process 211444 exited
Called kill(pid=211444, sig=9) -> 0
Exiting (0)
=SUPERVISOR REPORT==== 20-Nov-2020::20:54:33.816055 ===
    supervisor: {local,exec_app}
    errorContext: shutdown_error
    reason: killed
    offender: [{pid,<0.1819.0>},
               {id,exec},
               {mfargs,{exec,start_link,[[{debug,2}]]}},
               {restart_type,permanent},
               {shutdown,10000},
               {child_type,worker}]

=INFO REPORT==== 20-Nov-2020::20:54:33.819396 ===
    application: erlexec
    exited: stopped
    type: temporary

ok

The supervisor report only gets called after exec-port is terminated by SIGALARM, while the 2nd window has:

_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
sleep 120
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
[sleep] <defunct>
_build/default/lib/erlexec/priv/x86_64-unknown-linux-gnu/exec-port -debug 2
[sleep] <defunct>
...

So, obviously it is a bug related to child pid's cleanup. However I won't be able to look at it in the next couple of months as I am busy with another project. If you have time to review the C++ code, and submit a patch it would be very helpful.

saleyn commented 3 years ago

Could you check this issue? I believe it's fixed in the last commit.

Zabrane commented 3 years ago

Hi @saleyn and Happy New Year

The fix seems to work as expected on Linux. Thank you.

However, I wasn't able to compile and test this new release on my old MacOS (High Sierra 10.13.6):

$ make
WARN:  Missing plugins: [pc]
==> erlexec (compile)
Compiling c_src/exec.cpp
In file included from c_src/exec.cpp:58:
c_src/exec.hpp:409:5: error: expected member name or ';' after declaration specifiers
    {
    ^
c_src/exec.hpp:407:22: error: expected '('
        , poll_fd_idx{-1,-1,-1}
                     ^
c_src/exec.hpp:407:23: warning: expression result unused [-Wunused-value]
        , poll_fd_idx{-1,-1,-1}
                      ^~
c_src/exec.hpp:407:26: warning: expression result unused [-Wunused-value]
        , poll_fd_idx{-1,-1,-1}
                         ^~
c_src/exec.hpp:407:31: error: expected ';' after expression
        , poll_fd_idx{-1,-1,-1}
                              ^
                              ;
c_src/exec.hpp:407:29: warning: expression result unused [-Wunused-value]
        , poll_fd_idx{-1,-1,-1}
                            ^~
c_src/exec.cpp:222:24: error: expected '(' for function-style cast or type construction
        fds[0] = pollfd{eis.read_handle(), POLLIN, 0};  // Erlang communication pipe
                 ~~~~~~^
c_src/exec.cpp:223:24: error: expected '(' for function-style cast or type construction
        fds[1] = pollfd{sigchld_pipe[0],   POLLIN, 0};  // pipe for delivering SIGCHLD signals
                 ~~~~~~^
c_src/exec.cpp:225:13: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
        for(auto it=children.begin(), end=children.end(); it != end; ++it) {
            ^
c_src/exec.cpp:226:24: error: no member named 'include_stream_fd' in 'ei::CmdInfo'
            it->second.include_stream_fd(fds);
            ~~~~~~~~~~ ^
c_src/exec.cpp:320:17: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
            for(auto it=children.begin(), end=children.end(); it != end; ++it)
                ^
c_src/exec.cpp:322:28: error: no member named 'process_stream_data' in 'ei::CmdInfo'
                it->second.process_stream_data(fds);
                ~~~~~~~~~~ ^
c_src/exec.cpp:340:5: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
    auto rc = eis.read();
    ^
c_src/exec.cpp:708:29: error: expected '(' for function-style cast or type construction
            fds[0]  = pollfd{sigchld_pipe[0], POLLIN|POLLERR, 0};
                      ~~~~~~^
c_src/exec.cpp:709:13: warning: 'auto' type specifier is a C++11 extension [-Wc++11-extensions]
            auto ts = deadline - timeout;
            ^
7 warnings and 8 errors generated.
ERROR: compile failed while processing /private/tmp/erlexec: rebar_abort
make: *** [all] Error 1

This is my config:

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ erl
Erlang/OTP 23 [erts-11.1.5] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe] [lock-counting] [dtrace] [sharing-preserving]

Eshell V11.1.5  (abort with ^G)
chrta commented 3 years ago

Hmm, it seems that clang 10 uses c++98 by default. You should be able to compile it with:

$ CXXFLAGS="-std=c++11" make

Maybe -std=c++11 should be included in https://github.com/saleyn/erlexec/blob/master/rebar.config.script#L63 again.

Zabrane commented 3 years ago

@chrta better but erl_interface is deprecated on 23.

$ CXXFLAGS="-std=c++11" make
WARN:  Missing plugins: [pc]
==> erlexec (compile)
Compiling c_src/exec.cpp
Compiling c_src/exec_impl.cpp
ld: library not found for -lerl_interface
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: sh(clang c_src/ei++.o c_src/exec.o c_src/exec_impl.o  -m64  -L"/Users/zab/erlang/23.2.1/lib/erl_interface-4.0.1/lib" -lerl_interface -lei -o priv/x86_64-apple-darwin17.7.0/exec-port)
failed with return code 1 and the following output:
ld: library not found for -lerl_interface
clang: error: linker command failed with exit code 1 (use -v to see invocation)

ERROR: compile failed while processing /private/tmp/erlexec: rebar_abort
make: *** [all] Error 1

I think it's good idea to add -std=c++11 to rebar.config.script

saleyn commented 3 years ago

Added -std=c++11 in the latest commit.

chrta commented 3 years ago

@chrta better but erl_interface is deprecated on 23.

As of OTP 23, the registry functionality part of erl_interface has been deprecated

So fortunately only the registry part of erl_interface (which is not used in this project) is deprecated. erl_interface itself is not.