saleyn / erlexec

Execute and control OS processes from Erlang/OTP
https://hexdocs.pm/erlexec/readme.html

Any command terminates with exit status 132 #176

Closed Zabrane closed 4 months ago

Zabrane commented 4 months ago

Hi @saleyn

We are using erlexec from this commit 7b58687a6fc7b750a49c0a64d9753c97512570d1. Our Erlang release has been running perfectly inside Docker. Today, we deployed a new version of our app and started getting erlexec errors. Any command we try to run ends with the same exit status 132.

Here is an example from within Docker:

$ /opt/arben/bin/arben remote_console
> f(I), f(P), {ok, P, I} = exec:run("echo ok", [{stdout, self()}, monitor]).
** exception exit: {{exit_status,132},
                    {gen_server,call,
                                [exec,
                                 {port,{{run,"echo ok",[{stdout,<0.2244.0>},monitor]},
                                        monitor,false}},
                                 30000]}}
     in function  gen_server:call/3 (gen_server.erl, line 419)
     in call from exec:do_run/3 (/dev/arben/_build/default/lib/erlexec/src/exec.erl, line 970)
  1. What makes erlexec terminate with exit code 132?
  2. Would you recommend using erlexec from master?

This only started happening today. Before deploying our new release, erlexec was working as expected.

Help appreciated

Zabrane commented 4 months ago

@saleyn same error when using the latest erlexec from master.

Exit code 132 = 128 + 4.

Signal 4 is SIGILL (illegal instruction): the program contained machine code that the CPU can't execute.

If I run the same commands using os:cmd instead of erlexec, they all succeed without any error.
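
The arithmetic above follows the common shell convention that a status greater than 128 means "terminated by signal (status - 128)". A minimal Python sketch of that decoding, assuming a POSIX system; the helper name `decode_exit_status` is hypothetical and is not part of erlexec:

```python
import signal

def decode_exit_status(status: int):
    """Interpret a shell-style exit status.

    Returns ("signal", NAME) when status - 128 is a known signal number,
    otherwise ("exit", status).
    """
    if status > 128:
        try:
            return ("signal", signal.Signals(status - 128).name)
        except ValueError:
            # status - 128 is not a signal number known on this platform
            pass
    return ("exit", status)

print(decode_exit_status(132))
```

On Linux, signal number 4 is SIGILL, which matches the diagnosis in this comment.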

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:    22.04
Codename:   jammy

#  dmesg -T | grep opcode
[Tue Apr  9 13:03:44 2024] traps: exec-port[19214] trap invalid opcode ip:5595beec6446 sp:7ffc773eb600 error:0 in exec-port[5595beebf000+22000]
[Tue Apr  9 13:03:45 2024] traps: exec-port[19215] trap invalid opcode ip:56380fb13446 sp:7ffda1300c60 error:0 in exec-port[56380fb0c000+22000]
[Tue Apr  9 13:03:45 2024] traps: exec-port[19250] trap invalid opcode ip:55c5862ed446 sp:7ffed779f4c0 error:0 in exec-port[55c5862e6000+22000]
[Tue Apr  9 13:03:45 2024] traps: exec-port[19251] trap invalid opcode ip:5575d1129446 sp:7ffca453a680 error:0 in exec-port[5575d1122000+22000]
[Tue Apr  9 13:03:46 2024] traps: exec-port[19257] trap invalid opcode ip:5618a1e6c446 sp:7fff001fa900 error:0 in exec-port[5618a1e65000+22000]
[Tue Apr  9 13:03:46 2024] traps: exec-port[19262] trap invalid opcode ip:55cf26e09446 sp:7fffcf58da40 error:0 in exec-port[55cf26e02000+22000]
[Tue Apr  9 13:05:21 2024] traps: exec-port[20144] trap invalid opcode ip:564cd448a446 sp:7ffc7bb1df80 error:0 in exec-port[564cd4483000+22000]
[Tue Apr  9 13:05:21 2024] traps: exec-port[20163] trap invalid opcode ip:559a8398f446 sp:7ffca74f5be0 error:0 in exec-port[559a83988000+22000]
[Tue Apr  9 13:12:19 2024] traps: exec-port[20165] trap invalid opcode ip:564f0228f446 sp:7ffd9dc36c60 error:0 in exec-port[564f02288000+22000]
[Tue Apr  9 13:12:29 2024] traps: exec-port[20439] trap invalid opcode ip:5621ecce9446 sp:7fff27454e20 error:0 in exec-port[5621ecce2000+22000]
[Tue Apr  9 13:12:37 2024] traps: exec-port[20441] trap invalid opcode ip:55b5c22a9446 sp:7ffd4be08900 error:0 in exec-port[55b5c22a2000+22000]
[Tue Apr  9 13:12:42 2024] traps: exec-port[20443] trap invalid opcode ip:5556451be446 sp:7ffce9cfbb00 error:0 in exec-port[5556451b7000+22000]
[Tue Apr  9 13:37:33 2024] traps: exec-port[21109] trap invalid opcode ip:5571ac457446 sp:7fffc6569880 error:0 in exec-port[5571ac450000+22000]
[Tue Apr  9 13:37:33 2024] traps: exec-port[21110] trap invalid opcode ip:561a15fa6446 sp:7ffc2f1e4e40 error:0 in exec-port[561a15f9f000+22000]
[Tue Apr  9 13:37:33 2024] traps: exec-port[21129] trap invalid opcode ip:55d409242446 sp:7ffcffdf2ee0 error:0 in exec-port[55d40923b000+22000]
[Tue Apr  9 13:37:34 2024] traps: exec-port[21149] trap invalid opcode ip:560a5a16b446 sp:7ffc108eb9c0 error:0 in exec-port[560a5a164000+22000]
[Tue Apr  9 13:37:34 2024] traps: exec-port[21150] trap invalid opcode ip:55675b655446 sp:7fff89c52280 error:0 in exec-port[55675b64e000+22000]
[Tue Apr  9 13:37:34 2024] traps: exec-port[21155] trap invalid opcode ip:5636961ed446 sp:7ffc1632bc20 error:0 in exec-port[5636961e6000+22000]
[Tue Apr  9 13:38:39 2024] traps: exec-port[21589] trap invalid opcode ip:565462efd446 sp:7ffd8247b0a0 error:0 in exec-port[565462ef6000+22000]
[Tue Apr  9 13:38:40 2024] traps: exec-port[21608] trap invalid opcode ip:55ed9fe4d446 sp:7fffbc44d180 error:0 in exec-port[55ed9fe46000+22000]
[Tue Apr  9 13:46:58 2024] traps: exec-port[22196] trap invalid opcode ip:562811cc9446 sp:7ffeac45abc0 error:0 in exec-port[562811cc2000+22000]
[Tue Apr  9 13:46:59 2024] traps: exec-port[22215] trap invalid opcode ip:56122d506446 sp:7ffc34d853e0 error:0 in exec-port[56122d4ff000+22000]
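
One thing the log above makes easy to check: although the absolute addresses differ on every run (consistent with address-space randomization), subtracting the mapping base in the brackets from `ip` gives the offset of the faulting instruction inside `exec-port`. A rough Python sketch of that calculation, using two lines copied from the log; the regular expression and the function name `fault_offsets` are assumptions made for illustration:

```python
import re

# Matches kernel lines such as:
#   traps: exec-port[19214] trap invalid opcode ip:... sp:... error:0 in exec-port[base+size]
TRAP_RE = re.compile(
    r"trap invalid opcode ip:([0-9a-f]+) sp:[0-9a-f]+ error:\d+ "
    r"in (\S+)\[([0-9a-f]+)\+[0-9a-f]+\]"
)

def fault_offsets(dmesg_text: str):
    """Return the set of (binary name, ip - mapping base) pairs found in the text."""
    offsets = set()
    for m in TRAP_RE.finditer(dmesg_text):
        ip, name, base = m.group(1), m.group(2), m.group(3)
        offsets.add((name, int(ip, 16) - int(base, 16)))
    return offsets

sample = """\
[Tue Apr  9 13:03:44 2024] traps: exec-port[19214] trap invalid opcode ip:5595beec6446 sp:7ffc773eb600 error:0 in exec-port[5595beebf000+22000]
[Tue Apr  9 13:03:45 2024] traps: exec-port[19215] trap invalid opcode ip:56380fb13446 sp:7ffda1300c60 error:0 in exec-port[56380fb0c000+22000]
"""

for name, offset in sorted(fault_offsets(sample)):
    print(name, hex(offset))
```

Both sample lines resolve to the same offset, 0x7446, which suggests every crash hits the same instruction in the binary rather than a random memory corruption.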


Zabrane commented 4 months ago

@saleyn We found the issue: https://github.com/saleyn/erlexec/blob/master/c_src/Makefile#L72-L77

You've added AVX/AVX2 compilation flags, and the CPU on our target Docker host doesn't support those instructions. After recompiling erlexec without these flags, it worked.

Question: why were these AVX/AVX2 flags added? Do you really need SIMD instructions inside erlexec?
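
For anyone hitting the same symptom, one way to confirm this kind of mismatch on a Linux x86 host is to look for `avx` and `avx2` in the `flags` line of `/proc/cpuinfo`. A minimal Python sketch under that assumption; the function name `missing_cpu_flags` is hypothetical, and the sample text is made up for illustration rather than taken from our machine:

```python
def missing_cpu_flags(cpuinfo_text: str, required=("avx", "avx2")):
    """Return the required flags absent from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    # No 'flags' line at all (e.g. non-x86 host): treat everything as missing.
    return list(required)

# Illustrative input; on a real host, pass open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(missing_cpu_flags(sample))
```

An empty result means the CPU advertises both extensions; any flag listed is one that a binary built with the corresponding compiler option may fail on with SIGILL.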

saleyn commented 4 months ago

This was an optimization to vectorize loops. I can take it out for compatibility.

saleyn commented 4 months ago

Please check the master branch now.

saleyn commented 4 months ago

BTW, this is how you can check the meaning of a status code:

1> exec:status(132).
{signal,sigill,true}

Zabrane commented 4 months ago

BTW, this is how you can check the meaning of a status code:

1> exec:status(132).
{signal,sigill,true}

Forget that one, thanks.

Zabrane commented 4 months ago

This was an optimization to vectorize loops. I can take it out for compatibility.

Yes please, take it out. We have relied heavily on erlexec for many years.

Zabrane commented 4 months ago

Please check the master branch now.

Working as expected now. Many thanks.