saleyn / erlexec

Execute and control OS processes from Erlang/OTP
https://hexdocs.pm/erlexec/readme.html
Other
541 stars 141 forks source link

PTY mode STDOUT/STDERR capture issue #41

Closed dsabanin closed 10 months ago

dsabanin commented 10 years ago

When PTY mode is enabled, it looks like there some mixup happening between STDOUT and STDERR buffers. Here's how it looks like:

Cmd = "for i in `seq 1 400`; do echo $i; done",
{ok, [{stdout, Stdout}, {stderr, Stderr}]} = exec:run(Cmd, [sync,pty,stdout,stderr]).

Results in:

{ok,[{stdout,[<<"1\r\n2\r\n3\r\n">>,<<"6\r\n">>,<<"7\r\n">>,
              <<"8\r\n">>,<<"9\r\n10\r\n">>,<<"11\r\n">>,<<"12\r\n">>,
              <<"13\r\n">>,<<"14\r\n">>,<<"15\r\n">>,<<"16\r\n">>,
              <<"17\r\n">>,<<"18\r\n">>,<<"19\r\n">>,<<"21\r\n22\r\n">>,
              <<"23\r\n">>,<<"24\r\n">>,<<"25\r\n">>,<<"27\r\n">>,
              <<"30\r\n">>,<<"31\r\n">>,<<"32\r\n33\r\n">>,<<"35\r\n">>,
              <<...>>|...]},
     {stderr,[<<"4\r\n5\r\n">>,<<"20\r\n">>,<<"26\r\n">>,
              <<"28\r\n29\r\n">>,<<"34\r\n">>,<<"41\r\n">>,<<"45\r\n">>,
              <<"49\r\n">>,<<"56\r\n">>,<<"63\r\n">>,<<"68\r\n">>,
              <<"70\r\n">>,<<"77\r\n">>,<<"85\r\n">>,<<"88\r\n">>,
              <<"91\r\n">>,<<"97\r\n">>,<<"101\r\n">>,<<"111\r\n">>,
              <<"116\r\n">>,<<"120\r\n">>,<<"123\r"...>>,<<...>>|...]}]}

Please note that the command produces no STDERR at all. As far as I understand it, in pseudo TTY mode there can be no STDERR/STDOUT separation, I'm not sure why it puts some of the read from STDOUT into STDERR.

OS: Linux, Mac. erlexec version at commit 9178aec7809c33203481a2b294f9c47eabe1b810

Any help or advice on how to debug this is highly appreciated.

dsabanin commented 10 years ago

Here's also output of verbose + debug mode enabled:

https://gist.github.com/dsabanin/ae2c048db0cd13d6bb9a

saleyn commented 10 years ago

Not sure if this is somehow related to my last commit as pty seemed to have worked before.

Running without pty gives expected output:

1> exec:start([debug]).
{ok,<0.35.0>}
2> exec:run("for i in {1..400}; do echo $i; done", [stdout,stderr,sync]).
  Redirecting [stdin -> null]
  Redirecting [stdout -> pipe:{r=6,w=7}]
  Redirecting [stderr -> pipe:{r=8,w=9}]
Starting child: 'for i in {1..400}; do echo $i; done'
  child  = (stdin=null(fd:5), stdout=fd:7, stderr=fd:9)
  parent = (stdin=none, stdout=fd:6, stderr=fd:8)
  Args[0]: /bin/bash
  Args[1]: -c
  Args[2]: for i in {1..400}; do echo $i; done
  Parent closing pid 13127 pipe writing end (fd=7)
  Setup reading end of pid 13127 stdout redirection (fd=6 [non-block])
  Parent closing pid 13127 pipe writing end (fd=9)
  Setup reading end of pid 13127 stderr redirection (fd=8 [non-block])
Child process 13127 exited
* Process 13127 (ret=13127, status=0, sig=17, oktojump=1, exited_count=0 [exited])
Eof reading pid 13127's stdout, closing fd=6: No such process
Eof reading pid 13127's stderr, closing fd=8: No such process
{ok,[{stdout,[<<"1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n">>,
              <<"11\n12\n13\n14\n15\n16\n17\n18\n19\n20\n21\n22\n23\n24\n25\n26\n">>,
              <<"27\n28\n29\n30\n31\n">>,<<"32\n33\n34\n35\n">>,
              <<"36\n37\n38\n39\n">>,<<"40\n41\n42\n43\n">>,
              <<"44\n45\n46\n47\n">>,<<"48\n49\n50\n51\n">>,
              <<"52\n53\n54\n55\n">>,<<"56\n57\n58\n59\n">>,
              <<"60\n61\n62\n63\n">>,<<"64\n65\n66\n67\n">>,
              <<"68\n69\n70\n71\n">>,<<"72\n73\n74\n75\n">>,
              <<"76\n77\n78\n79\n">>,<<"80\n81\n82\n83\n">>,
              <<"84\n85\n86\n87\n">>,<<"88\n89\n90\n91\n">>,
              <<"92\n93\n94\n95\n">>,<<"96\n97\n98\n99\n">>,
              <<"100\n101\n102\n"...>>,<<"104\n105\n"...>>,<<"108\n"...>>,
              <<...>>|...]}]}

Running with pty gets stuck:

3> exec:run("for i in {1..400}; do echo $i; done", [stdout,stderr,sync,pty]).
  Redirecting [stdin -> null]
Starting child: 'for i in {1..400}; do echo $i; done'
  child  = (stdin=null(fd:5), stdout=fd:7, stderr=fd:7)
  parent = (stdin=none, stdout=fd:6, stderr=fd:6)
  Args[0]: /bin/bash
  Args[1]: -c
  Args[2]: for i in {1..400}; do echo $i; done
  Parent closing pid 13128 pipe writing end (fd=7)
  Setup reading end of pid 13128 stdout redirection (fd=6 [non-block])
  Parent closing pid 13128 pipe writing end (fd=7)
  Setup reading end of pid 13128 stderr redirection (fd=6 [non-block])

I don't have time to debug this now, but what seems wrong is that the I/O redirection in the pty case is improper. You can see in the first output that the following file descriptors were created:

  Redirecting [stdin -> null]
  Redirecting [stdout -> pipe:{r=6,w=7}]
  Redirecting [stderr -> pipe:{r=8,w=9}]

and child/parent fds were assigned to this:

  child  = (stdin=null(fd:5), stdout=fd:7, stderr=fd:9)      <-- Note that fd:5 is a file descriptor of /dev/null
  parent = (stdin=none, stdout=fd:6, stderr=fd:8)

It's not happening in the pty case. The way I usually debug this - I start exec from Erlang, and then attach a gdb to the PID of the C process, set a breakpoint, resume, and execute command from Erlang. At that point the breakpoint is hit in gdb and you can step through the code. If you feel like looking into it, feel free to troubleshoot, as I won't get to this for the next 2-3 weeks.

Pty support is relatively recent, so I don't suppose it's being used by anyone in production.

saleyn commented 10 years ago

I just added missing debug logging of PTY redirect, and it does highlight the issue:

2> exec:run("for i in {1..10}; do echo $i; done", [stdout,stderr,sync]). 
  Redirecting [stdin -> null]
  Redirecting [stdout -> pipe:{r=6,w=7}]
  Redirecting [stderr -> pipe:{r=8,w=9}]
Starting child: 'for i in {1..10}; do echo $i; done'
  child  = (stdin=null(fd:5), stdout=fd:7, stderr=fd:9)
  parent = (stdin=none, stdout=fd:6, stderr=fd:8)
...
3> exec:run("for i in {1..10}; do echo $i; done", [stdout,stderr,sync,pty]).
  Redirecting [stdin -> null]
  Redirecting [stdout -> pipe:{r=6,w=7}] (PTY)
  Redirecting [stderr -> pipe:{r=6,w=7}] (PTY)
Starting child: 'for i in {1..10}; do echo $i; done'
  child  = (stdin=null(fd:5), stdout=fd:7, stderr=fd:7)
  parent = (stdin=none, stdout=fd:6, stderr=fd:6)

You can see in case of the PTY start, the stdout and stderr are both pointing to the same file descriptor. As previously mentioned, I won't be able to look into it further until in about 2-3 weeks, but if this problem for you is critical, the culprit code is in the exec.cpp:904 (start_child() function dealing with opening pty).

Here is a helpful resource on how to use a PTY: http://rachid.koucha.free.fr/tech_corner/pty_pdip.html

dsabanin commented 10 years ago

Yep, I also found the same reason. I'm still digging through the code to fix that, but my C is rusty and I don't have much experience with UNIX programming, so I may fail.

It seems that according to convention PTY should not distinguish between STDOUT/STDERR (ssh's RequestTTY does not and it's considered correct) so that's why same descriptors are used for STDOUT and STDERR. I think the easiest way to solve the problem in code is by assigning a redirect from STDERR to STDOUT, so when STDERR descriptor (but it's actually STDOUT) is being read in PTY mode, the results are sent to Erlang process as STDOUT, that way they are combined into one coherent stream.

So the workaround in Erlang is:

exec:run("for i in `seq 1 400`; do echo $i 1>&2; echo a; done", [stdout,{stderr, stdout},sync,pty]).

Output:

{ok,[{stdout,[<<"1\r\na\r\n2\r\na\r\n">>,<<"3\r\na\r\n">>,
              <<"4\r\na\r\n">>,<<"5\r\n">>,<<"a\r\n">>,<<"6\r\n">>,
              <<"a\r\n">>,<<"7\r\n">>,<<"a\r\n">>,<<"8\r\n">>,<<"a\r\n">>,
              <<"9\r\n">>,<<"a\r\n">>,<<"10\r\n">>,<<"a\r\n">>,
              <<"11\r\n">>,<<"a\r\n">>,<<"12\r\n">>,<<"a\r\n">>,
              <<"13\r\na\r\n">>,<<"14\r\n">>,<<"a\r\n">>,<<"15\r\n">>,
              <<...>>|...]}]}
saleyn commented 10 years ago

If that's the case, then exec.cpp should perform validation of stdout/stderr input options and perform error reporting if [pty,stdout,stderr] are given as well as enable stderr to stdout redirect by default. It's good that you found this issue, and we'll eventually be able to get it resolved.

dsabanin commented 10 years ago

Unfortunately I couldn't figure out the proper way to fix it, because of my lacking knowledge of C/C++, but I can confirm that the workaround of redirecting stderr to stdout works great.

saleyn commented 1 year ago

It seems to be a known issue/feature that tty multiplexes stdout and stderr into a single stream.