Coro-spawn stdout and stderr not reading.

DannehSC commented 7 years ago

I get a process id and a child.handle, and it prints in the stdout and stderr to say it skipped the stdout.read and didn't continue to read until the child exited. Also, I don't know that this will help, but here's my line that uses coro-split that should process everything. split(readstdout, readstderr, servant.waitExit)

DannehSC commented 7 years ago

I suppose I should have included some code to replicate it. This is code that is confirmed to have worked not that long ago. I should add, I'm on the latest of everything, luvit, luvi, lit, coro-spawn, coro-split, etc. The purpose of this script is to auto-restart and send all data to a log file.


-- example: luvit starter.lua file.lua
local fs = require('coro-fs')
local timer = require('timer')
local spawn = require('coro-spawn')
local split = require('coro-split')
local fmt = '!%Y-%m-%d %H:%M:%S'
local cmd = args[0]
args[0] = nil
table.remove(args, 1)
coroutine.wrap(function()
    while true do
        pcall(function()
            local log = fs.open(args[1]:gsub('.lua', '.log'), 'a')
            local child = spawn(cmd, {
                args = args,
                stdio = {nil, true, true}
            })
            split(function()
                local line = {nil, ' ', nil}
                for data in child.stdout.read do
                    line[1] = os.date(fmt)
                    line[3] = data
                    fs.write(log, line)
                    print(unpack(line))
                end
            end, function()
                local line = {nil, ' ', nil}
                for data in child.stderr.read do
                    line[1] = os.date(fmt)
                    line[3] = data
                    fs.write(log, line)
                    print(unpack(line))
                end
            end, child.waitExit)
            fs.close(log)
            timer.sleep(5000)
        end)
    end
end)()

SinisterRectus commented 7 years ago

What exactly doesn't work, because this works for me.

DannehSC commented 7 years ago

Is it possible it can't find the luvit executable? stdout and stderr never fire.

SinisterRectus commented 7 years ago

Maybe. As of commit 952160b661c8ed401171ff64f5c04171c276812c, coro-spawn will return nil, err if the spawn fails, so you should check that out. I don't think it's been published to lit as a new version yet, so you'll have to manually update it. Maybe @creationix can bump and publish.

DannehSC commented 7 years ago

I'll try manual update and keep the issue open while I look, thanks.

DannehSC commented 7 years ago

Well. There are no errors published, I manually updated coro-spawn to the latest version. It instantly closes and doesn't read anything, when it's specifically designated to.

SinisterRectus commented 7 years ago

What does it return?

DannehSC commented 7 years ago

Error is nil, the process is returned.

SinisterRectus commented 7 years ago

I don't know then. You should reduce the code to something that reproduces the issue with less code, too. There's a lot here that is irrelevant to coro-spawn.

DannehSC commented 7 years ago

It appears to be a problem with reading the data-stream, because I can open notepad with the spawn, proving it works, but when I try to do it with luvit, and they are in the very same directory as the binaries, plus I have the cwd set to the directory with the binaries.

DannehSC commented 7 years ago

And after another test, the luvit apps DO launch. It's just reading related, it appears.

squeek502 commented 7 years ago

Seems like this might be an issue within libuv/luv, and from what I can tell it's inconsistent. Here's a simplified test case (tested on Windows):

starter.lua

-- example: luvit starter.lua file.lua
local spawn = require('coro-spawn')
local split = require('coro-split')
local cmd = args[0]
args[0] = nil
table.remove(args, 1)
coroutine.wrap(function()
    local child = spawn(cmd, {
        args = args,
        stdio = {nil, true, true}
    })
    split(function()
        for data in child.stdout.read do
            print(data)
        end
    end, function()
        for data in child.stderr.read do
            print(data)
        end
    end, child.waitExit)
end)()

file.lua

local uv = require('uv')

print("hello world")

p("pretty-print test")

local stdout = uv.new_pipe(false)
uv.pipe_open(stdout, 1)
stdout:write("stdout\n")

local stderr = uv.new_pipe(false)
uv.pipe_open(stderr, 2)
stderr:write("stderr\n")

error("error")

Sometimes luvit starter.lua file.lua will output nothing, sometimes it will output only hello world, and sometimes it will output something like:

hello worldstdout

stderr
Uncaught exception:
...rs\Ryan\Programming\Luvit\luvit-coro-spawn-test\file.lua:15: error
stack traceback:
        [C]: in function 'error'
        ...rs\Ryan\Programming\Luvit\luvit-coro-spawn-test\file.lua:15: in function 'fn'
        [string "bundle:deps/require.lua"]:310: in function 'require'
        [string "bundle:main.lua"]:118: in function 'main'
        [string "bundle:init.lua"]:49: in function <[string "bundle:init.lua"]:47>
        [C]: in function 'xpcall'
        [string "bundle:init.lua"]:47: in function 'fn'
        [string "bundle:deps/require.lua"]:310: in function <[string "bundle:deps/require.lua"]:266>

This same thing happens when using the code from the uv.spawn example in the luv docs in starter.lua.

Note, though, that this doesn't seem to happen when using uv.spawn to create a plain Lua process (e.g. luvit starter.lua to spawn lua file.lua where file.lua contains only print('hello world'); error('test')--both the printed string and the stacktrace get read consistently), so it might be dependent on how libuv outputs to stdout/stderr.

DannehSC commented 7 years ago

Quite the odd error. The more odd thing, is it used to run perfectly fine for me, then I updated everything to the latest, and everything related to that broke.

creationix commented 7 years ago

Super weird. I'm sorry I don't have time at the moment to help debug this, but I'm very interested to see what the issue it. I'll push new versions of whatever is needed when the root cause is found.

squeek502 commented 7 years ago

A little more investigation:

A syntax error in the child process' lua file prevents any output from being read (e.g. luvit spawner.lua file.lua prints nothing when file.lua has a syntax error, but luvit file.lua prints the error. Also, this is only the case when the child process is luvit--using lua for the child process prints the error just fine).
An error call makes the output inconsistently read (luvit starter.lua plain.lua is inconsistent, but removing the error call makes it consistent)
Opening another stdout/stderr pipe makes the output inconsistently read (luvit starter.lua file.lua with only the print and p call in file.lua makes it consistent, can also add process.stdout:write()/process.stderr:write() calls, but any call to uv.new_pipe breaks it [don't even need to call pipe_open])
I haven't figured out a way to reproduce this with plain libuv (although I don't understand enough about what's happening yet to even make a legitimate effort in this regard).

rphillips commented 7 years ago

libuv does some weird stuff with the windows TTY (I think it turns on RAW mode for windows). It might be easier to try and debug this on a unix machine.

squeek502 commented 7 years ago

Probably true, but TTY is not used in child processes, so it might not have much effect in this case.

One more random thing: the exit code of luvit freaks out when error is called in a script or when there is a syntax error. Not sure exactly why this is, but it might be somewhat relevant to the bolded point in my last comment.

When spawning a child process of lua plain.lua

>lua uv-starter.lua
uv_process_t: 0000000000312090  19692
stderr  lua: plain.lua:3: '=' expected near '<eof>'

end: stdout
end: stderr
exit    1       0

When spawning a child process of luvit plain.lua

>lua uv-starter.lua
uv_process_t: 0000000000152090  23152
end: stdout
end: stderr
exit    4294967295      0

EDIT: The 4294967295 exit code is probably just due to weird type conversions--Luvit returns -1 when the script errors, and it seems that's being interpreted as a differently sized int.

squeek502 commented 7 years ago

Ok, I think I figured out some of what's going on, at least with regards to errors not being captured (although why the error always shows up when running Luvit directly is still confusing).

It seems like the error is not being captured because Luvit writes the error to stderr outside of the libuv event loop and then never runs the libuv event loop again except to clean up handles (of which stderr is one), so the asynchronous write call never gets processed. If I stick an explicit uv.run() after the stderr:write(), then the error gets captured consistently when luvit plain.lua is spawned as a child process, where plain.lua is:

print("hello world")
error("error")

uv-starter.lua for reference, used to spawn the child process (ran with `luv` built as a Lua module)

```lua local uv = require('uv') local stdout = uv.new_pipe(false) local stderr = uv.new_pipe(false) local handle, pid local function onexit(code, signal) print("exit", code, signal) end local function onreadout(err, chunk) assert(not err, err) if (chunk) then print("stdout", chunk) else print("end: stdout") end end local function onreaderr(err, chunk) assert(not err, err) if (chunk) then print("stderr", chunk) else print("end: stderr") end end local cmd = "luvit" handle, pid = uv.spawn(cmd, { args = {"plain.lua"}, stdio = {nil, stdout, stderr} }, onexit) print(handle, pid) uv.read_start(stdout, onreadout) uv.read_start(stderr, onreaderr) uv.run() ```

Outputs:

Current Luvit:

>lua uv-starter.lua
uv_process_t: 00000000003A2090  15660
end: stdout
end: stderr
exit    4294967295      0

After adding the uv.run() call in Luvit:

>lua uv-starter.lua
uv_process_t: 0000000000302090  18536
stdout  hello world

stderr  Uncaught exception:
...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: error
stack traceback:
        [C]: in function 'error'
        ...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: in function 'fn'
        [string "bundle:deps/require.lua"]:310: in function 'require'
        [string "bundle:main.lua"]:118: in function 'main'
        [string "bundle:init.lua"]:49: in function <[string "bundle:init.lua"]:47>
        [C]: in function 'xpcall'
        [string "bundle:init.lua"]:47: in function 'fn'
        [string "bundle:deps/require.lua"]:310: in function <[string "bundle:deps/require.lua"]:266>

end: stderr
end: stdout
exit    4294967295      0

Running luvit plain.lua directly (uv.run call doesn't matter for some reason, the error gets printed regardless):

>luvit plain.lua
hello world
Uncaught exception:
...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: error
stack traceback:
        [C]: in function 'error'
        ...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: in function 'fn'
        [string "bundle:deps/require.lua"]:310: in function 'require'
        [string "bundle:main.lua"]:118: in function 'main'
        [string "bundle:init.lua"]:49: in function <[string "bundle:init.lua"]:47>
        [C]: in function 'xpcall'
        [string "bundle:init.lua"]:47: in function 'fn'
        [string "bundle:deps/require.lua"]:310: in function <[string "bundle:deps/require.lua"]:266>

Still confused, but maybe moving in the right direction?

DannehSC commented 7 years ago

This may sound idiotic, but does that mean a change to luvit's init.lua, simply adding a uv.run() in the main code, after the stderr:write(...) could fix this?

squeek502 commented 7 years ago

It depends if the issue you're having is related to errors causing the output to not be captured. Probably worth trying it out.

EDIT: Note that the uv.run() call is probably not the best fix for this, as it could have other side-effects (I think?), but it was the most straightforward for testing what was going on.

DannehSC commented 7 years ago

Maybe we should wait and let @creationix take a look, and see what his thoughts are on solutions for this, since it seems you have established the problem?

squeek502 commented 7 years ago

Yeah, would be good to get @creationix's thoughts, but there's definitely still something going on that I don't understand.

What seems to be happening is that, due to libuv's asynchronous nature, some messages aren't getting printed in certain situations (like the stderr:write mentioned above). If that's true, then one potential fix would be to add some sort of synchronous print that could be used outside of the libuv event loop.

However, that explanation doesn't cover why those same messages are being printed when luvit is run normally (i.e. not as a libuv child process). Figuring out why there's a discrepancy would likely lead to a better fix, but I'm not sure how to go about doing that.

rphillips commented 7 years ago

Something to keep in mind is that child.spawn can exit prior to the stdout/stderr buffers being read in their entirety. I bet that is the bug.

squeek502 commented 7 years ago

There's a few things that make that seem like an incomplete description of the problem as well:

It only seems to happen when spawning luvit specifically. As mentioned previously, when spawning lua as a child process, it captures the full output every time (even though the same file is being executed by the lua/luvit child processes).
If it is true that the child process is exiting before the output is fully read, then as far as I can tell, that'd be a libuv bug, as the inconsistent output is reproducible with plain libuv (code I used in a detail block below, it's a modified version of this test)

libuv child process stdio test code

```c++ /* Copyright Joyent, Inc. and other Node contributors. All rights reserved. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to * deal in the Software without restriction, including without limitation the * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or * sell copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. */ #include "uv.h" #include #include #include #ifndef NDEBUG #include #define ASSERT(condition) \ { \ if(!(condition)) \ { \ std::cerr << "Assertion failed at " << __FILE__ << ":" << __LINE__; \ std::cerr << " inside " << __FUNCTION__ << std::endl; \ std::cerr << "Condition: " << #condition << std::endl; \ abort(); \ } \ } #else #define ASSERT(condition) (condition) #endif #define UV_STDIN_FD 0 #define UV_STDOUT_FD 1 static char exepath[1024]; static size_t exepath_size = 1024; static char* args[3]; static uv_process_options_t options; static int close_cb_called; static int exit_cb_called; static int on_read_cb_called; static int after_write_cb_called; static uv_pipe_t in; static uv_pipe_t out; static uv_pipe_t err; static uv_loop_t* loop; #define OUTPUT_SIZE 1024 static char output[OUTPUT_SIZE]; static int output_used; static void close_cb(uv_handle_t* handle) { close_cb_called++; } static void exit_cb(uv_process_t* process, int64_t exit_status, int term_signal) { printf("exit_cb %d\n", exit_status); exit_cb_called++; uv_close((uv_handle_t*)process, close_cb); uv_close((uv_handle_t*)&in, close_cb); uv_close((uv_handle_t*)&out, close_cb); uv_close((uv_handle_t*)&err, close_cb); } static void init_process_options(char* test, uv_exit_cb exit_cb) { strcpy_s(exepath, "luvit"); printf("%s\n", exepath); args[0] = exepath; args[1] = "C:\\Users\\Ryan\\Programming\\Luvit\\luvit-coro-spawn-test\\plain.lua"; args[2] = NULL; options.file = exepath; options.args = args; options.exit_cb = exit_cb; } static void on_alloc(uv_handle_t* handle, size_t suggested_size, uv_buf_t* buf) { buf->base = output + output_used; buf->len = OUTPUT_SIZE - output_used; } static void after_write(uv_write_t* req, int status) { if (status) { fprintf(stderr, "uv_write error: %s\n", uv_strerror(status)); ASSERT(0); } /* Free the read/write buffer and the request */ free(req); after_write_cb_called++; } static void on_read(uv_stream_t* tcp, ssize_t nread, const uv_buf_t* rdbuf) { ASSERT(nread > 0 || nread == UV_EOF); if (nread > 0) { output_used += nread; } on_read_cb_called++; } int main(int argc, const char * argv[]) { int r; uv_process_t process; uv_stdio_container_t stdio[2]; loop = uv_default_loop(); init_process_options("stdio_over_pipes_helper", exit_cb); uv_pipe_init(loop, &out, 0); uv_pipe_init(loop, &in, 0); uv_pipe_init(loop, &err, 0); options.stdio = stdio; options.stdio[0].flags = (uv_stdio_flags)(UV_CREATE_PIPE | UV_READABLE_PIPE); options.stdio[0].data.stream = (uv_stream_t*)∈ options.stdio[1].flags = (uv_stdio_flags)(UV_CREATE_PIPE | UV_WRITABLE_PIPE); options.stdio[1].data.stream = (uv_stream_t*)&out; options.stdio[2].flags = (uv_stdio_flags)(UV_CREATE_PIPE | UV_WRITABLE_PIPE); options.stdio[2].data.stream = (uv_stream_t*)&err; options.stdio_count = 3; r = uv_spawn(loop, &process, &options); ASSERT(r == 0); r = uv_read_start((uv_stream_t*)&out, on_alloc, on_read); ASSERT(r == 0); r = uv_read_start((uv_stream_t*)&err, on_alloc, on_read); ASSERT(r == 0); r = uv_run(uv_default_loop(), UV_RUN_DEFAULT); ASSERT(r == 0); printf("%s\n", output); return 0; } ```

But, yeah, that's definitely a possibility, and I'm just as unsure about how to go about ruling it out/confirming it.

rphillips commented 7 years ago

@squeek502 In one of the first comments the split() function has a child.waitExit call. I think the split() call never drains the stdout+stderr handles because that coroutine doesn't get pumped until they are empty after the child.waitExit handler.

Just a hypothesis.

creationix commented 7 years ago

My guess is the pipe used as the child-process's stderr is slow enough that it's getting caught by a race condition. The linked PR flushes all streams before closing them as part of luvit's shutdown. This should prevent such cases in the future.

DannehSC commented 7 years ago

It appears this commit has fixed the bug! Though instant errors like print("hi") error("goodbye") still are ignored by stderr. But if I wait even 1 ms, literally 1 ms using timer.sleep(1), then it works fine. Dunno.

creationix commented 7 years ago

@DannehSC do you have a minimal example to reproduce the "instant" error case?

DannehSC commented 7 years ago

Sorry, my mistake. I wasn't running the one with the update.

luvit / lit

Coro-spawn stdout and stderr not reading. #222