Closed DannehSC closed 7 years ago
I suppose I should have included some code to replicate it. This is code that is confirmed to have worked not that long ago. I should add, I'm on the latest of everything, luvit, luvi, lit, coro-spawn, coro-split, etc. The purpose of this script is to auto-restart and send all data to a log file.
-- example: luvit starter.lua file.lua
local fs = require('coro-fs')
local timer = require('timer')
local spawn = require('coro-spawn')
local split = require('coro-split')
local fmt = '!%Y-%m-%d %H:%M:%S'
local cmd = args[0]
args[0] = nil
table.remove(args, 1)
coroutine.wrap(function()
while true do
pcall(function()
-- escape the dot and anchor the pattern so only the trailing extension is replaced
local log = fs.open(args[1]:gsub('%.lua$', '.log'), 'a')
local child = spawn(cmd, {
args = args,
stdio = {nil, true, true}
})
split(function()
local line = {nil, ' ', nil}
for data in child.stdout.read do
line[1] = os.date(fmt)
line[3] = data
fs.write(log, line)
print(unpack(line))
end
end, function()
local line = {nil, ' ', nil}
for data in child.stderr.read do
line[1] = os.date(fmt)
line[3] = data
fs.write(log, line)
print(unpack(line))
end
end, child.waitExit)
fs.close(log)
timer.sleep(5000)
end)
end
end)()
What exactly doesn't work, because this works for me.
Is it possible it can't find the luvit executable? stdout and stderr never fire.
Maybe. As of commit 952160b661c8ed401171ff64f5c04171c276812c, coro-spawn will return nil, err if the spawn fails, so you should check that out. I don't think it's been published to lit as a new version yet, so you'll have to manually update it. Maybe @creationix can bump and publish.
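For anyone following along, a minimal sketch of that check. The helper name spawnOrReport is hypothetical, and the spawn function is passed in as a parameter (an assumption made here so the check can be exercised without Luvit). In Luvit you would pass require('coro-spawn') and call it from inside a coroutine.

```lua
-- Hypothetical helper: `spawn` is injected; in Luvit pass require('coro-spawn').
local function spawnOrReport(spawn, cmd, options)
  local child, err = spawn(cmd, options)
  if not child then
    -- Per the commit mentioned above, a failed spawn yields nil plus an error value.
    return nil, ('could not spawn %s: %s'):format(tostring(cmd), tostring(err))
  end
  return child
end

-- Usage inside a Luvit coroutine (assumed, not run here):
-- local child, err = spawnOrReport(require('coro-spawn'), cmd, {
--   args = args,
--   stdio = {nil, true, true}
-- })
-- if not child then print(err) return end
```

If this returns a child and no error, as reported later in the thread, the spawn itself is not what is failing.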
I'll try manual update and keep the issue open while I look, thanks.
Well, no errors are reported. I manually updated coro-spawn to the latest version. The child instantly closes and nothing is read, even though the script is specifically set up to read it.
What does it return?
Error is nil, the process is returned.
I don't know then. You should reduce the code to something that reproduces the issue with less code, too. There's a lot here that is irrelevant to coro-spawn.
It appears to be a problem with reading the data stream. I can open notepad with the spawn, which proves spawning works, but when I try the same thing with luvit, nothing is read, even though the scripts are in the very same directory as the binaries and I have the cwd set to that directory.
And after another test, the luvit apps DO launch. It's just reading related, it appears.
Seems like this might be an issue within libuv/luv, and from what I can tell it's inconsistent. Here's a simplified test case (tested on Windows):
starter.lua
-- example: luvit starter.lua file.lua
local spawn = require('coro-spawn')
local split = require('coro-split')
local cmd = args[0]
args[0] = nil
table.remove(args, 1)
coroutine.wrap(function()
local child = spawn(cmd, {
args = args,
stdio = {nil, true, true}
})
split(function()
for data in child.stdout.read do
print(data)
end
end, function()
for data in child.stderr.read do
print(data)
end
end, child.waitExit)
end)()
file.lua
local uv = require('uv')
print("hello world")
p("pretty-print test")
local stdout = uv.new_pipe(false)
uv.pipe_open(stdout, 1)
stdout:write("stdout\n")
local stderr = uv.new_pipe(false)
uv.pipe_open(stderr, 2)
stderr:write("stderr\n")
error("error")
Sometimes luvit starter.lua file.lua will output nothing, sometimes it will output only hello world, and sometimes it will output something like:
hello worldstdout
stderr
Uncaught exception:
...rs\Ryan\Programming\Luvit\luvit-coro-spawn-test\file.lua:15: error
stack traceback:
[C]: in function 'error'
...rs\Ryan\Programming\Luvit\luvit-coro-spawn-test\file.lua:15: in function 'fn'
[string "bundle:deps/require.lua"]:310: in function 'require'
[string "bundle:main.lua"]:118: in function 'main'
[string "bundle:init.lua"]:49: in function <[string "bundle:init.lua"]:47>
[C]: in function 'xpcall'
[string "bundle:init.lua"]:47: in function 'fn'
[string "bundle:deps/require.lua"]:310: in function <[string "bundle:deps/require.lua"]:266>
This same thing happens when using the code from the uv.spawn example in the luv docs in starter.lua.
Note, though, that this doesn't seem to happen when using uv.spawn to create a plain Lua process (e.g. luvit starter.lua to spawn lua file.lua, where file.lua contains only print('hello world'); error('test'); both the printed string and the stacktrace get read consistently), so it might be dependent on how libuv outputs to stdout/stderr.
Quite the odd error. The odder thing is that it used to run perfectly fine for me, then I updated everything to the latest, and everything related to that broke.
Super weird. I'm sorry I don't have time at the moment to help debug this, but I'm very interested to see what the issue is. I'll push new versions of whatever is needed when the root cause is found.
A little more investigation:
- luvit spawner.lua file.lua prints nothing when file.lua has a syntax error, but luvit file.lua prints the error. Also, this is only the case when the child process is luvit; using lua for the child process prints the error just fine.
- An error call makes the output inconsistently read (luvit starter.lua plain.lua is inconsistent, but removing the error call makes it consistent).
- luvit starter.lua file.lua with only the print and p calls in file.lua is consistent. process.stdout:write()/process.stderr:write() calls can also be added, but any call to uv.new_pipe breaks it (pipe_open doesn't even need to be called).
libuv does some weird stuff with the Windows TTY (I think it turns on RAW mode for Windows). It might be easier to try and debug this on a unix machine.
Probably true, but TTY is not used in child processes, so it might not have much effect in this case.
One more random thing: the exit code of luvit freaks out when error is called in a script or when there is a syntax error. Not sure exactly why this is, but it might be somewhat relevant to the bolded point in my last comment.
When spawning a child process of lua plain.lua
>lua uv-starter.lua
uv_process_t: 0000000000312090 19692
stderr lua: plain.lua:3: '=' expected near '<eof>'
end: stdout
end: stderr
exit 1 0
When spawning a child process of luvit plain.lua
>lua uv-starter.lua
uv_process_t: 0000000000152090 23152
end: stdout
end: stderr
exit 4294967295 0
EDIT: The 4294967295 exit code is probably just due to weird type conversions--Luvit returns -1 when the script errors, and it seems that's being interpreted as a differently sized int.
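To make the arithmetic concrete: 4294967295 is 2^32 - 1, which is what -1 looks like when a 32-bit value is read as unsigned. A small sketch of the conversion back (the function name toSigned32 is made up for illustration and is not part of luv or Luvit):

```lua
-- Reinterpret an exit status reported as an unsigned 32-bit value as signed.
local function toSigned32(code)
  if code >= 2^31 then
    -- Values in the upper half of the unsigned range wrap around to negatives.
    return code - 2^32
  end
  return code
end

print(toSigned32(4294967295)) -- the status seen for `luvit plain.lua`
print(toSigned32(1))          -- the status seen for `lua plain.lua`
```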
Ok, I think I figured out some of what's going on, at least with regards to errors not being captured (although why the error always shows up when running Luvit directly is still confusing).
It seems like the error is not being captured because Luvit writes the error to stderr outside of the libuv event loop and then never runs the libuv event loop again except to clean up handles (of which stderr is one), so the asynchronous write call never gets processed. If I stick an explicit uv.run() after the stderr:write(), then the error gets captured consistently when luvit plain.lua is spawned as a child process, where plain.lua is:
print("hello world")
error("error")
Outputs:
Current Luvit:
>lua uv-starter.lua
uv_process_t: 00000000003A2090 15660
end: stdout
end: stderr
exit 4294967295 0
After adding the uv.run() call in Luvit:
>lua uv-starter.lua
uv_process_t: 0000000000302090 18536
stdout hello world
stderr Uncaught exception:
...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: error
stack traceback:
[C]: in function 'error'
...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: in function 'fn'
[string "bundle:deps/require.lua"]:310: in function 'require'
[string "bundle:main.lua"]:118: in function 'main'
[string "bundle:init.lua"]:49: in function <[string "bundle:init.lua"]:47>
[C]: in function 'xpcall'
[string "bundle:init.lua"]:47: in function 'fn'
[string "bundle:deps/require.lua"]:310: in function <[string "bundle:deps/require.lua"]:266>
end: stderr
end: stdout
exit 4294967295 0
Running luvit plain.lua directly (the uv.run call doesn't matter for some reason, the error gets printed regardless):
>luvit plain.lua
hello world
Uncaught exception:
...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: error
stack traceback:
[C]: in function 'error'
...s\Ryan\Programming\Luvit\luvit-coro-spawn-test\plain.lua:2: in function 'fn'
[string "bundle:deps/require.lua"]:310: in function 'require'
[string "bundle:main.lua"]:118: in function 'main'
[string "bundle:init.lua"]:49: in function <[string "bundle:init.lua"]:47>
[C]: in function 'xpcall'
[string "bundle:init.lua"]:47: in function 'fn'
[string "bundle:deps/require.lua"]:310: in function <[string "bundle:deps/require.lua"]:266>
Still confused, but maybe moving in the right direction?
This may sound idiotic, but does that mean a change to luvit's init.lua, simply adding a uv.run() in the main code after the stderr:write(...), could fix this?
It depends if the issue you're having is related to errors causing the output to not be captured. Probably worth trying it out.
EDIT: Note that the uv.run() call is probably not the best fix for this, as it could have other side-effects (I think?), but it was the most straightforward for testing what was going on.
Maybe we should wait and let @creationix take a look, and see what his thoughts are on solutions for this, since it seems you have established the problem?
Yeah, would be good to get @creationix's thoughts, but there's definitely still something going on that I don't understand.
What seems to be happening is that, due to libuv's asynchronous nature, some messages aren't getting printed in certain situations (like the stderr:write mentioned above). If that's true, then one potential fix would be to add some sort of synchronous print that could be used outside of the libuv event loop.
However, that explanation doesn't cover why those same messages are being printed when luvit is run normally (i.e. not as a libuv child process). Figuring out why there's a discrepancy would likely lead to a better fix, but I'm not sure how to go about doing that.
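As a rough illustration of what a "synchronous print" could look like, here is a sketch using Lua's standard blocking io library instead of a libuv stream. The helper name writeSync is hypothetical, and whether this would be an appropriate change inside Luvit itself is not something established in this thread.

```lua
-- Hypothetical helper: write with Lua's blocking io library and flush
-- right away, so nothing is left queued waiting for an event loop to run.
local function writeSync(stream, message)
  stream:write(message)
  stream:flush()
end

-- Example: report an error on stderr without going through libuv.
writeSync(io.stderr, 'Uncaught exception: example\n')
```

The point of the design is only that the write completes before the function returns, so it does not depend on the event loop being run again afterwards.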
Something to keep in mind is that child.spawn can exit prior to the stdout/stderr buffers being read in their entirety. I bet that is the bug.
There's a few things that make that seem like an incomplete description of the problem as well:
But, yeah, that's definitely a possibility, and I'm just as unsure about how to go about ruling it out/confirming it.
@squeek502 In one of the first comments the split() function has a child.waitExit call. I think the split() call never drains the stdout+stderr handles because that coroutine doesn't get pumped until they are empty after the child.waitExit handler.
Just a hypothesis.
My guess is the pipe used as the child-process's stderr is slow enough that it's getting caught by a race condition. The linked PR flushes all streams before closing them as part of luvit's shutdown. This should prevent such cases in the future.
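A toy model of the flush-before-close idea, for illustration only. This is not the code from the linked PR and does not use libuv; newStream and shutdown are made-up names, and each "stream" simply queues writes until it is flushed, to mimic an asynchronous write.

```lua
-- Toy model, not Luvit or libuv code: writes are queued and only reach
-- the sink when the stream is flushed.
local function newStream(sink)
  local stream = {pending = {}, closed = false}
  function stream:write(data)
    self.pending[#self.pending + 1] = data
  end
  function stream:flush()
    for i = 1, #self.pending do
      sink[#sink + 1] = self.pending[i]
    end
    self.pending = {}
  end
  function stream:close()
    -- Anything still pending at this point is lost.
    self.pending = {}
    self.closed = true
  end
  return stream
end

-- Shutdown that flushes every stream before closing it.
local function shutdown(streams)
  for _, stream in ipairs(streams) do
    stream:flush()
    stream:close()
  end
end

local received = {}
local stderr = newStream(received)
stderr:write('Uncaught exception: error\n')
shutdown({stderr})
print(#received) -- number of messages that reached the sink
```

Calling close() directly on a stream with queued data, without the flush, leaves the sink empty in this model, which is the kind of loss described above.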
It appears this commit has fixed the bug! Though instant errors like print("hi") error("goodbye") still are ignored by stderr. But if I wait even 1 ms, literally 1 ms using timer.sleep(1), then it works fine. Dunno.
@DannehSC do you have a minimal example to reproduce the "instant" error case?
Sorry, my mistake. I wasn't running the one with the update.
I get a process id and a child.handle, and it prints in the stdout and stderr to say it skipped the stdout.read and didn't continue to read until the child exited. Also, I don't know that this will help, but here's my line that uses coro-split that should process everything.
split(readstdout, readstderr, servant.waitExit)