Closed amano-kenji closed 2 months ago
With epoll=false
, the following code causes segmentation fault.
(import spork/sh)
(def devnull (sh/devnull))
(ev/spawn-thread (ev/write devnull "ok"))
Welcome to hell.
I got the segfault as well.
I edited src/conf/janetconf.h
to have:
#define JANET_EV_NO_EPOLL
and rebuilt janet
to reproduce with the code in the comment above.
I don't know how reliable the results are but I used rr
to record execution and then snooped around a bit using rr replay
.
This is the output I got around the report of a segfault:
Thread 2 received signal SIGSEGV, Segmentation fault.
0x00007f05b167366e in janet_stream_close_impl (stream=<optimized out>) at src/core/ev.c:349
349 JanetStream *last = janet_vm.streams[j];
I guess that's this line.
The issue is the copying of streams between threads. There is some thread local (interpreter global) state that needs copying. This only exists for the poll implementation, though.
This should be fixed for me as of 12630d3e546cf5a1105cfabc2513d91bdfd23031 - the issue was janet_unmarshal_stream didn't re-register the stream with the event loop.
I built the latest commit with epoll=false
, and this issue can still be reproduced with the sample code above.
Is the segfault still happening? The 100% cpu is not yet addressed
At least, the segmentation fault is gone now. The 100% cpu usage is not gone, yet.
I used "TCP Echo Server" example from README as the server and janet -e '(def c (net/connect "127.0.0.1" 8000)) (:write c "abc") (ev/sleep 1e10)'
as client.
The server gets stuck in a busy loop after the client sends some data.
Janet compiled with meson setup build && ninja -C build
.
0ff8f58be87b00132e43f9a86483e5522a0d1c66 is the first bad commit. The commit is quite big unfortunately.
Server strace before commit (fd=5 - connection socket):
<...>
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN}], 3, -1
<waits for data>
Server strace after commit (fd=5 - connection socket):
<...>
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 3, -1) = 1 ([{fd=5, revents=POLLOUT}])
<...>
From man poll
:
POLLIN - There is data to read.
POLLOUT - Writing is now possible.
Janet doesn't want to write anything to the socket, yet still waits for it to be writable. Poll immediately signals that it is writable, hence the busy loop.
Found the issue, the problem is that passing a valid file descriptor to poll with events = 0 immediately returns with POLLHUP.
See https://groups.google.com/g/comp.unix.programmer/c/bNNadBIEpTo/m/G5gs1mqNhbIJ?pli=1 for a conversation and workaround.
@czkz Thanks for that secondary repro, that seems to be a closely related issue with the poll backend. This is not fixed by https://github.com/janet-lang/janet/commit/a9b8f8e8a9d75fee10a3d036854d283a9d10600e however
If I build janet with the default meson option
epoll=false
, then the following code causes a busy loop.The workaround is to replace
ev/sleep
withos/sleep
, remove{:out :pipe}
, or closeproc
withos/proc-close
which closes pipes and callsos/proc-wait
if it hasn't been called already.If I build janet with
epoll=true
, then the issue goes away.This issue was separated from https://github.com/janet-lang/janet/issues/1386