Open vali-um opened 1 year ago
A short excerpt of one strace we managed to capture:
Request 1:
[pid 9419] read(16<TCP:[192.168.1.174:8080->192.168.2.171:57454]>, "GET / HTTP/1.1\r\nHost: test.srv"..., 4096) = 932
...
[pid 9419] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe4ff18ba50) = 9427
...
[pid 9427] execve("/bin/nonexistent", ["/bin/nonexistent"], 0x55aa3b067e90 /* 19 vars */) = -1 ENOENT (No such file or directory)
[pid 9427] write(18<pipe:[355911]>, "\2\0\0\0", 4) = 4
[pid 9419] <... read resumed>"\2\0\0\0", 4) = 4
[pid 9427] exit_group(127 <unfinished ...>
[pid 9419] close(17<pipe:[355911]> <unfinished ...>
[pid 9427] <... exit_group resumed>) = ?
[pid 9419] <... close resumed>) = 0
[pid 9427] +++ exited with 127 +++
[pid 9419] sched_yield() = 0
[pid 9419] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=9427, si_uid=0, si_status=127, si_utime=0, si_stime=0} ---
[pid 9418] <... epoll_wait resumed>[], 1, 1000) = 0
[pid 9419] getpid( <unfinished ...>
[pid 9418] getsockopt(3<TCP:[0.0.0.0:8080]>, SOL_TCP, TCP_INFO, <unfinished ...>
[pid 9419] <... getpid resumed>) = 9419
[pid 9418] <... getsockopt resumed>"\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0d\0\0\0"..., [104]) = 0
[pid 9419] rt_sigreturn({mask=[]} <unfinished ...>
[pid 9418] wait4(-1, <unfinished ...>
[pid 9419] <... rt_sigreturn resumed>) = 0
....
Request 1 doesn't finish.
Request 2:
[pid 9423] read(17<TCP:[192.168.1.174:8080->192.168.2.171:57470]>, "GET / HTTP/1.1\r\nHost: test.srv"..., 4096) = 889
...
[pid 9423] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe4fb0e69d0) = 9429
....
[pid 9429] execve("/bin/nonexistent", ["/bin/nonexistent"], 0x55aa3b067e90 /* 19 vars */) = -1 ENOENT (No such file or directory)
[pid 9429] write(19<pipe:[355929]>, "\2\0\0\0", 4) = 4
[pid 9423] <... read resumed>"\2\0\0\0", 4) = 4
[pid 9429] exit_group(127 <unfinished ...>
[pid 9423] close(18<pipe:[355929]> <unfinished ...>
[pid 9429] <... exit_group resumed>) = ?
[pid 9423] <... close resumed>) = 0
[pid 9429] +++ exited with 127 +++
[pid 9423] futex(0x55aa3b0467c0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 9419] <... ppoll resumed>) = ? ERESTARTNOHAND (To be restarted if no handler)
[pid 9419] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=9429, si_uid=0, si_status=127, si_utime=0, si_stime=0} ---
[pid 9419] getpid() = 9419
[pid 9419] rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
[pid 9419] read(4<anon_inode:[eventfd]>, 0x7fff41be88d0, 8) = -1 EAGAIN (Resource temporarily unavailable)
[pid 9419] wait4(9429, [{WIFEXITED(s) && WEXITSTATUS(s) == 127}], WNOHANG, NULL) = 9429
[pid 9419] futex(0x55aa3b0467c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 9423] <... futex resumed>) = 0
[pid 9419] <... futex resumed>) = 1
[pid 9423] futex(0x55aa3b0467f0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 9419] futex(0x55aa3b0467f0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 9423] <... futex resumed>) = -1 EAGAIN (Resource temporarily unavailable)
[pid 9419] <... futex resumed>) = 0
[pid 9423] futex(0x55aa3b0467f0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 9419] wait4(9427, <unfinished ...>
[pid 9423] <... futex resumed>) = 0
[pid 9419] <... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 127}], WNOHANG, NULL) = 9427
[pid 9423] futex(0x55aa3af5d1e0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 9419] write(4<anon_inode:[eventfd]>, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 9419] futex(0x55aa3af5d1e0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 9423] <... futex resumed>) = 0
[pid 9419] <... futex resumed>) = 1
[pid 9423] futex(0x55aa3af5d1e0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 9419] clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
[pid 9423] <... futex resumed>) = 0
[pid 9419] <... clock_gettime resumed>{tv_sec=1144088, tv_nsec=344160939}) = 0
[pid 9423] writev(17<TCP:[192.168.1.174:8080->192.168.2.171:57470]>, [{iov_base="HTTP/1.1 200 OK\r\nContent-Type: t"..., iov_len=44}, {iov_base="error", iov_len=5}], 2 <unfinished ...>
[pid 9419] futex(0x55aa3af4dd10, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=1144088, tv_nsec=444160939}, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 9423] <... writev resumed>) = 49
[pid 9423] gettimeofday({tv_sec=1677602195, tv_usec=936455}, NULL) = 0
[pid 9423] close(17<TCP:[192.168.1.174:8080->192.168.2.171:57470]>) = 0
[pid 9423] writev(2</root/uwsgi-rack-race/uwsgi_race_full.strace2>, [{iov_base="[pid: 9419|app: 0|req: 2/1] 10.4"..., iov_len=193}], 1[pid: 9419|app: 0|req: 2/1] 192.168.2.171 () {36 vars in 1090 bytes} [Tue Feb 28 16:36:35 2023] GET / => generated 5 bytes in 23 msecs (HTTP/1.1 200) 1 headers in 44 bytes (0 switches on core 4)
) = 193
[pid 9423] futex(0x55aa3af4dd10, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 9419] <... futex resumed>) = 0
[pid 9423] futex(0x55aa3a24e978, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 9419] futex(0x55aa3af5d110, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 9419] writev(16<TCP:[192.168.1.174:8080->192.168.2.171:57454]>, [{iov_base="HTTP/1.1 200 OK\r\nContent-Type: t"..., iov_len=44}, {iov_base="error", iov_len=5}], 2) = 49
[pid 9419] gettimeofday({tv_sec=1677602195, tv_usec=958538}, NULL) = 0
[pid 9419] close(16<TCP:[192.168.1.174:8080->192.168.2.171:57454]>) = 0
[pid 9419] writev(2</root/uwsgi-rack-race/uwsgi_race_full.strace2>, [{iov_base="[pid: 9419|app: 0|req: 2/2] 10.4"..., iov_len=195}], 1[pid: 9419|app: 0|req: 2/2] 192.168.2.171 () {40 vars in 1143 bytes} [Tue Feb 28 16:36:29 2023] GET / => generated 5 bytes in 6431 msecs (HTTP/1.1 200) 1 headers in 44 bytes (0 switches on core 0)
) = 195
Request 2 completes, and immediately afterwards request 1
Attached is the full strace: uwsgi_race_full.strace.txt
Hi,
we're having some troubles with uwsgi (2.0.20/2.0.21), ruby (2.7/3.1) and redmine (5.0.4) regarding threads (master mode in uwsgi): calls to popen, backticks and shell with non-existing binaries fail and hang indefinitely (the shell stays in the process list as a zombie process). Redmine uses this to determine available tools.
Some example calls may be found here:
sleep 1
after this line)exit 0
)There are also some ruby bugs that seem to be relevant here:
We created a minimal rack app to reproduce this behaviour:
Steps to reproduce with uwsgi + rack plugin:
For us the issue arises in the minimal app if the request is made directly after uwsgi startup.
Result on our test machine: The request hangs, ps shows an zombie process (
root 11109 0.0 0.0 0 0 ? Z 16:07 0:00 [uwsgi] <defunct>
). A second request succeeds, immediately afterwards the first request also finishes.Steps to reproduce with uwsgi + rack & rbthreads plugin:
Running uwsgi
Result on our test machine: The first request works as expected. Subsequent requests will hang until the 5th request (= thread limit per worker +1). After the 5th request all still running requests finish.
We were not sure where to report this issue. Any suggestions or help would be highly appreciated.