Closed: dstndstn closed this issue 2 years ago
Sorry, forgot to say: upon restarting uwsgi (service nova-uwsgi restart), things work fine again.
And with --show-config, the log file starts with:
*** Starting uWSGI 2.0.20 (64bit) on [Tue May 31 14:05:48 2022] ***
compiled with version: 9.3.0 on 18 January 2022 15:56:23
os: Linux-5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022
nodename: nova
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 16
current working directory: /home/nova/astrometry/net
detected binary path: /usr/local/bin/uwsgi
your processes number limit is 482928
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :3030 fd 4
Python version: 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x5557afda4680
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 656136 bytes (640 KB) for 8 cores
*** Operational MODE: preforking ***
Path: /home/nova/astrometry
Logging to /data/nova/nova.log
WSGI app 0 (mountpoint='') ready in 17 seconds on interpreter 0x5557afda4680 pid: 2597129 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 2597129)
spawned uWSGI worker 1 (pid: 2597149, cores: 1)
spawned uWSGI worker 2 (pid: 2597150, cores: 1)
spawned uWSGI worker 3 (pid: 2597151, cores: 1)
spawned uWSGI worker 4 (pid: 2597152, cores: 1)
spawned uWSGI worker 5 (pid: 2597153, cores: 1)
spawned uWSGI worker 6 (pid: 2597154, cores: 1)
spawned uWSGI worker 7 (pid: 2597155, cores: 1)
spawned uWSGI worker 8 (pid: 2597156, cores: 1)
*** Stats server enabled on 127.0.0.1:1717 fd: 24 ***
[pid: 2597152|app: 0|req: 1/2] 99.250.167.114 () {76 vars in 1374 bytes} [Tue May 31 14:06:05 2022] GET / => generated 14924 bytes in 1398 msecs (HTTP/1.1 200) 7 headers in 223 bytes (1 switches on core 0)
and I guess I should mention that this is a Python/Django site.
Thanks!
When this happened again today, I tried attaching gdb to the uwsgi worker processes; here's what it shows:
#0 __libc_read (nbytes=5, buf=0x56257206f723, fd=10) at ../sysdeps/unix/sysv/linux/read.c:26
#1 __libc_read (fd=10, buf=0x56257206f723, nbytes=5) at ../sysdeps/unix/sysv/linux/read.c:24
#2 0x00007fc851a683d9 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#3 0x00007fc851a6367e in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x00007fc851a624d4 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#5 0x00007fc851a62aa7 in BIO_read () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#6 0x00007fc851cb4b91 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#7 0x00007fc851cb8e1e in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#8 0x00007fc851cb66d0 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#9 0x00007fc851ce9a31 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007fc851cdf625 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007fc851ccad98 in SSL_do_handshake () from /lib/x86_64-linux-gnu/libssl.so.1.1
#12 0x00007fc851d3b04c in ?? () from /usr/lib/python3.8/lib-dynload/_ssl.cpython-38-x86_64-linux-gnu.so
#13 0x00007fc852abc25b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#14 0x00007fc85288fd6d in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#15 0x00007fc852891018 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#16 0x00007fc8529e5e3b in _PyEval_EvalCodeWithName () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#17 0x00007fc852ac3114 in _PyFunction_Vectorcall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
...
#207 0x00007fc852ac5407 in _PyObject_FastCallDict () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#208 0x00007fc852ac551d in _PyObject_Call_Prepend () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#209 0x00007fc852a5fbe6 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#210 0x00007fc852ac3984 in PyObject_Call () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#211 0x000056256dfa2304 in python_call ()
#212 0x000056256dfa4705 in uwsgi_request_wsgi ()
#213 0x000056256df3ebbd in wsgi_req_recv ()
#214 0x000056256df89d74 in simple_loop_run ()
#215 0x000056256df89b54 in simple_loop ()
#216 0x000056256df8e5f4 in uwsgi_ignition ()
#217 0x000056256df92d16 in uwsgi_worker_run ()
#218 0x000056256df93294 in uwsgi_run ()
#219 0x000056256df3e224 in main ()
Doesn't that seem weird? It's doing an SSL handshake??
Looking at lsof for that process, I see:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
uwsgi 2631165 nova 0r CHR 1,3 0t0 6 /dev/null
uwsgi 2631165 nova 1u REG 252,32 86586098 270095362 /data/nova/uwsgi.log
uwsgi 2631165 nova 2u REG 252,32 86586098 270095362 /data/nova/uwsgi.log
uwsgi 2631165 nova 3u unix 0xffff91454be73400 0t0 25639965 type=DGRAM
uwsgi 2631165 nova 4u IPv4 25630232 0t0 TCP *:3030 (LISTEN)
uwsgi 2631165 nova 5u a_inode 0,14 0 10511 [eventpoll]
uwsgi 2631165 nova 6u sock 0,9 0t0 28514896 protocol: TCP
uwsgi 2631165 nova 7w REG 252,32 1334720725 270073863 /data/nova/nova.log
uwsgi 2631165 nova 8u IPv4 28514897 0t0 TCP localhost:59832->localhost:postgresql (ESTABLISHED)
uwsgi 2631165 nova 9u REG 252,32 0 258163796 /data/tmp/tmp7d2xj9aj
uwsgi 2631165 nova 10u IPv4 28514901 0t0 TCP nova.arbutus:39176->101.70.154.119:https (ESTABLISHED)
uwsgi 2631165 nova 20u unix 0xffff9145a7125800 0t0 25630298 type=STREAM
uwsgi 2631165 nova 22u unix 0xffff9145a7125c00 0t0 25630303 type=STREAM
and fd 10 is an outgoing connection from my server (nova) on https...?
Time to set up a cron job to restart uwsgi every day...
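(As an alternative to a cron restart: uWSGI has a built-in harakiri option that reaps any worker stuck on a single request for longer than a given number of seconds. The option names below are real uWSGI options; the values are just illustrative, not a recommendation for this site:)

```ini
; Kill any worker that spends more than 300 seconds on one request
harakiri = 300
; Log a message when harakiri fires, to confirm this is the failure mode
harakiri-verbose = true
```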
(Meanwhile, the main process is here:
(gdb) where
#0 0x00007fc85274846e in epoll_wait (epfd=23, events=0x7ffc348661bc, maxevents=1, timeout=1000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1 0x000056256df80f19 in event_queue_wait ()
#2 0x000056256df53116 in master_loop ()
#3 0x000056256df930ea in uwsgi_run ()
#4 0x000056256df3e224 in main ()
and its lsof shows:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
uwsgi 2631137 nova 0r CHR 1,3 0t0 6 /dev/null
uwsgi 2631137 nova 1u REG 252,32 86639558 270095362 /data/nova/uwsgi.log
uwsgi 2631137 nova 2u REG 252,32 86639558 270095362 /data/nova/uwsgi.log
uwsgi 2631137 nova 3u unix 0xffff91454be73400 0t0 25639965 type=DGRAM
uwsgi 2631137 nova 4u IPv4 25630232 0t0 TCP *:3030 (LISTEN)
uwsgi 2631137 nova 5u unix 0xffff9145a7123c00 0t0 25630283 type=STREAM
uwsgi 2631137 nova 6u unix 0xffff9145a7126400 0t0 25630284 type=STREAM
uwsgi 2631137 nova 7u unix 0xffff9145a7127000 0t0 25630285 type=STREAM
uwsgi 2631137 nova 8u unix 0xffff9145a7121c00 0t0 25630286 type=STREAM
uwsgi 2631137 nova 9u unix 0xffff9145a7126000 0t0 25630287 type=STREAM
uwsgi 2631137 nova 10u unix 0xffff9145a7121400 0t0 25630288 type=STREAM
uwsgi 2631137 nova 11u unix 0xffff9145a7123400 0t0 25630289 type=STREAM
uwsgi 2631137 nova 12u unix 0xffff9145a7124c00 0t0 25630290 type=STREAM
uwsgi 2631137 nova 13u unix 0xffff9145a7122c00 0t0 25630291 type=STREAM
uwsgi 2631137 nova 14u unix 0xffff9145a7121800 0t0 25630292 type=STREAM
uwsgi 2631137 nova 15u unix 0xffff9145a7120000 0t0 25630293 type=STREAM
uwsgi 2631137 nova 16u unix 0xffff9145a7122400 0t0 25630294 type=STREAM
uwsgi 2631137 nova 17u unix 0xffff9145a7126800 0t0 25630295 type=STREAM
uwsgi 2631137 nova 18u unix 0xffff9145a7124400 0t0 25630296 type=STREAM
uwsgi 2631137 nova 19u unix 0xffff9145a7123800 0t0 25630297 type=STREAM
uwsgi 2631137 nova 20u unix 0xffff9145a7125800 0t0 25630298 type=STREAM
uwsgi 2631137 nova 21u unix 0xffff9145a7124000 0t0 25630302 type=STREAM
uwsgi 2631137 nova 22u unix 0xffff9145a7125c00 0t0 25630303 type=STREAM
uwsgi 2631137 nova 23u a_inode 0,14 0 10511 [eventpoll]
uwsgi 2631137 nova 24u IPv4 25609936 0t0 TCP localhost:1717 (LISTEN)
Apologies! This was me DoS'ing myself by inadvertently having my code urlopen() a URL passed in by the client. Turns out that's not a good idea.
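For anyone hitting the same thing: the failure mode is that urlopen() with no timeout can block a worker indefinitely on a slow or hostile peer, and fetching client-supplied URLs at all invites SSRF. A hedged sketch of a safer wrapper (fetch_client_url is a hypothetical helper, not from this codebase, and the checks are a basic guard rather than a complete SSRF defense):

```python
import ipaddress
import socket
from urllib.parse import urlparse
from urllib.request import urlopen

def fetch_client_url(url, timeout=10, max_bytes=1_000_000):
    """Fetch a client-supplied URL defensively (hypothetical helper).

    - reject anything that is not plain http/https
    - refuse URLs resolving to private/loopback addresses (basic SSRF guard)
    - always pass a timeout so a slow peer cannot stall the worker forever
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs are allowed")
    if not parts.hostname:
        raise ValueError("URL has no host")
    # Resolve the host and reject private/loopback targets.
    addr = ipaddress.ip_address(socket.gethostbyname(parts.hostname))
    if addr.is_private or addr.is_loopback:
        raise ValueError("URL resolves to a private address")
    # The timeout here is the key fix: without it, a stalled SSL handshake
    # (as in the gdb backtrace above) blocks the worker forever.
    with urlopen(url, timeout=timeout) as resp:
        return resp.read(max_bytes)
```

Capping the response size matters too; otherwise a client can point the server at an endless stream.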
Hi,
I'm running an Ubuntu 20.04 on an OpenStack VM, Intel broadwell CPU. Apache 2.4. Uwsgi 2.0.20.
uwsgi is run like this (in a script run by systemd, as a normal user):
After a day or two on my ~busy site (few requests/second), I am getting:
repeated. In top, I don't see any uWSGI processes using CPU, and I think they're in state "S" in ps. It doesn't look like it's busy doing anything; it looks like it's just stalled. Previously, I was getting dmesg entries about CPU soft lock-ups -- flaky hardware underlying the VM? -- but in today's edition of this problem I don't think I see that.
Looking in the Apache logs, things seem normal leading up to this time, and then I start seeing a bunch of requests that end with http status 500 after 300 seconds.
I'm feeling lost about what to look at to figure out what is going on. I'm still not 100% sure it's not hardware-related. One odd thing I'm seeing is that the process ids are large, like the current uwsgi process is pid 2618919. Maybe that's just life on a working server, but it seemed unusual.
Any hints on things to look into, stats to collect, logging to do, things to investigate next time this happens, would be very much appreciated!
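One concrete thing to collect next time: the startup log shows the stats server enabled on 127.0.0.1:1717, and its JSON output includes each worker's status and current request, which would identify a stalled worker without attaching gdb. A minimal reader sketch (the uwsgitop tool does the same thing more conveniently; read_uwsgi_stats is a hypothetical name):

```python
import json
import socket

def read_uwsgi_stats(host="127.0.0.1", port=1717):
    """Read the JSON blob that the uWSGI stats server writes on connect.

    The stats server sends the full document and closes the connection,
    so we just read until EOF and parse.
    """
    with socket.create_connection((host, port), timeout=5) as s:
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks))
```

Dumping this to a file from cron would give a record of what each worker was doing at the moment things stalled.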
Thank you, dustin