unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi

uWSGI 2.0.17 process got Segmentation Fault frequently after --touch-chain-reload #1807

Open huanghe314 opened 6 years ago

huanghe314 commented 6 years ago

Runtime Environment:

(nlp_env) [root@cdh2 text_mining]# which python  
/home/chenan/pyenv/nlp_env/bin/python  
(nlp_env) [root@cdh2 text_mining]# python --version  
Python 3.5.1  
(nlp_env) [root@cdh2 text_mining]# ldd `which uwsgi`  
    linux-vdso.so.1 =>  (0x00007ffef31dd000)  
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f79e805d000)  
    libm.so.6 => /lib64/libm.so.6 (0x00007f79e7d5a000)  
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f79e7b56000)  
    libz.so.1 => /lib64/libz.so.1 (0x00007f79e7940000)  
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f79e76dd000)  
    libssl.so.1.0.0 => /lib64/libssl.so.1.0.0 (0x00007f79e746b000)  
    libcrypto.so.1.0.0 => /lib64/libcrypto.so.1.0.0 (0x00007f79e700a000)  
    libxml2.so.2 => /lib64/libxml2.so.2 (0x00007f79e6c9f000)  
    libutil.so.1 => /lib64/libutil.so.1 (0x00007f79e6a9c000)  
    librt.so.1 => /lib64/librt.so.1 (0x00007f79e6894000)  
    libpython3.5m.so.1.0 => /usr/local/lib/libpython3.5m.so.1.0 (0x00007f79e6370000)  
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f79e6139000)  
    libc.so.6 => /lib64/libc.so.6 (0x00007f79e5d6c000)  
    /lib64/ld-linux-x86-64.so.2 (0x00007f79e8286000)  
    libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f79e5b1e000)  
    libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f79e5836000)  
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f79e5632000)  
    libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f79e53fe000)  
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f79e51d8000)  
    libfreebl3.so => /lib64/libfreebl3.so (0x00007f79e4fd4000)  
    libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f79e4dc6000)  
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f79e4bc2000)  
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f79e49a8000)  
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f79e4781000)  
(nlp_env) [root@cdh2 text_mining]# hostnamectl  
         Static hostname: cdh2  
         Icon name: computer-server  
         Chassis: server  
         Operating System: CentOS Linux 7 (Core)  
         CPE OS Name: cpe:/o:centos:centos:7  
         Kernel: Linux 3.10.0-327.el7.x86_64  
         Architecture: x86-64

The uWSGI configuration is below:

[prod]
if-env = VIRTUAL_ENV
print = Your virtualenv is %(_)
virtualenv = %(_)
endif =
http = :8679
chdir = /home/chenan/Code/text_mining/web/
wsgi-file = web_app.py
master-fifo = /tmp/text_mining
lazy-apps = true
touch-chain-reload = /home/chenan/Code/text_mining/web/touchFile
callable = app
master = 1
processes = 50
buffer-size = 8192
http-timeout = 180
enable-threads = true
harakiri = 20
listen = 1024
stats = :8686 --stats-http
pidfile = /var/run/nlp_uwsgi.pid
daemonize = /home/chenan/Code/text_mining/log/daemonize_log.log
memory-report = true
need-app = true
worker-reload-mercy = 90
max-requests = 1000
;max-worker-lifetime = 3600
threads = 5
req-logger = file:/home/chenan/Code/text_mining/log/request_log.log
log-format = %(ctime) %(method) %(uri) %(proto) %(status)

[beta]
if-env = VIRTUAL_ENV
print = Your virtualenv is %(_)
virtualenv = %(_)
endif =
http = :8679
chdir = /home/chenan/Code/text_mining/web/
wsgi-file = web_app.py
master-fifo = /tmp/text_mining
lazy-apps = true
touch-chain-reload = /home/chenan/Code/text_mining/web/touchFile
callable = app
master = 1
processes = 5
buffer-size = 8192
http-timeout = 180
enable-threads = true
harakiri = 20
listen = 1024
stats = :8686 --stats-http
pidfile = /var/run/nlp_uwsgi.pid
daemonize = /home/chenan/Code/text_mining/log/daemonize_log.log
memory-report = true
need-app = true
worker-reload-mercy = 90
max-requests = 1000
;max-worker-lifetime = 3600
threads = 5
req-logger = file:/home/chenan/Code/text_mining/log/request_log.log
log-format = %(ctime) %(method) %(uri) %(proto) %(status)

The issue occurs, as shown in the log below, when I touch the reload file after setting the --touch-chain-reload option to gracefully reload the workers. On macOS I used the beta section and on CentOS I used the prod section; both setups show the same issue.

Wed Jun 20 17:26:26 2018 - *** /home/chenan/Code/text_mining/web/touchFile has been touched... chain reload !!! ***
Wed Jun 20 17:26:26 2018 - chain next victim is worker 1  
Gracefully killing worker 1 (pid: 44682)...  
!!! uWSGI process 44682 got Segmentation Fault !!!  
*** backtrace of 44682 ***  
uwsgi(uwsgi_backtrace+0x2e) [0x46a96e]  
uwsgi(uwsgi_segfault+0x21) [0x46ad01]  
/lib64/libc.so.6(+0x362f0) [0x7f01bdde52f0]  
/home/chenan/pyenv/nlp_env/lib/python3.5/site-packages/google/protobuf/pyext/_message.cpython-35m-x86_64-linux-gnu.so(+0xba2b0) [0x7f017989e2b0]  
/usr/local/lib/libpython3.5m.so.1.0(PyDict_Clear+0x1af) [0x7f01be467fef]  
/usr/local/lib/libpython3.5m.so.1.0(+0xb5029) [0x7f01be468029]  
/usr/local/lib/libpython3.5m.so.1.0(+0x1abb32) [0x7f01be55eb32]  
/usr/local/lib/libpython3.5m.so.1.0(_PyGC_CollectNoFail+0x31) [0x7f01be55f791]  
/usr/local/lib/libpython3.5m.so.1.0(PyImport_Cleanup+0x324) [0x7f01be529294]  
/usr/local/lib/libpython3.5m.so.1.0(Py_Finalize+0x65) [0x7f01be53c055]  
uwsgi(uwsgi_plugins_atexit+0x71) [0x467b91]  
/lib64/libc.so.6(+0x39bd9) [0x7f01bdde8bd9]  
/lib64/libc.so.6(+0x39c27) [0x7f01bdde8c27]  
uwsgi() [0x42244f]  
uwsgi(gracefully_kill+0xb2) [0x469892]  
/lib64/libpthread.so.0(+0xf6d0) [0x7f01c00ab6d0]  
/lib64/libc.so.6(epoll_wait+0x33) [0x7f01bdeae183]  
uwsgi(event_queue_wait+0x23) [0x45e223]  
uwsgi(wsgi_req_accept+0xde) [0x41fdbe]  
uwsgi(simple_loop_run+0xb6) [0x466bf6]  
uwsgi(simple_loop+0xe) [0x466a2e]  
uwsgi(uwsgi_ignition+0x192) [0x46af52]  
uwsgi(uwsgi_worker_run+0x2ed) [0x46f79d]  
uwsgi() [0x46fd5f]  
uwsgi() [0x41f27e]  
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f01bddd1445]  
uwsgi() [0x41f2a9]  
*** end of backtrace ***  
worker 1 killed successfully (pid: 44682)  
Respawned uWSGI worker 1 (new pid: 46637)  
Wed Jun 20 17:26:27 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:28 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:29 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:30 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:31 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:32 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:33 2018 - chain is still waiting for worker 1...  
Wed Jun 20 17:26:34 2018 - chain is still waiting for worker 1...  
WARNING:tensorflow:From /home/chenan/pyenv/nlp_env/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.  
Instructions for updating:  
Use the retry module or similar alternatives.  
Wed Jun 20 17:26:35 2018 - chain is still waiting for worker 1...  
WSGI app 0 (mountpoint='') ready in 9 seconds on interpreter 0x1059470 pid: 46637 (default app)  
Wed Jun 20 17:26:36 2018 - chain next victim is worker 2  
Gracefully killing worker 2 (pid: 44683)...  
!!! uWSGI process 44683 got Segmentation Fault !!!

As shown above, every time uWSGI gracefully kills a worker, a segmentation fault occurs. Is this normal?

Segmentation faults also occurred before I set --touch-chain-reload, but less frequently.
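
To check whether every graceful kill really ends in a crash, the daemonize log can be scanned for the two messages quoted in the log excerpt. The following is only a minimal Python sketch, assuming the log lines look exactly like the ones in this report; the function name `count_reload_segfaults` is hypothetical and not part of uWSGI.

```python
import re

# Patterns taken from the log lines quoted in this report (assumed stable).
KILL_RE = re.compile(r"Gracefully killing worker (\d+) \(pid: (\d+)\)")
SEGV_RE = re.compile(r"uWSGI process (\d+) got Segmentation Fault")


def count_reload_segfaults(log_lines):
    """Return (graceful_kills, segfaults_during_kill) for an iterable of log lines."""
    pending = set()  # pids that were asked to shut down gracefully
    kills = 0
    crashed = 0
    for line in log_lines:
        match = KILL_RE.search(line)
        if match:
            pending.add(match.group(2))
            kills += 1
            continue
        match = SEGV_RE.search(line)
        # Only count segfaults of workers that were being gracefully killed.
        if match and match.group(1) in pending:
            pending.discard(match.group(1))
            crashed += 1
    return kills, crashed


if __name__ == "__main__":
    sample = [
        "Gracefully killing worker 1 (pid: 44682)...",
        "!!! uWSGI process 44682 got Segmentation Fault !!!",
        "worker 1 killed successfully (pid: 44682)",
        "Gracefully killing worker 2 (pid: 44683)...",
        "!!! uWSGI process 44683 got Segmentation Fault !!!",
    ]
    kills, crashed = count_reload_segfaults(sample)
    print("graceful kills:", kills, "segfaults during kill:", crashed)
```

In practice the function could be fed an open file handle on the daemonize log instead of the sample list. If the two counts match, the crash happens on every reload rather than intermittently.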

huanghe314 commented 6 years ago

I just found that in my previous configuration file I had max-worker-lifetime enabled, and every time a worker was reloaded at max-worker-lifetime, a segmentation fault occurred. So my guess is that the segmentation fault occurs on almost every reload operation.

huanghe314 commented 6 years ago

I found out that if I set

--skip-atexit-teardown = true
--skip-atexit = true

then on macOS there is no segmentation fault anymore, but on CentOS 7 the same issue still exists. Please help.

Gabber commented 5 years ago

any news here?

brianjsw commented 5 years ago

I am also seeing this on CentOS 7 when doing a systemctl restart uwsgi.