unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi

Can't restart application manually while it fails with need-app in emperor mode #1148

Closed Shir0kamii closed 8 years ago

Shir0kamii commented 8 years ago

Hi,

We recently had a problem restarting an application while it was failing with need-app enabled. The Emperor keeps reloading the application, but it waits longer after each failure. During this waiting time, we can't find a way to restart the application other than restarting uWSGI itself.

This behavior has been observed in 2.0.11-r1 and 2.0.12 but hasn't been tested on older versions.

Here is the configuration file for my test case:

[uwsgi]
master = True
need-app = True
socket = localhost:9872
wsgi-file = test.py
touch-reload = VERSION

and test.py:

raise Exception
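
For anyone who wants to reproduce this, here is a minimal sketch that writes the two files above, plus an empty VERSION file, into a scratch directory. The helper name make_test_case and the temporary directory prefix are only illustrative assumptions; the file contents are copied from this report.

```python
import tempfile
from pathlib import Path

# File contents copied verbatim from the report.
TEST_INI = """[uwsgi]
master = True
need-app = True
socket = localhost:9872
wsgi-file = test.py
touch-reload = VERSION
"""

# The application fails at import time, so need-app makes the vassal exit.
TEST_PY = "raise Exception\n"


def make_test_case(directory):
    """Write test.ini, test.py and an empty VERSION file into `directory`.

    `make_test_case` is a hypothetical helper name used for illustration.
    """
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    (root / "test.ini").write_text(TEST_INI)
    (root / "test.py").write_text(TEST_PY)
    (root / "VERSION").touch()
    return root


if __name__ == "__main__":
    path = make_test_case(tempfile.mkdtemp(prefix="uwsgi-need-app-"))
    print("test case written to", path)
```

Starting the Emperor from inside that directory, as described below, should then show the repeating failure.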

I get the following logs over and over again (only the part after "starting uWSGI Emperor" repeats):

[uwsgi] implicit plugin requested python27
*** Starting uWSGI 2.0.12 (64bit) on [Wed Jan  6 19:11:48 2016] ***
compiled with version: 5.3.0 on 06 January 2016 19:10:58
os: Linux-4.3.0-gentoo #1 SMP Tue Nov 17 14:10:41 CET 2015
nodename: Noctuide
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/alexandre.bonnetain/temp/uwsgi
detected binary path: /usr/bin/uwsgi
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 47995
your memory page size is 4096 bytes
detected max file descriptor number: 1024
*** starting uWSGI Emperor ***
*** has_emperor mode detected (fd: 6) ***
[uwsgi] implicit plugin requested python27
[uWSGI] getting INI configuration from test.ini
*** Starting uWSGI 2.0.12 (64bit) on [Wed Jan  6 19:11:48 2016] ***
compiled with version: 5.3.0 on 06 January 2016 19:10:58
os: Linux-4.3.0-gentoo #1 SMP Tue Nov 17 14:10:41 CET 2015
nodename: Noctuide
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/alexandre.bonnetain/temp/uwsgi
detected binary path: /usr/bin/uwsgi
your processes number limit is 47995
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address localhost:9872 fd 3
Python version: 2.7.11 (default, Dec 28 2015, 10:25:59)  [GCC 4.9.3]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x7280c0
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 145536 bytes (142 KB) for 1 cores
*** Operational MODE: single process ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 17514)
Wed Jan  6 19:11:48 2016 - [emperor] vassal test.ini has been spawned
spawned uWSGI worker 1 (pid: 17515, cores: 1)
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    raise Exception
Exception
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. GAME OVER ***
SIGINT/SIGQUIT received...killing workers...
worker 1 buried after 1 seconds
goodbye to uWSGI.

I launch the Emperor with uwsgi_python27 --emperor=. (uwsgi_python27 is, I think, a Gentoo-specific binary; it implicitly requests the python27 plugin).

I tried touching VERSION while the Emperor was waiting, but it didn't work. On IRC, damjan told me that kill -HUP <uwsgi_pid> should work, but it didn't either. This wouldn't be a problem if the Emperor's waiting time were short, but it seems it can reach 30s or more if the application keeps failing for several minutes and
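
For reference, the two attempts described above can be sketched in Python roughly as follows. This is only an illustration of what was tried (neither worked while the Emperor was waiting), not a fix; the function names touch and send_hup are hypothetical, and the pid is whatever uWSGI process you are targeting.

```python
import os
import signal
import time
from pathlib import Path


def touch(path):
    """Bump the file's modification time, creating it if missing (like `touch`)."""
    p = Path(path)
    p.touch(exist_ok=True)
    now = time.time()
    os.utime(p, (now, now))  # set atime and mtime explicitly to "now"
    return p.stat().st_mtime


def send_hup(pid):
    """Send SIGHUP to process `pid` (like `kill -HUP <pid>`)."""
    os.kill(pid, signal.SIGHUP)
```

In the test case this corresponds to calling touch("VERSION") in the vassal's directory, or send_hup(uwsgi_pid) with the pid of the uWSGI process.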

Also, we had a freeze but didn't find a way to reproduce it. The application stopped logging in the middle of a traceback and the emperor didn't try to restart its vassal afterward. This is worse than a long waiting time since we can't deploy automatically when this happens.

Is there a workaround for this? If there is none, and if someone can point me to the relevant code (I searched core/emperor.c but didn't find it), I'm willing to work on the problem.

unbit commented 8 years ago

discussion moved here: http://lists.unbit.it/pipermail/uwsgi/2016-January/008333.html