unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi
Other
3.45k stars 691 forks source link

uwsgi going into respawn loop #574

Closed j0nes2k closed 10 years ago

j0nes2k commented 10 years ago

I have uwsgi v2.0 running on Ubuntu 13.04. uwsgi runs in Emperor mode with two apps deployed. The apps share a large amount of codebase, one is an API app, the other one is a Flask- and Jinja2 based webapp. The API app has been running fine for the past few months without any problems.

Now we are in the process of deploying the webapp. This app behaves strange - in the logs I can periodically see vassals complaining about disconnecting masters, going into some kind of "respawn loop". The app itself does not report any errors and after its restart (which takes 4-5 seconds) it is responding fine again. This happens with or without requests hitting the webapp. Without any request, uwsgi respawns about every 2 minutes. With a simple test (hitting the same page in 2 second intervals), the app is running about 10-15 minutes without respawning.

There should be no exception thrown from the app, and in the logfiles and stats daemons everything is looking fine.

What is the cause for this behaviour? Any idea on how to debug this further?

Here is the emperor config:

uwsgi:
  emperor-pidfile: /var/run/uwsgi/uwsgi.pid
  emperor-stats: 127.0.0.1:4444
  master: true
  emperor: /etc/uwsgi/vassals 
  die-on-term: true

...and here is the webapp config:

uwsgi:
  chdir: /home/user/frontend
  master: true
  socket: 10.226.82.202:8004
  callable: app
  module: production
  logto: /var/log/uwsgi/frontend.log
  log-reopen: true
  virtualenv: /home/user/.virtualenvs/frontend
  pythonpath: /var/repositories/python-modules-staging
  processes: 3
  threads: 1
  single-interpreter: true
  stats-server: 127.0.0.1:5000
  heartbeat: 30
  uid: user
  gid: www-data
  pidfile: /var/run/uwsgi/frontend.pid

Excerpt from the logfile /var/log/uwsgi/frontend.log:

 *** Starting uWSGI 2.0 (64bit) on [Tue Mar 18 14:24:16 2014] ***
 compiled with version: 4.7.3 on 27 January 2014 15:52:41
 os: Linux-3.8.0-19-generic #29-Ubuntu SMP Wed Apr 17 18:16:28 UTC 2013
 nodename: webdynamic
 machine: x86_64
 clock source: unix
 detected number of CPU cores: 2
 current working directory: /etc/uwsgi/vassals
 writing pidfile to /var/run/uwsgi/frontend.pid
 detected binary path: /usr/local/bin/uwsgi
 !!! no internal routing support, rebuild with pcre support !!!
 setgid() to 33
 set additional group 111 (admin)
 setuid() to 1001
 your processes number limit is 13389
 your memory page size is 4096 bytes
 detected max file descriptor number: 1024
 lock engine: pthread robust mutexes
 thunder lock: disabled (you can enable it with --thunder-lock)
 uwsgi socket 0 bound to TCP address 10.226.82.202:8004 fd 3
 Python version: 2.7.4 (default, Sep 26 2013, 03:23:21)  [GCC 4.7.3]
 Set PythonHome to /home/user/.virtualenvs/frontend
 Python main interpreter initialized at 0xd251d0
 python threads support enabled
 your server socket listen backlog is limited to 100 connections
 your mercy for graceful operations on workers is 60 seconds
 mapped 291008 bytes (284 KB) for 3 cores
 *** Operational MODE: preforking ***
 added /var/repositories/python-modules-staging/ to pythonpath.
 WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0xd251d0 pid: 19570 (default app)
 spawned uWSGI master process (pid: 19570)
 spawned uWSGI worker 1 (pid: 19575, cores: 1)
 spawned uWSGI worker 2 (pid: 19576, cores: 1)
 spawned uWSGI worker 3 (pid: 19577, cores: 1)
 *** Stats server enabled on 127.0.0.1:5000 fd: 17 ***
 Tue Mar 18 14:26:26 2014 - uWSGI worker 1 screams: UAAAAAAH my master disconnected: i will kill myself !!!
 Tue Mar 18 14:26:26 2014 - uWSGI worker 3 screams: UAAAAAAH my master disconnected: i will kill myself !!!
 Tue Mar 18 14:26:26 2014 - uWSGI worker 2 screams: UAAAAAAH my master disconnected: i will kill myself !!!
 *** Starting uWSGI 2.0 (64bit) on [Tue Mar 18 14:26:26 2014] ***
 compiled with version: 4.7.3 on 27 January 2014 15:52:41
 os: Linux-3.8.0-19-generic #29-Ubuntu SMP Wed Apr 17 18:16:28 UTC 2013
 nodename: webdynamic
 machine: x86_64
 clock source: unix
 detected number of CPU cores: 2
 current working directory: /etc/uwsgi/vassals
 writing pidfile to /var/run/uwsgi/frontend.pid
 detected binary path: /usr/local/bin/uwsgi
 !!! no internal routing support, rebuild with pcre support !!!
 setgid() to 33
 set additional group 111 (admin)
 setuid() to 1001
 your processes number limit is 13389
 your memory page size is 4096 bytes
 detected max file descriptor number: 1024
 lock engine: pthread robust mutexes
 thunder lock: disabled (you can enable it with --thunder-lock)
 probably another instance of uWSGI is running on the same address (10.226.82.202:8004).
 bind(): Address already in use [core/socket.c line 779]
 *** Starting uWSGI 2.0 (64bit) on [Tue Mar 18 14:26:31 2014] ***
 compiled with version: 4.7.3 on 27 January 2014 15:52:41
 os: Linux-3.8.0-19-generic #29-Ubuntu SMP Wed Apr 17 18:16:28 UTC 2013
 nodename: webdynamic
 machine: x86_64
 clock source: unix
 detected number of CPU cores: 2
 current working directory: /etc/uwsgi/vassals
 writing pidfile to /var/run/uwsgi/frontend.pid
 detected binary path: /usr/local/bin/uwsgi
 !!! no internal routing support, rebuild with pcre support !!!
 setgid() to 33
 set additional group 111 (admin)
 setuid() to 1001
 your processes number limit is 13389
 your memory page size is 4096 bytes
 detected max file descriptor number: 1024
 lock engine: pthread robust mutexes
 thunder lock: disabled (you can enable it with --thunder-lock)
 uwsgi socket 0 bound to TCP address 10.226.82.202:8004 fd 3
 Python version: 2.7.4 (default, Sep 26 2013, 03:23:21)  [GCC 4.7.3]
 Set PythonHome to /home/user/.virtualenvs/frontend
 Python main interpreter initialized at 0x12a71d0
 python threads support enabled
 your server socket listen backlog is limited to 100 connections
 your mercy for graceful operations on workers is 60 seconds
 mapped 291008 bytes (284 KB) for 3 cores
 *** Operational MODE: preforking ***
 added /var/repositories/python-modules-staging/ to pythonpath.
 WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x12a71d0 pid: 20209 (default app)
 spawned uWSGI master process (pid: 20209)
 spawned uWSGI worker 1 (pid: 20214, cores: 1)
 spawned uWSGI worker 2 (pid: 20215, cores: 1)
 spawned uWSGI worker 3 (pid: 20216, cores: 1)
 *** Stats server enabled on 127.0.0.1:5000 fd: 17 ***     
unbit commented 10 years ago

Check your Emperor logs, if they reports the vassal is not responding to heartbeats, you have hit a "timing bug" with the heartbeat subsystem. Just upgrade to 2.0.3 and the issue should disappear. If you cannot upgrade, remove heartbeat from the vassal

j0nes2k commented 10 years ago

Upgrading to uWSGI 2.0.3 seems to have done the trick - at least in the first test. I will report back tomorrow and close this issue if everything is still working fine.

j0nes2k commented 10 years ago

I can confirm that this was the error. Thank you for your help, closing this issue.