unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi
Other
3.45k stars 691 forks source link

vassal doesn't get kill when stop emperor #733

Closed favadi closed 10 years ago

favadi commented 10 years ago

I'm running openerp with uwsgi 2.07 in emperor mode.

For testing purpose, I have to stop and uninstall uwsgi many times a day. And sometimes, the openerp-master and worker processes still appear in the process list, openerp-master process's ppid is 1.

The uwsgi build config:

[uwsgi]
xml = false
ini = false
yaml = true
json = false
sqlite3 = false
zeromq = false
snmp = false
spooler = true
embedded = true
ssl = false
udp = true
multicast = false
threading = true
minterpreters = true
matheval = false
async = true
ldap = false
pcre = false
routing = false
alarm = false
ipv6 = false
debug = false
unbit = false
syslog = true
xml_implementation = libxml2
yaml_implementation = auto
malloc_implementation = libc
extras =
plugins =
bin_name = uwsgi
append_version =
plugin_dir = .
embedded_plugins = python,syslog, gevent, ping, nagios, carbon, signal
as_shared_library = false
locking = auto
event = auto
timer = auto
filemonitor = auto
blacklist =
whitelist =
embed_files =
embed_config =

The emperor config:

uwsgi:
  threaded-logger: True
  logger: syslog:uwsgi
  carbon-id: emperor
  carbon: mycompany.com:2014
  emperor: /etc/uwsgi
  no-orphans: True
  die-on-term: True

The openerp config:

uwsgi:
  master: true
  processes: 2
  harakiri: 60
  harakiri-verbose: true
  lazy-apps: true
  uid: openerp
  gid: openerp
  procname: openerp-worker
  procname-master: openerp-master
  socket: /var/lib/uwsgi/openerp.sock
  stats: /var/lib/uwsgi/openerp-stats.sock
  chdir: /usr/local/openerp/openerp-7.0-20130909-231057/openerp
  wsgi-file: /usr/local/openerp/openerp-7.0-20130909-231057/openerp.wsgi
  virtualenv: /usr/local/openerp
  threaded-logger: true
  no-orphans: true
  logger: syslog:uwsgi-openerp
  carbon-id: openerp
  carbon: 107.170.94.182:2003
  chmod-socket: 770
  chown-socket: www-data:www-data

I can't create a test to reliable reproduce the problem. It's kind of happens randomly, sometimes it happens very often and sometimes I simply can't make make it happens.

Does anyone has this problem before? How can I prevent it? Right now I have to run pkill -9 -f [o]penerp-master after uninstall uwsgi to make sure the process die.

unbit commented 10 years ago

It is a bit strange because a vassal destroy itself when it lose connection with the Emperor, and if the Emperor dies, its file descriptors are destroyed. Emperor logs should have relevant informations about what happened. Can you check them ?

favadi commented 10 years ago

@unbit

That is emperor logs:

Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: your mercy for graceful operations on workers is 60 seconds
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: mapped 191952 bytes (187 KB) for 2 cores
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** Operational MODE: preforking ***
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** uWSGI is running in multiple interpreter mode ***
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: [uwsgi-KSM] enabled with frequency: 20
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: gracefully (RE)spawned uWSGI master process (pid: 23082)
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: spawned uWSGI worker 1 (pid: 24974, cores: 1)
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: spawned uWSGI worker 2 (pid: 24975, cores: 1)
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** Stats server enabled on /var/lib/uwsgi/openerp-stats.sock 
fd: 14 ***
Oct  1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
Oct  1 03:51:44 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0
x891e540 pid: 24974 (default app)
Oct  1 03:51:44 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: write(): Bad file descriptor [core/uwsgi.c line 3433]
Oct  1 03:51:44 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0
x891e540 pid: 24975 (default app)
Oct  1 03:51:45 integration-erp1678-uwsgi-2-0-7-14 uwsgi: message repeated 2 times: [ waiting for Emperor death...]
Oct  1 03:51:46 integration-erp1678-uwsgi-2-0-7-14 uwsgi: The Emperor has been buried (pid: 15437)
Oct  1 03:51:46 integration-erp1678-uwsgi-2-0-7-14 uwsgi: goodbye to uWSGI.

btw, the state of openerp-master is S when it happens.

unbit commented 10 years ago

It looks like this instance has closed (or clobbered) the communication socket with the Emperor. Can you report the first log lines of uwsgi-openerp ? it should print the file descriptor assigned to the Emperor channel

favadi commented 10 years ago

@unbit

2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: your mercy for graceful operations on workers is 60 seconds
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: [uwsgi-KSM] enabled with frequency: 20
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: machine: i686
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** Python threads support is disabled. You can enable it with --enable-threads ***
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: nodename: integration-erp1678-uwsgi-2-0-7-14
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: clock source: unix
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: your server socket listen backlog is limited to 100 connections
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** WARNING: you have enabled harakiri without post buffering. Slow upload could be rejected on post-unbuffered webservers ***
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: current working directory: /etc/uwsgi
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: [carbon] added server 107.170.94.182:2003
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: lock engine: pthread robust mutexes
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: spawned uWSGI worker 1 (pid: 24974, cores: 1)
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: worker 2 buried after 1 seconds
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: worker 1 buried after 1 seconds
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: found fd 6 mapped to socket 0 (/var/lib/uwsgi/openerp.sock)
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: *** has_emperor mode detected (fd: 9) ***
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: binary reloading uWSGI...
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: closing all non-uwsgi socket fds > 2 (max_fd = 1024)...
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: running /usr/local/uwsgi-2.0.7/uwsgi
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: [uWSGI] getting YAML configuration from openerp.yml
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: chdir() to /etc/uwsgi
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: Wed Oct 1 03:51:41 2014 - received message 0 from emperor
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: SIGINT/SIGQUIT received...killing workers...
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: [emperor] *** RAGNAROK ALREADY EVOKED (mercyless in 30 seconds)***
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: Wed Oct 1 03:51:41 2014 - [emperor] stop the uwsgi instance openerp.yml
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: ...brutally killing workers...
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14  integration-erp1678-uwsgi-2-0-7-14 uwsgi: [emperor] *** RAGNAROK EVOKED ***```
unbit commented 10 years ago

The last log shows expected behaviour. Unfortunately i am not able to catch what is wrong (it looks like your vassal basically has no more connection with the Emperor). If you can confirm the error is always the same (write(): Bad file descriptor [core/uwsgi.c line 3433]) We can simply call exit(1) in such condition to ensure consistency.

favadi commented 10 years ago

I confirm that I always get write(): Bad file descriptor [core/uwsgi.c line 3433] in uwsgi log when it happens.

unbit commented 10 years ago

Ok, i'll push a patch in the 2.0 branch to auto-destroy the instance when this error is triggered.I'll ping you as soon it is ready

favadi commented 10 years ago

great, thanks for your help.

unbit commented 10 years ago

ok, pushed, let me know

favadi commented 10 years ago

I test with uwsgi uwsgi 2.0 branch and the problem does not happen anymore! Thank you for your fix.