Closed favadi closed 10 years ago
It is a bit strange because a vassal destroy itself when it lose connection with the Emperor, and if the Emperor dies, its file descriptors are destroyed. Emperor logs should have relevant informations about what happened. Can you check them ?
@unbit
That is emperor logs:
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: your mercy for graceful operations on workers is 60 seconds
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: mapped 191952 bytes (187 KB) for 2 cores
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** Operational MODE: preforking ***
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** uWSGI is running in multiple interpreter mode ***
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: [uwsgi-KSM] enabled with frequency: 20
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: gracefully (RE)spawned uWSGI master process (pid: 23082)
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: spawned uWSGI worker 1 (pid: 24974, cores: 1)
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: spawned uWSGI worker 2 (pid: 24975, cores: 1)
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** Stats server enabled on /var/lib/uwsgi/openerp-stats.sock
fd: 14 ***
Oct 1 03:51:43 integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
Oct 1 03:51:44 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0
x891e540 pid: 24974 (default app)
Oct 1 03:51:44 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: write(): Bad file descriptor [core/uwsgi.c line 3433]
Oct 1 03:51:44 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0
x891e540 pid: 24975 (default app)
Oct 1 03:51:45 integration-erp1678-uwsgi-2-0-7-14 uwsgi: message repeated 2 times: [ waiting for Emperor death...]
Oct 1 03:51:46 integration-erp1678-uwsgi-2-0-7-14 uwsgi: The Emperor has been buried (pid: 15437)
Oct 1 03:51:46 integration-erp1678-uwsgi-2-0-7-14 uwsgi: goodbye to uWSGI.
btw, the state of openerp-master is S when it happens.
It looks like this instance has closed (or clobbered) the communication socket with the Emperor. Can you report the first log lines of uwsgi-openerp ? it should print the file descriptor assigned to the Emperor channel
@unbit
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: your mercy for graceful operations on workers is 60 seconds
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: [uwsgi-KSM] enabled with frequency: 20
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: machine: i686
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** Python threads support is disabled. You can enable it with --enable-threads ***
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: nodename: integration-erp1678-uwsgi-2-0-7-14
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: clock source: unix
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: your server socket listen backlog is limited to 100 connections
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: *** WARNING: you have enabled harakiri without post buffering. Slow upload could be rejected on post-unbuffered webservers ***
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: current working directory: /etc/uwsgi
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: [carbon] added server 107.170.94.182:2003
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: lock engine: pthread robust mutexes
2014-10-01 10:51:43.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: spawned uWSGI worker 1 (pid: 24974, cores: 1)
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: worker 2 buried after 1 seconds
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: worker 1 buried after 1 seconds
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: found fd 6 mapped to socket 0 (/var/lib/uwsgi/openerp.sock)
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: *** has_emperor mode detected (fd: 9) ***
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: binary reloading uWSGI...
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: closing all non-uwsgi socket fds > 2 (max_fd = 1024)...
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: running /usr/local/uwsgi-2.0.7/uwsgi
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: [uWSGI] getting YAML configuration from openerp.yml
2014-10-01 10:51:42.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: chdir() to /etc/uwsgi
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: Wed Oct 1 03:51:41 2014 - received message 0 from emperor
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: SIGINT/SIGQUIT received...killing workers...
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: [emperor] *** RAGNAROK ALREADY EVOKED (mercyless in 30 seconds)***
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: Wed Oct 1 03:51:41 2014 - [emperor] stop the uwsgi instance openerp.yml
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: waiting for Emperor death...
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi-openerp: ...brutally killing workers...
2014-10-01 10:51:41.000 integration-erp1678-uwsgi-2-0-7-14 integration-erp1678-uwsgi-2-0-7-14 uwsgi: [emperor] *** RAGNAROK EVOKED ***```
The last log shows expected behaviour. Unfortunately i am not able to catch what is wrong (it looks like your vassal basically has no more connection with the Emperor). If you can confirm the error is always the same (write(): Bad file descriptor [core/uwsgi.c line 3433]) We can simply call exit(1) in such condition to ensure consistency.
I confirm that I always get write(): Bad file descriptor [core/uwsgi.c line 3433]
in uwsgi log when it happens.
Ok, i'll push a patch in the 2.0 branch to auto-destroy the instance when this error is triggered.I'll ping you as soon it is ready
great, thanks for your help.
ok, pushed, let me know
I test with uwsgi uwsgi 2.0 branch and the problem does not happen anymore! Thank you for your fix.
I'm running openerp with uwsgi 2.07 in emperor mode.
For testing purpose, I have to stop and uninstall uwsgi many times a day. And sometimes, the openerp-master and worker processes still appear in the process list, openerp-master process's ppid is 1.
The uwsgi build config:
The emperor config:
The openerp config:
I can't create a test to reliable reproduce the problem. It's kind of happens randomly, sometimes it happens very often and sometimes I simply can't make make it happens.
Does anyone has this problem before? How can I prevent it? Right now I have to run
pkill -9 -f [o]penerp-master
after uninstall uwsgi to make sure the process die.