unbit / uwsgi

uWSGI application server container
http://projects.unbit.it/uwsgi
Other
3.45k stars 691 forks source link

Fork, subprocess.call and popen* do not return in dynamic prethreaded mode #1034

Closed remyroy closed 8 years ago

remyroy commented 9 years ago

Fork (os.fork), subprocess.call and the various popen* functions do not return in dynamic and prethreaded mode. They all block and uwsgi ends up in an unresponsive state. This was tested with uwsgi 2.0.11.1, Python 2.7.9 and Ubuntu 15.04. It was also tested with uwsgi 2.0.5.1, Python 2.7.9 and CentOS 6.7. Both systems had the same issue.

How to reproduce

To reproduce, start nginx with a similar config file:

worker_processes  1;

events {
    worker_connections  1024;
}

http {

    access_log  /tmp/nginx_access.log  combined;
    error_log   /tmp/nginx_error.log   crit;

    server {
        listen  9090;

        root /tmp;

        location / {
            uwsgi_pass unix:///tmp/uwsgi.sock;
            uwsgi_param UWSGI_MODULE app;
            uwsgi_param UWSGI_CALLABLE application;
        }

    }
}

Start uwsgi with these parameters:

$ ./uwsgi -s /tmp/uwsgi.sock --threads 2

Have a file called 'app.py' in the uwsgi directory with the following content:

import subprocess, os

print('Test os.fork')
os.fork()
print('os.fork worked')

print('Test subprocess.call')
subprocess.call('echo test')
print('subprocess.call worked')

print('Test os.popen')
f = os.popen('echo test')
print('os.popen worked')

def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    return [b"Hello World"]

Open the url http://localhost:9090/

The browser should be trying to load the page for a while and nginx will eventually display a 504 Gateway Time-out error.

In uwsgi logs, you should see that one of the function is called with the printed message but nothing happens after that. You can test the various functions by commenting the ones you do not want to test.

Expectations

The various subprocess calls and popen functions should work properly and return without endlessly blocking when uwsgi is configured to run in dynamic mode and prethreaded mode.

xrmx commented 9 years ago

I think you should think about leveraging the spooler or the mules for asynchronous tasks, see http://uwsgi-docs.readthedocs.org/en/latest/PythonDecorators.html as a starting point.

remyroy commented 9 years ago

I wish it would be that simple but I would need to rewrite large part of the standard library which uses popen. I found this problem because I'm using Flask which uses the standard library uuid which depends on ctypes.util.find_library which uses os.popen.

unbit commented 9 years ago

I think you should get:

RuntimeError: not holding the import lock

in your logs if you try to do what you described.

Calling fork() in such a way will destroy everything, i am not sure to understand why you are doing it instead of simply calling subprocess/popen

remyroy commented 9 years ago

I am not calling Fork in my code but subprocess/popen is and I think it might be the reason why subprocess/popen does not work. This is just a simple way to replicate this issue. Comment out the fork call to test the other functions.

unbit commented 9 years ago

They call fork()+exec() that is a different pattern and works ok for me. If you remove that os.fork() all should work normally. Eventually report the full uWSGI logs

remyroy commented 9 years ago

Here is the full uWSGI logs when the os.fork call is commented out:

*** Starting uWSGI 2.0.11.1 (64bit) on [Wed Sep  9 17:42:43 2015] ***
compiled with version: 4.9.2 on 08 September 2015 16:20:38
os: Linux-3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015
nodename: ubuntu
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/hidden/Projects/uwsgi
detected binary path: /home/hidden/Projects/uwsgi/uwsgi
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 15690
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /tmp/uwsgi.sock fd 3
Python version: 2.7.9 (default, Apr  2 2015, 15:37:21)  [GCC 4.9.2]
Python main interpreter initialized at 0x225f8e0
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 83072 bytes (81 KB) for 2 cores
*** Operational MODE: threaded ***
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (and the only) (pid: 61212, cores: 2)
Test subprocess.call

Here is the content of the new 'app.py' file so there is no ambiguity:

import subprocess, os

'''
print('Test os.fork')
os.fork()
print('os.fork worked')
'''

print('Test subprocess.call')
subprocess.call('echo test')
print('subprocess.call worked')

print('Test os.popen')
f = os.popen('echo test')
print('os.popen worked')

def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    return [b"Hello World"]

With such a WSGI application, I expect to see a "subprocess.call worked" message in the logs which is not the case here.

unbit commented 9 years ago

Can't reproduce it, sorry, use strace, your pattern is so common (and even used internally in uWSGI via --hook options) that i suspect something wrong in your binary or some security enforcement that is blocking the process.

unbit commented 9 years ago

Btw, have you tried without dynamic mode ? Hosting multiple interpreters in the same process has lot of corner case problems, maybe you hit one of those.

remyroy commented 9 years ago

I will check out with strace. I built uWSGI from the source on http://projects.unbit.it/downloads/uwsgi-2.0.11.1.tar.gz and http://projects.unbit.it/downloads/uwsgi-2.0.5.1.tar.gz on 2 different systems and both had the same issue. I will try again with a clean Ubuntu install.

It works fine without dynamic mode. My goal is not to run multiple interpreters in the same process but to delay the loading of the application until a request is made.

unbit commented 9 years ago

There are saner way of doing it using upstart/systemd/emperor and on-demand socket. Btw i think i could have found your issue, i just only to get confirmation that your process is stuck on futex() syscall

unbit commented 9 years ago

I wrote a little hack to make pthread_atfork() smarter, i have no idea of the implications on non-Linux systems, but dynamics app are pretty edgy so i think we can survive :)

remyroy commented 9 years ago

Thanks for your time and for giving some more thought to it. I guess you were able to reproduce the issue and you found the source of it. If you need more details, logs or information, let me know.

I just tested your little hack in 33df0dbcc95acc2fdef56aba8da2a5fc653e6571 and it solved this issue on my end. subprocess.call and os.popen return and they work as intended in dynamic and prethreaded mode. Here is the log that I get now:

*** Starting uWSGI 2.1-dev-281c1f5 (64bit) on [Thu Sep 10 12:06:39 2015] ***
compiled with version: 4.9.2 on 10 September 2015 12:04:07
os: Linux-3.19.0-25-generic #26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015
nodename: ubuntu
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/remyroy/Projects/uwsgi
detected binary path: /home/remyroy/Projects/uwsgi/uwsgi
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 15690
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /tmp/uwsgi.sock fd 3
Python version: 2.7.9 (default, Apr  2 2015, 15:37:21)  [GCC 4.9.2]
Python main interpreter initialized at 0x1ac8720
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
your request buffer size is 4096 bytes
mapped 83120 bytes (81 KB) for 2 cores
*** Operational MODE: threaded ***
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (and the only) (pid: 62283, cores: 2)
Test subprocess.call

subprocess.call worked
Test os.popen
os.popen worked
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x1ac8720 pid: 62283 (default app)
[pid: 62283|app: 0|req: 1/1]  () {18 vars in 383 bytes} [Thu Sep 10 12:06:43 2015]   => generated 11 bytes in 9 msecs ( 200) 1 headers in 44 bytes (1 switches on core 0)

Here is the 'app.py' file I used for this run:

import subprocess, os

'''
print('Test os.fork')
os.fork()
print('os.fork worked')
'''

print('Test subprocess.call')
subprocess.call('/bin/echo')
print('subprocess.call worked')

print('Test os.popen')
f = os.popen('/bin/echo')
print('os.popen worked')

def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    return [b"Hello World"]

It seems this issue is solved now. You can close it if you want.

If you have any tip for a saner solution to my original need of having my applications load on the first request, I am open to suggestions. My production server is on a shared hosting without root access so it limits my options. I have never used on-demand sockets. If you know any recipe, guide or documentation about them, I would appreciate the references.

Using dynamic mode with uWSGI is just a matter of changing 3 lines of configuration for my applications so it is pretty easy to achieve the goal of having them load on the first request. Is using dynamic mode in production really insane for this purpose?