Broken requests when gracefully reloading in Gevent mode

frankotron commented 11 years ago

I've been experiencing a few problems when UWSGI is run with the Gevent engine and a HUP is sent to the master process for gracefully restarting the worker processes. The way the test have been run have mainly been with apache benchmarking tool, where a HUP signal have been sent during the benchmark, ex. "ab -c 50 -n 20000 'http://127.0.0.1/hello_graceful'". NGINX is proxying requests back to UWSGI.

The NGINX log spews out a bunch of: "recv() failed (104: Connection reset by peer) while reading response header from upstream" and "connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1"

Its very easy to replicate, what follows are a few runtime specifics; just the basics to replicate the issue:

OS: Centos 6.3 (also tried with ArchLinux)

UWSGI (version 1.4.4 64bit): "uwsgi -s 127.0.0.1:16695 --disable-logging --gevent 1000 -p 3 --master -w simplewsgi" A simple removal of the option "--gevent 1000" seems to eliminate the issue. That is however not an option since the platform I'm working with is based on gevent.

simplewsgi.py:

def application(environ, start_response):
    data = "Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

NGINX: simple uwsgi_pass back to 127.0.0.1:16695

Attempted mitigations:

Tried various options in UWSGI such as "listen", the gevent option itself with the async core number, timeouts, number of worker processes etc.
Experienced with nginx options such as
- proxy_read_timeout
- proxy_send_timeout
- proxy_connect_timeout
- uwsgi_read_timeout
- uwsgi_send_timeout
- uwsgi_connect_timeout
- client_body_timeout
- (One solution that actually worked was to duplicate my host in the upstream specification, ex:
upstream uwsgi_node { server 127.0.0.1:16695; server 127.0.0.1:16695; server 127.0.0.1:16695; server 127.0.0.1:16695; server 127.0.0.1:16695; ... (rinse and repeat a few times) }

This however not desirable for obvious reasons. Just thought it was interesting.

Haven't really seen any tickets or issues targeting this problem specifically. Anyone got input whether I'm dealing with a known error or what can be done to fix it?

unbit commented 11 years ago

This has been discussed in the mainling-list, but truly graceful reload in gevent mode is basically impossible as there are too much cases, and destroying greenlet is not as easy as destroying a process or a thread. Currently the "graceful approach" is simply stopping each greenlet (and this could happen in the middle of a request if a greenlet is blocked on an event). I leave this issue open as an area to post ideas on how to better manage it.

edevil commented 11 years ago

This is similar to issue #105 .

frankotron commented 11 years ago

I am not too familiar with the worker architecture or the reload mechanism (planning to look through the code this weekend). I am guessing that new workers are reved up during reloading to parse incoming requests and drain the old workers of requests? Can a configurable drain timeout before greenlets of the old workers are destroyed be implemented? Any solution that doesn't involve hardcore instakilling Greenlets would be preferable.

This is basically what I've started to look at now, but on a larger scale with two UWSGI fastrouters+subscription servers and emperors, and then toggle which UWSGI stack that gets new requests from NGINX, to choke out old requests from the active UWSGI stack and then killing it after a certain time period, regardless of any pending greenlets. I am curious if UWSGI could do this internally before heading down this route with the double stacks.

If anyone has any pointers on certain interesting sections in the code, or alternative solutions, I would really appreciate it.

unbit commented 11 years ago

This is the graceful reload function for the gevent plugin:

https://github.com/unbit/uwsgi/blob/master/plugins/gevent/gevent.c#L27

as you can see is pretty simple: it stops the running greenlet (pay attention, stop not kill), then the greenlets waiting on uwsgi sockets, and finally it exit() the process.

This is different from the multithread/multiprocess reload system as greenlet are cooperative so they could not yield the cpu to the main greenlet (the one managing the reload).

During a reload, the master give a "mercy time" (60 seconds by default) to a worker to exit (otherwise the master will kill it).

Each uWSGI-spawned greenlet has an internal value (in_request) marking it as running or not, so we could wait for a greenlet to become NOT in_request and then stop it. But again who can monitor that, if we are in cooperative mode ?

Regarding subscriptions, this is a special case with less problems, as we can unsubscribe before shutting down the instance, so we can simply trigger the reload when the master does not see requests running.

edevil commented 11 years ago

Maybe a uWSGI worker could spawn a thread when asked to gracefully die that would periodically check to see if there are running requests, and terminate the process when they are all done? Or maybe whenever a request Greenlet finishes it could check if it's worker is in pending reload mode, if it is and it is the last running Grennlet the worker can be terminated? If there are no more new requests arriving for a worker all Greenlets are bound to finish, right?

unbit commented 11 years ago

The second approach seems more viable. I will make some test

unbit commented 11 years ago

Ok, i have committed an improved reloading subsystem.

Basically it uses the join() gevent facility (after reporting the number of running greenlets). In addition to this, a new concept, the "time bomb" has been added. This is a simple posix thread waiting for a timeout. If the timeout elapses exit() is called. This is used as a security measure when there is no master or if we want to define the maximum amount of time to wait directly in the code (this could be discussed later). To configure the timebomb timeout you can use the --worker-reload-mercy (but we could decide to have a gevent specific option for it, given the timebomb could be used in some other area)

edevil commented 11 years ago

This seems great! I will test it first thing on monday. Thanks!

frankotron commented 11 years ago

Indeed, this seems extremely intriguing. So far I solved it with two sets of the applicaton connected to a fastrouter and juggled their configuration symlinks where normally one is running, but this I will check out first thing tomorrow. Awesome!

edevil commented 11 years ago

Fyi, the current master does not build on Linux, at least:

cc1: warnings being treated as errors core/cache.c: In function 'uwsgi_cache_find_free_blocks': core/cache.c:76: error: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'uint64_t' core/cache.c:76: error: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'

unbit commented 11 years ago

you can comment those lines, they are only for debug

edevil commented 11 years ago

When I started uWSGI it complained about 'unable to find cache "true"'. I had it configured like the documentation says in "https://github.com/unbit/uwsgi-docs/blob/master/SSLScaling.rst". I changed the config to "ssl-sessions-use-cache: default" and it started.

Now about the graceful reload. I experimented with having one long request being served and as soon as I send a SIGHUP to the uWSGI server the http connection is terminated.

2013-02-11 10:17:45.799232500 ...gracefully killing workers... 2013-02-11 10:17:45.799390500 Gracefully killing worker 1 (pid: 18364)... 2013-02-11 10:17:45.799422500 stopping gevent signals watchers for worker 1 (pid: 18364)... 2013-02-11 10:17:45.799448500 stopping gevent sockets watchers for worker 1 (pid: 18364)... 2013-02-11 10:17:45.799494500 main gevent watchers stopped for worker 1 (pid: 18364)... 2013-02-11 10:17:45.799734500 waiting for 1 running requests...

2013-02-11 10:17:46.802318500 gateway 1 (uWSGI http 1, pid: 18368) annihilated

2013-02-11 10:18:15.832507500 DEBUG 2013-02-11 10:18:15,832 sapodisco.subsys.swift.serve_block:96 18364 44974640 File 1:rtS8UtIFbYShItu9_TkXBJbX7pvGWS_YrHkqY_18e8KxcXZmCXmDI4Eu_6WutIWW2upan5mggDJwOPKCLRq6hg fetched, will serve 2013-02-11 10:18:15.875833500 {address space usage: 815022080 bytes/777MB} {rss usage: 48709632 bytes/46MB} [pid: 18364|app: 0|req: 1/1] 10.134.167.237 () {28 vars in 556 bytes} [Mon Feb 11 10:17:23 2013] GET /Storage/Disco/Files/dropbox/eclipse-jee-juno-macosx-cocoa-x86_80.tar => generated 44498944 bytes in 51922 msecs (HTTP/1.0 200) 6 headers in 545 bytes (4202 switches on core 9999) 2013-02-11 10:18:15.875984500 goodbye to the gevent Hub on worker 1 (pid: 18364) 2013-02-11 10:18:16.802284500 binary reloading uWSGI...

2013-02-11 10:18:16.802412500 chdir() to /servers/supervise/uwsgi-discosite

Somehow the worker still thinks it has a connected client and it streams data for about 30 seconds until it stops.

unbit commented 11 years ago

2013-02-11 10:17:45.799734500 waiting for 1 running requests...

It looks like the new gevent reloading subsystem is doing its job. The problem here is that you are reloading the http frontend too, and corerouters do not have the concept of graceful reloads. You should try splitting both to check if the new system is working good.

edevil commented 11 years ago

Is there some documentation about how to split these two components? I've always run the http process and the worker processes under the same master process.

Which protocol will they use to communicate between them?

edevil commented 11 years ago

I've found it in http://uwsgi-docs.readthedocs.org/en/latest/HTTP.html, the stand-alone mode. I'll try it!

edevil commented 11 years ago

I've been running tests and ran into some issues:

I had a long-running request, sent a HUP to the master of the worker, the request kept being served. However, new requests coming in did not get served until the worker finished the running request and recycled. I was expecting a new worker to be launched immediately to handle new requests.
I stopped the long-running request in the middle (it was downloading data), and the worker did not notice anything. It kept sending the data to a client that was no longer connected to the http fastrouter. The fastrouter printed this as soon as the client disconnected: hr_write(): Broken pipe [plugins/http/http.c line 378]

unbit commented 11 years ago

"""I had a long-running request, sent a HUP to the master of the worker, the request kept being served. However, new requests coming in did not get served until the worker finished the running request and recycled"""

this is the expected behaviour, if the worker continue serving requests it could never be released. Are you sure you do not want non-stopping reloads ? To accomplish that you can use the subscription system or the zerg mode (as you are already using the https router you can easily use the subscriptions system)

"I stopped the long-running request in the middle (it was downloading data), and the worker did not notice anything. It kept sending the data to a client that was no longer connected to the http fastrouter"

this is bad, when the worker calls write() (and it fails) it should get an exception and stop the request. i will investigate on that

Thanks

edevil commented 11 years ago

"""this is the expected behaviour, if the worker continue serving requests it could never be released. Are you sure you do not want non-stopping reloads ?"""

I guess what I was expecting is what I see with nginx. When a HUP is received, immediately new workers are spawned which handle new requests. Meanwhile, the old workers continue serving the requests they were handling at the time. As they finish their requests they die.

"""To accomplish that you can use the subscription system or the zerg mode (as you are already using the https router you can easily use the subscriptions system)"""

I would prefer not to have front-ends dependant on one another. I think what you are suggesting is that each front-end would run a subscription server and also a core router that would subscribe the service on multiple front-ends. This way when a corerouter goes down the server has other subscribed clients that can take the requests?

unbit commented 11 years ago

the nginx reload system is pretty risky on processes using dozens (or hundreds) of megabytes of memory. You can have something similar using the "chain-reload" (paired with --lazy-apps). Chain reload can be triggered only via touches (sorry no more available UNIX signals :P) --touch-chain-reload. It will reload only workers, the first one being freed will start accepting requests again. Do not rely too much on it (use only for testing), as this option will became --touch-workers-reload while the chain one will be a bit different (the following worker is not restarted til the previous one has been reloaded)

unbit commented 11 years ago

Regarding subscription it is a feature embedded in the https router you do not need another daemon. Basically before triggering the instance reload, you spawn another one that will subscribe to the same https router. I do not know if this is the best solution for you or if the upcoming chain-reload system (not the one you currently find in 1.9) will be more than enough

edevil commented 11 years ago

The upcoming touch-chain-reload seems a good option. I'll keep an eye on the commits to test it when it appears. Thanks!

unbit commented 11 years ago

--touch-chain-reload (the true one) has been committed. Warning: a new master has been committed too, it got a bunch of tests but there could be corner-case bug. Feel free to report them.

edevil commented 11 years ago

I've ran some tests and it seems the feature is working quite well. I just noticed some issues:

Whenever a touch reload happens, and no requests are being served, it seems to say that there are 2 requests. But then it goes ahead and kills the workers:

2013-02-13 16:05:46.666555500 Wed Feb 13 16:05:46 2013 - *** /servers/conf/discoapi-chain-reload has been touched... chain reload !!! ***
2013-02-13 16:05:46.667270500 Gracefully killing worker 1 (pid: 637)...
2013-02-13 16:05:46.667271500 stopping gevent signals watchers for worker 1 (pid: 637)...
2013-02-13 16:05:46.667273500 stopping gevent sockets watchers for worker 1 (pid: 637)...
2013-02-13 16:05:46.667274500 main gevent watchers stopped for worker 1 (pid: 637)...
2013-02-13 16:05:46.667275500 waiting for 2 running requests...
2013-02-13 16:05:46.667276500 goodbye to the gevent Hub on worker 1 (pid: 637)
2013-02-13 16:05:47.666604500 worker 1 killed successfully (pid: 637)
2013-02-13 16:05:47.667556500 Respawned uWSGI worker 1 (new pid: 1019)
2013-02-13 16:05:47.667726500 Gracefully killing worker 2 (pid: 647)...
2013-02-13 16:05:47.667727500 stopping gevent signals watchers for worker 2 (pid: 647)...
2013-02-13 16:05:47.667728500 stopping gevent sockets watchers for worker 2 (pid: 647)...
2013-02-13 16:05:47.667730500 main gevent watchers stopped for worker 2 (pid: 647)...
2013-02-13 16:05:47.667869500 waiting for 2 running requests...
2013-02-13 16:05:47.667870500 goodbye to the gevent Hub on worker 2 (pid: 647)

I then experimented with the mercy time, I was using reload-mercy which did not produce any results. After switching to worker-reload-mercy it worked. However, the message was duplicated and it would be useful to know how many requests were dropped and maybe the URLs requested:

2013-02-13 15:20:03.652074500 worker 2 (pid: 32038) is taking too much time to die...NO MERCY !!!
2013-02-13 15:20:04.652654500 worker 2 (pid: 32038) is taking too much time to die...NO MERCY !!!
2013-02-13 15:20:04.652656500 worker 2 killed successfully (pid: 32038)
2013-02-13 15:20:04.652657500 Respawned uWSGI worker 2 (new pid: 32074)
2013-02-13 15:20:04.652658500 Wed Feb 13 15:20:04 2013 - chain reloading complete

The message "waiting for 2 running requests..." should say which worker emitted it.

Great work! Thanks.

unbit commented 11 years ago

can you retry with latest code ? should be much better. --reload-mercy has still not be ported to the new master, but it is the whole sum of time a reload can take, so it is not of interest for you.

edevil commented 11 years ago

The messages are much better. Just one thing, the message "waiting for 1 running requests..." still does not say which worker it belongs to, like all the others. When there is a lot going on, it may be difficult to know which worker it is.

unbit commented 11 years ago

done

prymitive commented 11 years ago

Does this needs more work? It looks resolved

damz commented 11 years ago

I'm seeing a weird behavior with graceful reload and gevent when background greenlets are running:

[uwsgi]
socket = 127.0.0.1:1111
master = 1
workers = 2
pythonpath = %c
module = test
py-auto-reload = 1
gevent = 100

And in test/__init__.py:

import os
import gevent
import gevent.socket
import signal

def bg_task():
    while True:
        print "I'm alive!"
        gevent.sleep(10)

gevent.spawn(bg_task)

if os.environ.get("REGISTER_SIGNAL"):
    def _dummy():
        # This is never called.
        print "Signal handler called!"
    gevent.signal(signal.SIGHUP, _dummy)

def application(e, sr):
    sr('200 OK', [('Content-Type','text/html')])
    yield "Hello world!"

Here is what I see:

When run with REGISTER_SIGNAL=1 uwsgi test.ini, a graceful restart results in:

Gracefully killing worker 1 (pid: 58824)... Gracefully killing worker 2 (pid: 58825)... worker 1 buried after 1 seconds worker 2 buried after 1 seconds
When run with just uwsgi test.ini, a graceful restart results in:

Gracefully killing worker 1 (pid: 58847)... Tue Apr 23 15:54:30 2013 - stopping gevent signals watchers for worker 1 (pid: 58847)... Tue Apr 23 15:54:30 2013 - stopping gevent sockets watchers for worker 1 (pid: 58847)... Tue Apr 23 15:54:30 2013 - main gevent watchers stopped for worker 1 (pid: 58847)... Gracefully killing worker 2 (pid: 58848)... Tue Apr 23 15:54:30 2013 - stopping gevent signals watchers for worker 2 (pid: 58848)... Tue Apr 23 15:54:30 2013 - stopping gevent sockets watchers for worker 2 (pid: 58848)... Tue Apr 23 15:54:30 2013 - main gevent watchers stopped for worker 2 (pid: 58848)... [... hang ...] Tue Apr 23 15:55:31 2013 - worker 1 (pid: 58847) is taking too much time to die...NO MERCY !!! Tue Apr 23 15:55:31 2013 - worker 2 (pid: 58848) is taking too much time to die...NO MERCY !!!
In no cases, the signal handler is actually called.

This bares the question: what is the expected behavior? Should the background greenlets be killed or not? Why does registering a dummy signal handler changes this?

In addition, if the expected behavior is not to kill background greenlets, is it possible to catch a shutdown signal to clean them properly?

unbit commented 11 years ago

Signal handlers are re-registered after each fork() to avoid apps destroying the uWSGI internal systems. A hack used by someone is register the signal at the first request (to ensure uWSGI cannot anymore patch the handlers). This is bad as you will lose lot of internal procedures made by uWSGI. The current "blessed" way to run functions at shutdown/reload is using some atexit() handler, like the atexit module in python. In the future, a more gevent friendly approach would be having a file descriptor for each worker that a greenlet can wait for

damz commented 11 years ago

So by registering a signal handler, I'm killing the internal mechanisms? That makes sense.

But long-running greenlets are still a problem. There is no way to gracefully kill them currently. I tested atexit, and in my experience it is not called at all under uwsgi+gevent. But even if it was, it would probably be called after the gevent loop has terminated, right?

unbit commented 11 years ago

Yes, uWSGI uses unix signal internally for lot of things. Yes atexit will run after the hub termination, so it cannot help you. By the way i have just committed a patch to allows the gevent plugin to honour the uwsgi.atexit hook. Just map it to a function and it will be called on reload. Let me know if it works for you

damz commented 11 years ago

@unbit Commit 28a283b1c1cbc613947c57105bba0cbea7173736 seems to invoke uwsgi.atexit too late. I need something triggered from py_uwsgi_gevent_graceful() to be able to gracefully terminate my greenlets (which in turn is going to terminate the hub and let gevent_loop() continue.

unbit commented 11 years ago

does calling it before starting destroying request greenlets is good ?

edevil commented 11 years ago

I'm revisiting this again, and now that I'm using the Kazoo Zookeeper client, the workers don't die until the reload-mercy time is reached. I'm assuming that this is because Kazoo starts it's own Greenlet? The worker knows that it's not managing any request, because it does not print it, but it is waiting for some request to finish.

unbit commented 11 years ago

a bunch of days ago i moved uwsgi.atexit just before the start of greenlets destroy. Maybe disconneting from zookeper there could solve the issue (did not tried myself)

edevil commented 11 years ago

But that seems to run before all requests are finished. Am I right? I would not be able to stop the ZK client while there are requests being served since they might need it.

unbit commented 11 years ago

Honestly i do not think there will be a solution good for everyone. Let's add --gevent-atexit to configure when to call atexit. In your case it would be better to call it before closing sockets, while for damz it is better to trigger it before destroying request greenlets.

edevil commented 11 years ago

I re-read @damz's comments and could not find anything implying that he would want the atexit() to be called before all requests being served are finished. He only mentions that he would prefer atexit() to be called before the Hub is destroyed.

@damz, care to comment?

damz commented 11 years ago

Yes, ideally we would call the atexit right after the requests have completed.

unbit commented 11 years ago

I close this, as we definitely have control now over the shutdown/reload procedure when in gevent mode

unbit / uwsgi

Broken requests when gracefully reloading in Gevent mode #140

2013-02-11 10:17:46.802318500 gateway 1 (uWSGI http 1, pid: 18368) annihilated

2013-02-11 10:18:16.802412500 chdir() to /servers/supervise/uwsgi-discosite