Gracefully exiting a Python process running Bjoern

saley89 commented 8 years ago

Hi,

I have a simple Python script wrapping a Bjoern web server around a WSGI app. In essence, it looks like this:

def signal_term_handler(signal, frame):
    logging.info('got SIGTERM')
    sys.exit(0)

def main():
    signal.signal(signal.SIGTERM, signal_term_handler)
    logging.info(" Starting Bjoern with WSGI app at http://{}:{}/".format(HOST, PORT))
    bjoern.run(router, HOST, PORT)

The server works just fine. The problem is that when sending a SIGTERM signal (e.g. using kill -15), the signal handler doesn't get called and the process doesn't die. When sending SIGKILL, the process does die, without calling the signal handler (this is expected).

How can I intercept SIGTERM?

jonashaag commented 8 years ago

Hm... the way Python implements signals makes it impossible to use them in this situation (C event-loop based server), because they are put into in an internal signal queue and the handlers are only called the next time the Python interpreter is executed, which is when a new request is being made to the server or an existing request is being processed.

I'm not sure how other event-loop based servers solve this problem, maybe by periodically invoking the Python interpreter.

Maybe you can do aside with signals altogether?

jonashaag commented 8 years ago

This seems to be possible to work around with periodically calling into https://docs.python.org/2/c-api/exceptions.html#c.PyErr_CheckSignals

saley89 commented 8 years ago

Thanks for the quick reply and confirming our suspicions that it may be related to the C side of the library. We will look into an alternative solution.

It would be great if you could build in the CheckSignals code you mentioned above if that would allow us to handle signals correctly. I can leave this open if you would like to look into it, otherwise I am happy to close the issue.

vmandke commented 7 years ago

Shouldn't SIGTERM be handled the same way SIGINT is ? For graceful shutdown ?

jonashaag commented 7 years ago

That's correct, if it doesn't work like that then that's a separate issue.

vmandke commented 7 years ago

@jonashaag SIGTERM is not handled like SIGINT as https://github.com/jonashaag/bjoern/blob/master/bjoern/server.c#L76 only handles SIGINT.

A follow-up question:

https://github.com/jonashaag/bjoern/blob/master/bjoern/server.c#L91 places a SIGINT on the signal queue, would

PyErr_SetInterrupt(); PyErr_CheckSignals(); ensure that the signal is propagated as soon as it occurs ?

jonashaag commented 7 years ago

I don't know, maybe the other way round (first check signals, then set interrupt), but in either case we'd have to make sure we hold the GIL

Auha commented 7 years ago

Are there any work arounds? I have an app that should not have any downtime.

ninjatrench commented 6 years ago

Maybe you can run this as a subprocess and kill it whenever required, I guess

bulletmark commented 5 years ago

To gracefully shut down services, systemd sends a SIGTERM followed 90 secs later by a SIGKILL if necessary (see here). Due to this issue there is no easy way to capture SIGTERM when using bjoern. I find that meinheld just returns from the main event loop on any interrupting signal so I don't need a signal handler at all. Perhaps that approach could be considered?

So I have will have to move back to meinheld at least temporarily although my benchmarks show that bjoern performs slightly faster.

jonashaag commented 5 years ago

Can you guys try the https://github.com/jonashaag/bjoern/tree/signal-watcher branch

bulletmark commented 5 years ago

Sorry, but I really don't want to create a signal handler (it is asynchronous and there are plenty of spin-off issues, e.g. when using multiprocess the children inherit the handler) and was trying to use one simply to catch systemd terminating my service. I suspect this is probably also the case for the OP @saley89 here.

Now that I have seen that meinheld simply returns from the run() event loop for the SIGTERM case it makes a lot of sense to me so I will stick with that. As I said, perhaps you could consider that approach also for bjoern for better systemd compatibility.

jonashaag commented 5 years ago

@bulletmark The patch in that branch should make this work. It simply checks every 100ms if there's a new signal in the Python signal queue and executes that Python signal handler.

thefab commented 5 years ago

The https://github.com/jonashaag/bjoern/tree/signal-watcher branch works perfectly for me with SIGTERM.

My use case is a on_sigterm python handler which do some "cleaning things" then send a SIGINT to the same process to force bjoern to exit.

bulletmark commented 5 years ago

I still think that simply returning from the event loop as I describe in my 2 posts above (and how meinheld does it) is a better way to go, and most compatible with systemd termination. @jonashaag, please reconsider that approach.

jonashaag commented 5 years ago

I just released 3.0.0 with the approach taken in the signal-watcher branch.

@bulletmark could you elaborate on what's better in the meinheld implementation for your use case?

thefab commented 5 years ago

@bulletmark @jonashaag a complete solution could be to add to the bjoern package a kind of bjoern_wrapper which could be used as a "bjoern cli" (launched by systemd, supervisor, circus or other process managers). Included features would be:

gracefull and automatic exit in case of SIGTERM
timeout management
CLI (argparse) to set host, port, unix_socket and (of course) wsgi application path (with import thanks to importlib)

I use something like that internally. With a little bit of cleaning, I could propose this as a PR if your are interested.

jonashaag commented 5 years ago

I use something like that internally. With a little bit of cleaning, I could propose this as a PR if your are interested.

Definitely! Could also paste it here uncleaned so we can discuss without having you spend time on it

thefab commented 5 years ago

internal script hardly cleaned (but not completely), not even tested, just here to discuss

i think we need to add :

argparse options for "port" (I use only unix socket)
some "hook" features to be able to provide python callables (thanks to importlib) for some events (on_sigterm, before_start...) like in gunicorn

#!/usr/bin/env python3

import argparse
import sys
import bjoern
import importlib
import signal
import datetime
import time
import os
import threading
from mflog import get_logger
from mfutil import kill_process_and_children

LOGGER = get_logger("bjoern_wrapper")

def send_sigint(pid):
    LOGGER.info("sending SIGINT to %i", pid)
    os.kill(pid, signal.SIGINT)

def on_signal(signum, frame):
    if signum == 15:
        LOGGER.info("received SIGTERM => smart closing...")
        # [...]
        # Real stop
        send_sigint(os.getpid())

def get_wsgi_application(path):
    if len(path.split(':')) != 2:
        LOGGER.warning("main_arg must follow module.submodule:func_name")
        sys.exit(1)
    module_path, func_name = path.split(':')
    mod = importlib.import_module(module_path)
    try:
        return getattr(mod, func_name)
    except Exception:
        LOGGER.warning("can't find: %s func_name in module: %s" % (
            func_name, mod))
        sys.exit(1)

class TimeoutWsgiMiddleware(object):

    def __init__(self, app, timeout):
        self.app = app
        self.timeout = timeout
        self.started = None
        if self.timeout > 0:
            x = threading.Thread(target=self.timeout_handler, daemon=True)
            x.start()

    def timeout_handler(self):
        now = datetime.datetime.now
        while True:
            if self.started:
                if (now() - self.started).total_seconds() > self.timeout:
                    LOGGER.warning("Request Timeout => exit")
                    # Self-Kill
                    kill_process_and_children(os.getpid())
            time.sleep(1)

    def __call__(self, environ, start_response):
        if self.timeout > 0:
            self.started = datetime.datetime.now()
        try:
            # see http://blog.dscpl.com.au/2012/10/
            #     obligations-for-calling-close-on.html
            iterable = self.app(environ, start_response)
            for data in iterable:
                yield data
        finally:
            self.started = None
            if hasattr(iterable, 'close'):
                iterable.close()

def main():
    parser = argparse.ArgumentParser(description="bjoern wrapper")
    parser.add_argument("main_arg", help="wsgi application path")
    parser.add_argument("unix_socket", help="unix socket to listen path")
    parser.add_argument("--timeout", default=60, type=int,
                        help="one request execution timeout (in seconds)")
    args = parser.parse_args()
    signal.signal(signal.SIGTERM, on_signal)
    wsgi_app = get_wsgi_application(args.main_arg)
    try:
        os.unlink(args.unix_socket)
    except Exception:
        pass
    try:
        app = TimeoutWsgiMiddleware(wsgi_app, args.timeout)
        bjoern.run(app, 'unix:%s' % args.unix_socket)
    except KeyboardInterrupt:
        LOGGER.info("process shutdown")

if __name__ == '__main__':
    main()

bulletmark commented 5 years ago

@jonashaag, what I am talking about is so simple it is embarrassing that I apparently have not explained it clearly enough in my first 2 posts here! :)

Normally code blocks essentially forever in run() and all I am suggesting is that run() should simply return if a SIGTERM is received. SIGTERM is the standard way systemd shuts down a program. No need for the user to write any signal handlers (with the associated complications that often entails as I mention above).

Meinheld does it this way and it works well.

thefab commented 5 years ago

@bulletmark I understand what you are saying ;-) But IMHO, today, bjoern is a kind of library and not a "full featured daemon". So, you can also want to get "full control" of the stop process (particularly if you provide a python socket(-like) object by yourself).

So we have two different use cases:

a "gunicorn like" use case (when you use bjoern as a regular daemon with SIGTERM standard features)
a "library" use case (where you want full control on signals)

I propose to add to existing bjoern a wrapper to have both use cases.

With the wrapper, systemd will launch your app with:

bjoern_cli --port=8080 --timeout=60 yourmod.yoursubmod:app

and you will get a regular SIGTERM feature (without anything to implement)

What do you think about this compromise ?

bulletmark commented 5 years ago

I have a python program which just calls bjoern's run(). Actually it does that via bottles run(..., server='bjoern') which is even more trivial as I can just change server='meinheld' etc. After the run() command I call some cleanup stuff which I need to do before termination. How would I do that with your proposal?

thefab commented 5 years ago

My plan:

bjoern_cli --port=8080 --timeout=60 --hook-after-stop=yourmod.after_stop yourmod:app

with "user provided" hook:

def after_stop(wsgi_app, **kwargs)
    # [...] clean what you want
    return True

I plan to implement other hooks:

before_start(wsgi_app, **kwargs) (if return False the cli exit)
on_signal(wsgi_app, signum, frame, **kwargs) (if return False is returned, a SIGINT is sent to bjoern to exit the main loop), with of course a default/regular SIGTERM handling

bulletmark commented 5 years ago

That would work of course but nowhere near as simple or intuitive as just returning from run() when terminating.

jonashaag commented 5 years ago

Stupid question, if I send sigterm/sigint to a Python program, it will raise KeyboardInterrupt, right? So shouldn't that be what happens when you try to interrupt bjoern as well?

thefab commented 5 years ago

With this code:

import time

try:
    time.sleep(60)
except KeyboardInterrupt:
    print("catched KeyboardInterrupt")
except:
    print("catched something else")

with SIGINT, you get: catched KeyboardInterrupt (with return code 0)
with SIGTERM, you get an uncatched Terminated (with return code 143)

jonashaag commented 5 years ago

From #145

So, here's how Gunicorn works:

SIGINT shuts everything down momentarily.
SIGTERM waits for all active connections to finish, with a timeout (30 secs by default). It ignores the backlog though, and there doesn't seem to be a way to change that through configuration.

I agree that we should not throw away the backlog when doing a graceful shutdown.

Also I think it's fine to just stop bjoern without any kind of graceful shutdown for SIGINT (but not for SIGTERM).

Bjoern currently works as follows:

SIGINT waits for active connections to finish, without a timeout, and ignores the backlog.
SIGTERM simply terminates the program immediately.

So, going forward, my suggestion is we change Bjoern behaviour as follows:

SIGINT simply shuts down all connections immediately, and raises a KeyboardInterrupt in the program.
SIGTERM waits for active connections to finish, and the backlog to be empty (while still accepting new connections), with a timeout, and exits the program. (?)

We can make the SIGTERM behaviour configurable, and allow for custom shutdown behaviours, by exporting some shutdown primitives from the C code.

jonashaag commented 5 years ago

I'm still not sure how to meet both the expectations of people thinking of bjoern as a "library" that doesn't fiddle with signal handling and people thinking of bjoern as a server that takes care of graceful loop shutdown.

Maybe we actually have to implement something along the lines of what @thefab suggested.

thefab commented 5 years ago

I'm not sure it's a typo or not but you wrote:

"SIGTERM waits for active connections to finish, and the backlog to be empty (while still accepting new connections), with a timeout, and exits the program."

IMHO, after a SIGTERM, we don't accept new connections !

For my "library use case", it's perfectly ok to have another signal for custom shutdown. All I need is the python signal handler called from C like in 3.0.

akuzia commented 4 years ago

is it a correct way to gracefully exit?

import bjoern
import sys
import signal
import os

def sigterm_handler(_signo, _stack_frame):
    sys.stdout.write(' [*] SIGTERM: Stop \n')
    os.kill(os.getpid(), signal.SIGINT)

if __name__ == '__main__':
    signal.signal(signal.SIGTERM, sigterm_handler)
    sys.stdout.write(' [*] Starting bjoern on port 5000...! \n')
    try:
        bjoern.run(app, '0.0.0.0', 5000)
    except KeyboardInterrupt:
        sys.stdout.write(' [*] SIGINT: Stop \n')
    finally:
        sys.stdout.write(' [*] Exiting \n')

jf commented 3 years ago

I'm not sure it's a typo or not but you wrote:

"SIGTERM waits for active connections to finish, and the backlog to be empty (while still accepting new connections), with a timeout, and exits the program."

IMHO, after a SIGTERM, we don't accept new connections !

Yup. I would agree. Without the use of an external coordinator (to route new connections elsewhere), the server might potentially never have a chance to gracefully shut down, cos it'll keep accepting new connections.

For my "library use case", it's perfectly ok to have another signal for custom shutdown. All I need is the python signal handler called from C like in 3.0.

This "library vs server" thing is interesting. Would the following work for a simple compromise? This supposes that you have the ability to modify (and why shouldn't you?) whatever mechanism you're using to send the SIGTERM to bjoern:

before sending the SIGTERM, your external system (a script?) can start up another instance of bjoern to continue listening and accepting new connections (this assumes SO_REUSEPORT is available)
on SIGTERM, bjoern will shut the listening socket down so that it no longer accepts new connections
for existing connections, the older instance of bjoern can take as much time as it needs to finish the existing requests without any pressure to shut anything down quickly, because the new listening socket is already up and serving new traffic. We should probably define a timeout before the older instance of bjoern shuts down... but the key thing here is that we don't have to worry about needing to keep this short.

This setup would work for keeping the server up without any interruption in serving connections, and also allow for graceful connection draining. Does this make sense?

ak4nv commented 3 years ago

Here are my two cents... What if in most cases we use a systemd manager for our services it would be graceful to add a systemd-socket into bjoern out-of-the-box. In this case, all socket issues with opening or closing are gone.

Offtopic: I was confused a little bit. If you run bjoern with unix socket it works as wsgi server, otherwise, it works as http. Is it possible to manage via params? I would like to have the ability to run wsgi server which binds to tcp socket.

jonashaag commented 3 years ago

It always works as a WSGI server (which is a special kind of HTTP server)

ak4nv commented 3 years ago

It always works as a WSGI server (which is a special kind of HTTP server)

Hmm....

This is my wsgi.py

import os
from app.core import create_app

app = create_app()

if __name__ == '__main__':
    #app.run(debug=True)
    import bjoern
    port = os.getenv('BIND_PORT', '8000')

    if not port or not port.isdigit():
        raise RuntimeError('Socket for bind not found.')

    bjoern.run(app, host='localhost', port=int(port))

this is a part of nginx config

    location ~ ^/(?<app>\w+)/(?<app_uri>.*) {
        include uwsgi_params;
        uwsgi_param SCRIPT_NAME /$app;
        uwsgi_param PATH_INFO /$app_uri;
        uwsgi_pass localhost:8000;

        #access_log /var/log/nginx/app/$app.log;
        error_log  /var/log/nginx/app/errors.log warn;
    }

And I'm getting a 400 error code. Any ideas why?

// It works if use proxy_pass instead of uwsgi_pass but SCRIPT_NAME in this case is missed.

bulletmark commented 2 years ago

As above, I left bjoern and went to meinheld given this issue but am now back to bjoern for other reasons. My problem here was that bjoern does not return from run() when systemd sends the process a SIGTERM which is the default manner by which systemd terminates an application. However, I have now found this is a trivial issue because all I had to do was define KillSignal=SIGINT in the systemd service file and that works as I want because bjoern run() does return on SIGINT. Just noting this here for the benefit of others using systemd to manage their application, and want a graceful way to catch termination.

ally-ben commented 2 years ago

Hi folks, inspire by @bulletmark, I do something like this. (GracefulKiller from StackOverflow) A few things need to be handled during the graceful shutdown.

Close the socket (stop receving the request)
Finish the request, and close the multithread
Exit with 0

#!/usr/bin/env python
import os
from flask import Flask
import bjoern
import signal
import time
import sys

app = Flask(__name__)

@app.route("/")
def hello_world():
    time.sleep(30)
    return "<p>Hello, World!</p>"

class GracefulKiller:
    kill_now = False
    def __init__(self):
        # not handle sigint, since the bjoern will raise it.
        # signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)

    def exit_gracefully(self, *args):
        sock, wsgi_app = bjoern._default_instance
        # Once the sock close, the request still in buffer will be close, and get the error 'Connection reset by peer'
        # The request already in process will be handle completed. 
        sock.close() 
        print('you can close the background jobs here, or the in main KeyboardInterrupt exception')
        os.kill(os.getpid(), signal.SIGINT)

if __name__ == '__main__':
    GracefulKiller()
    try:
        bjoern.run(app, '0.0.0.0', 8080)
    except KeyboardInterrupt:
        print('If you have some background jobs with multithreading, you can handle it here')
        print('start process graceful shutdown')
        time.sleep(3)
        print('Ready to exit')
        sys.exit(0)

slashburygin commented 1 year ago

Greetings. In my env (python 3.8, bjoern 3.1.0) I test script by @ally-ben. Bjoern stops after sending SIGTERM only after starting so far it has not accepted any request. But if I send to app on bjoern one or more any requests, it's not responding to SIGTERM.

jonashaag / bjoern

Gracefully exiting a Python process running Bjoern #91