python-trio / snekomatic

The code behind @trio-bot
Other
21 stars 6 forks source link

Graceful shutdown / SIGTERM #6

Open njsmith opened 5 years ago

njsmith commented 5 years ago

When Heroku decides to restart our app (e.g. for an upgrade, or just automatically once a day), it sends SIGTERM and then waits 30 seconds for it to exit.

Right now, we don't have any SIGTERM handling at all, so

This creates some race conditions: for example, suppose that a PR is merged into this repo by a new contributor. This kicks off two processes:

In practice we kinda get away with it, because it takes heroku a few seconds to notice the new commit and then build a new container image, so snekomatic wins the race. But if snekomatic gets fancier and starts doing more, this will become more of a problem.

It's easy to have a task listen for SIGTERM. But after it gets it, what should it do? I guess we want to do a graceful shutdown of hypercorn (i.e. finish processing current requests, but stop accepting new ones). I don't think hypercorn has any support for this? And Trio doesn't make it particularly easy yet either (cf https://github.com/python-trio/trio/issues/147).

Oh, no, I'm wrong: hypercorn has a config option shutdown_event that it uses for doing graceful shutdown when running in multi-process mode. Maybe we can use this when from API mode too? Is it just as simple as passing shutdown_event=some_trio_event_object as part of our config? CC @pgjones

pgjones commented 5 years ago

When Hypercorn receives a SIGTERM it should trigger a graceful shutdown, whereby it stops accepting connections and allows the remaining ones to complete. It then sends a lifespan.shutdown ASGI event and waits for the application to reply with a lifespan.shutdown.complete event which Quart converts the lifespan messages into shutdown (and startup) handling.

In summary you should be able to do,

@app.after_serving
async def shutdown():
    await stuff_to_finish 

Note as well that Quart-Trio introduces a nursery on the app which I hope allows you to create background tasks, and then await them in the after serving.

@app.route("/job")
async def trigger():
    app.nursery.start_soon(something)
    return 201

@app.after_serving
async def shutdown():
    await app.nursery  # I've forgotten the syntax 
njsmith commented 5 years ago

@pgjones I'm using hypercorn.trio.serve

pgjones commented 5 years ago

Ah, I see.

I'm thinking the hypercorn.trio.serve function needs another argument of the type shutdown: Optional[Awaitable[None]] = None which would allow usage like,

trigger_shutdown = trio.Event()
nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait))
...
trigger_shutdown.set()  # Trigger graceful shutdown of serve

What do you think?

njsmith commented 5 years ago

I think you mean Optional[Callable[[], Awaitable[None]]] or similar, but otherwise sure that would work.

pgjones commented 5 years ago

Sorry there was a typo, it should have been

nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait()))

So that you could pass any awaitable (rather than a callable). Do you think it should be a Callable?

njsmith commented 5 years ago

Trio's normal convention is that we pretend awaitable objects don't exist, only async functions. The major advantages are that it simplifies teaching, and that we can easily use pylint/mypy to catch missing awaits. But that means we'd need to disable those checks to use something like shutdown=trigger_shutdown.wait().

tl;dr: yeah I think a callable would be better

pgjones commented 5 years ago

Sounds sensible, I've added in https://gitlab.com/pgjones/hypercorn/commit/33a6f1a3847382921ec3d37143145d1bf06ee263. I'll release soon, so please say if it doesn't do the job.

njsmith commented 5 years ago

I haven't tried using it, but the diff looks fine :-)

Re: the commit message, just in case one of us is confused: AFAIK there's no problem with regular cancellation on trio, it's just not graceful :-)

pgjones commented 5 years ago

Sorry about the commit message, it isn't very clear (I can't quite remember the reasoning now either).

I've released 0.8.0 now, if you'd like to use the shutdown trigger.