http-rs / tide

Fast and friendly HTTP server framework for async Rust
https://docs.rs/tide
Apache License 2.0

Graceful shutdown #528

Open ivan770 opened 4 years ago

ivan770 commented 4 years ago

Tide could store the JoinHandle of every request it handles, and await them when the server is shutting down.

https://github.com/http-rs/tide/blob/7d0b8484cafbfacf4c3c6859caa83a8c9756202e/src/server.rs#L300-L311

(See also the async-std book.)
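
In rough form, the idea would look something like this sketch (illustrative only, not tide's actual internals):

use async_std::task::JoinHandle;

// Hypothetical accept loop: instead of discarding the handle returned by
// async_std::task::spawn, keep it so the server can drain in-flight requests
// before returning.
async fn accept_loop(/* listener, app, shutdown signal, ... */) {
    let mut handles: Vec<JoinHandle<()>> = Vec::new();

    // while let Some(stream) = incoming.next().await {
    //     handles.push(task::spawn(handle_connection(stream)));
    // }

    // On shutdown: stop accepting new connections, then wait for every request
    // that is still running.
    for handle in handles {
        handle.await;
    }
}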

Fishrock123 commented 4 years ago

I think the server only shuts down on a panic! or due to an external signal, such as SIGINT, or whatever the Windows equivalents are.

I suppose this could be desirable for SIGTERM, which allows for such a thing. If you panic, I'm not sure it is ever really safe to continue.

Fishrock123 commented 4 years ago

Oh I think I see what you are saying - essentially we should have a way to await all tasks which are still alive, I think?

ivan770 commented 4 years ago

Oh I think I see what you are saying - essentially we should have a way to await all tasks which are still alive, I think?

Yes. I mean, new requests would no longer be handled, but at least those that are still awaiting a response should be allowed to finish.

mehcode commented 4 years ago

Was just coming to make an issue for this. It's critical from a production standpoint that a load balancer or swarm manager can initiate a controlled or graceful shutdown.

Any requests that came in before the "shut down please" request should still finish. Icing on the cake would be a way to shut down with a timeout, but I think we can compose that ourselves if there is an API for shutdown.

The way I envision usage is a shutdown method on Server. Since Server is clone-able, we could clone it, wait for SIGTERM, and call shutdown on the clone.
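
Hypothetically, usage could look something like the sketch below (neither Server::shutdown nor wait_for_sigterm exists today; both are stand-ins for the proposed API and a signal handler):

use async_std::task;

let app = tide::new();
let handle = app.clone();

// React to the termination signal from a separate task.
task::spawn(async move {
    wait_for_sigterm().await;  // hypothetical: e.g. via a signal-handling crate
    handle.shutdown().await;   // hypothetical: stop accepting, drain in-flight requests
});

app.listen("localhost:8080").await?;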

repnop commented 4 years ago

Definitely agree with @mehcode. Having the ability to gracefully shutdown the server is pretty important for me as well. :+1:

ririsoft commented 4 years ago

FWIW here is how I do it using the async-ctrlc crate:

use async_ctrlc::CtrlC;
use async_std::prelude::FutureExt; // provides `race`

async fn run() -> Result<()> {
    let ctrlc = async {
        CtrlC::new().expect("Cannot use CTRL-C handler").await;
        println!("termination signal received, stopping server...");
        Ok(())
    };

    let app = async {
        let app = tide::new();
        app.listen("localhost:8080").await
    };

    app.race(ctrlc).await?;
    println!("server stopped.");
    Ok(())
}

Please note that this will not wait for running requests to complete; it is a "brutal graceful shutdown", if I dare call it that. But at least it handles SIGTERM and CTRL-C shutdown (cross-platform), which is better than nothing.

repnop commented 4 years ago

I did try messing around with adding graceful shutdown to tide yesterday evening and even got tests working, but I ran into a bit of a problem that I'll outline in the hope that someone better at async code can figure out a better solution.

The actual waiting for requests to finish is pretty simple with the design I used: spawn a task that owns a Receiver (I used the futures-channel channel), use it as a stream of job handles, and await each handle it yields. When the channel is closed from the sender side, the task finishes the remaining jobs and breaks out of the loop, which can be used to signal that all in-flight requests have completed.
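
A rough sketch of that drain task (names are illustrative, assuming an unbounded futures-channel channel carrying async-std JoinHandles):

use async_std::task::JoinHandle;
use futures::channel::mpsc::UnboundedReceiver;
use futures::StreamExt;

async fn drain_requests(mut handles: UnboundedReceiver<JoinHandle<()>>) {
    // `next()` yields `None` only once the sender side has been closed and all
    // buffered handles have been received, so this loop ends exactly when every
    // in-flight request task has finished.
    while let Some(handle) = handles.next().await {
        handle.await;
    }
}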

The other part seems just as simple: if someone has initiated a shutdown (I used an Arc<AtomicBool> here and set it to true in a shutdown method), stop listening for new connections, close the sender half of the channel, and wait for the existing requests to finish (I also had it serve up 503s during this time), then return. This worked, except that using stream combinators to figure out when to stop polling meant the server still waited to serve one more request after shutdown was initiated: I think the TcpListener only notifies the waker when there is new data on the socket, at which point the next poll cycle sees that shutdown is true and behaves as expected.

To get around this I was able to use a select! and check the value of shutdown when incoming.next() didn't resolve, but to stop it from looping infinitely there needed to be a task::yield_now().await on the false case, which seemed less than ideal, and that is where I ended my attempt.
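
In rough form the loop looked something like this (simplified sketch, not the exact code from my branch; the connection handling is a stand-in):

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

use async_std::net::TcpListener;
use async_std::task;
use futures::{select, FutureExt, StreamExt};

async fn accept_loop(listener: TcpListener, shutdown: Arc<AtomicBool>) {
    let mut incoming = listener.incoming();
    loop {
        select! {
            stream = incoming.next().fuse() => match stream {
                Some(Ok(_stream)) => { /* spawn a handler task for the connection */ }
                _ => break,
            },
            // `default` runs when no new connection is ready yet.
            default => {
                if shutdown.load(Ordering::SeqCst) {
                    break; // stop accepting; in-flight requests drain elsewhere
                }
                // Without this yield the loop busy-spins while idle.
                task::yield_now().await;
            }
        }
    }
}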

If anyone has thoughts on how to avoid the issue, do let me know and I can give them a try and report back. Or if anyone else wants to take a stab at implementing it themselves, by all means :+1:

yoshuawuyts commented 4 years ago

I've thought about this a little in the past, and gathered some notes on this. What I think we need to do is use stop-token here: https://github.com/http-rs/tide/blob/293610b7db1ab9a69a338e7d7db5033ae32bc376/src/server.rs#L334-L335 and here https://github.com/http-rs/tide/blob/293610b7db1ab9a69a338e7d7db5033ae32bc376/src/server.rs#L292-L293

This allows us to stop receiving incoming requests and allow the server to gracefully exit once a token has been received.


However stop-token is not quite the end of the road for us. What I think we need is a tighter integration with Stream and Future. I've also gathered some notes on a design for async-std we could use to make cancellation easier. If we had that we could probably use it to enable cancellation of Tide as well:

let src = StopSource::new();
let token = src.token();

// Send a token to each app we initialize
task::spawn(async move {
    let mut app = tide::new();
    app.stop_on(token.clone());
    app.listen("localhost:8080").await?;
});

// app will stop receiving requests once we drop the `StopSource`.
// this can be done, e.g. in response to a signal handler.

It's a bit of an open question how we'd achieve this today. We could probably define our own StopToken for the time being and implement the API outlined above, and once we have this in async-std we could rely on that instead. Hope this is helpful!

GWBasic commented 3 years ago

I'm learning Rust, so I tried to see if I could fix this: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1

Specifically, I solved @repnop 's problem by using future::select. It allows awaiting on two futures and returns when the first completes: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1#diff-1f14b8b6953e2388b19efaf814862a89c0b57c45a6814f79ed373fde05d864d0R82

I then created a "CancelationToken" future that returns once it is canceled.

So far, I only know how to test tcp_listener, and I adjusted its tests to work by canceling gracefully: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1#diff-0f530b03216a301ca1004179731df8510170be421ff7abf545db673eca3bc4beR12

I haven't done anything with waiting for ongoing requests to complete, or with testing the other listeners. I must admit that I tried to return futures from the listen function, but that leads to a lot of fighting with the borrow checker, which is a bit more than I can handle as a Rust novice. ;)

Anyway, if you believe my branch is in line with tide's overall design philosophy, I'm happy to make a pull request and change whatever you want. (Especially for testing the rest of the listeners.) Otherwise, this bug was a great learning exercise for me.

Fishrock123 commented 3 years ago

@GWBasic Can you make at least a draft PR?

GWBasic commented 3 years ago

@Fishrock123 : See https://github.com/http-rs/tide/pull/746

(Note that some merge conflicts crept in.)

Looking forward to your feedback!

GWBasic commented 3 years ago

@Fishrock123 : I had too many merge conflicts in #746, so I created https://github.com/http-rs/tide/pull/766

It uses futures-lite.

Thoughts? Again, I'm mostly interested in learning Rust, so if you'd rather do this in a different way, that's fine with me.

pbzweihander commented 3 years ago

I've also made #836. Please give any thoughts or suggestions!

KoltesDigital commented 2 years ago

Any news?

yoshuawuyts commented 2 years ago

I think it's important to break this down into two parts:

  1. What would the design be if async Rust had all the primitives implemented?
  2. What design can we implement now?

A design for Future Rust.

I've recently written about async cancellation. For Tide, the change we need to make is that our calls to task::spawn (e.g. in tcp_listener, unix_listener) implement "cancel on drop" semantics, so that when the Server instance is dropped, all tasks are cancelled.

We could do this using e.g. the async-task-group crate instead of calling async_std::task::spawn directly. Within the spawned tasks we could make it so each existing request finishes executing in an AsyncDrop destructor, so that by the time the server finishes dropping, all requests have finished executing.

Using (for example) the stop-token crate, we could then shutdown a server like:

use stop_token::prelude::*;
use std::time::{Duration, Instant};
use async_std::task;

// Create the server and register a route which takes some time to complete.
let mut server = tide::new();
server.at("/").get(|_| async move {
    task::sleep(Duration::from_secs(10)).await;
    Ok("hello")
});

// Start listening for incoming requests, but close the server after 3 secs.
// Instead of waiting for a duration, we could also wait for an external signal.
// Either way, this should gracefully wait for all requests to terminate.
server
    .listen("localhost:8080")
    .timeout_at(Instant::now() + Duration::from_secs(3))
    .await?;

println!("finished cleaning up all requests!");

What can we do now?

Unfortunately we currently don't have access to async Drop, so we're limited in how we can implement this. The biggest downside is that instead of being able to use composable cancellation (e.g. when Server is dropped, we gracefully shut down), we need to hard-code a shutdown mechanism into Tide.

Likely the easiest route would be to hard-code a Server::timeout_at method which takes a Deadline. The question is whether we would want to take a dependency on an external crate for this, or provide the mechanism ourselves. I feel that providing it ourselves might be the most resilient option, but I'm keen to hear thoughts on this.


I hope this helps clarify what the current thinking is!

KoltesDigital commented 2 years ago

Thanks!

I'm sure I read on a page linked from your blog that drop is not guaranteed to be called. What is your take on relying on it to shut down the server?

Maybe app.listen could yield a kind of RunningServerHandle from which .shutdown() could be called. I've built a lot of Node.js servers with Express, and those app objects have listen and close methods; I think it is more idiomatic to expose shutdown once listen has been called. Maybe this handle could also have a block method in order to get back the current behavior, i.e. most users would simply change app.listen(...).await?; to app.listen(...).block().await?;. Sadly it's a breaking change.
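
Sketched out, that hypothetical API might be used like this (none of these methods exist in tide today; RunningServerHandle, shutdown, and block are all part of the proposal):

// `listen` would return once the socket is bound, yielding a handle.
let handle = app.listen("localhost:8080").await?;

// Elsewhere, e.g. from a signal handler: stop accepting new connections
// and drain in-flight requests.
handle.shutdown().await?;

// Or, to keep today's run-until-the-process-exits behavior:
// app.listen("localhost:8080").block().await?;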

GWBasic commented 2 years ago

Hey, thanks for the mention. I'm leaving this thread: I only attempted (https://github.com/http-rs/tide/pull/766) as a learning exercise.

Best of luck!

yoshuawuyts commented 2 years ago

I'm sure I read on a page linked from your blog that drop is not guaranteed to be called. What is your take on relying on it to shutdown the server?

Drop is not guaranteed to be called because mem::forget is not unsafe, which means that for soundness purposes we cannot rely on it. But this is far less about "what could we theoretically do" and more about "what is the default behavior". In cases like these we can absolutely expect Drop to just run, and we don't need to worry about it not working [^mutex].
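
For illustration, this is the loophole in question: skipping a destructor requires no unsafe code, which is why soundness can never depend on Drop running.

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        println!("dropped");
    }
}

fn main() {
    let g = Guard;
    std::mem::forget(g); // safe, and "dropped" is never printed
}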

[^mutex]: If Drop wasn't something we could rely on, abstractions such as Arc, Mutex, etc. would be so unreliable that they'd be pointless. But luckily we can just rely on Drop for almost everything except when we're talking about soundness.