socketry / async-http

MIT License

Hooks as potential solution for custom timeouts #127

Open zarqman opened 1 year ago

zarqman commented 1 year ago

I've been exploring paths for setting timeouts for both HTTP servers and clients. My goal is specific timeouts at different stages in the request/response lifecycle. This is more pronounced on HTTP1, but there is some use with HTTP2 as well.

Since timeouts are presently set via Endpoints, and those aren't necessarily specific to the protocol client/server to be used, my thought is to add a handful of specific hooks into the Connection and related classes. This would avoid adding a bunch of *_timeout settings all over.

These hooks would likely just be empty methods (or a simple yield) in the default implementation. Then the idea would be that an app could inherit from the relevant Server or Client, or add a mixin, and then modify timeout= directly.

I could also see such hooks being used apart from timeouts, for example: metrics gathering (requests, bytes, etc).
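To make the metrics idea concrete, here is a minimal sketch of what such a hook could look like. The hook name `after_read_request` and the counter are purely illustrative; nothing like this exists upstream yet.

```ruby
# Hypothetical mixin: count requests via an after_read_request hook that a
# server class would invoke once per parsed request. Illustrative only.
module RequestMetrics
  def after_read_request(request)
    @requests_seen = (@requests_seen || 0) + 1
  end

  def requests_seen
    @requests_seen || 0
  end
end
```

An app could `include RequestMetrics` into its server subclass and export the counters however it likes.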

So far, here are the hooks I'm thinking of:

I've read some of the other issues discussing timeouts. They seem to partly address Client usage, but less so Server usage. Where with_timeout works, it would remain the recommended approach. I see hooks like this being complementary and not a replacement for that.

I'm happy to prepare a PR for this, but I wanted to first check and see if it goes in a direction you're comfortable with?

Open to any thoughts, suggestions, or questions as well, of course.

Semi-related, but I've also considered an error-handling hook (or hooks) to help intercept various exceptions before they result in "task ended with unhandled exception". Would that seem appropriate? Such a hook could be used to silence such errors, count them for metrics, etc. Or, would there be any interest in PRs to just silently rescue some of those, if a hook seems like too much?

PS: Thanks so much for the incredibly useful gem (and the whole async ecosystem)!

ioquatix commented 1 year ago

In theory, this makes sense.

However, it can be hard to maintain as it exposes the internal details of the implementation.

If it's not huge, please consider making a PR to demonstrate the hooks and how they can be used.

Bearing in mind, this will continue to get more complex with the addition of HTTP/3.

Also, most users will find such an interface hard to use in general.

I'd like to think we can specify a general timeout (or timeouts). If users have to configure many timeouts to make things robust, it will simply not get done or used correctly.

Recently I introduced IO#timeout. This model is easy to understand. I think, ideally, everything else is as automatic as possible.

trevorturk commented 1 year ago

@ioquatix would you share more about IO#timeout if possible? I hadn't seen it introduced and I'm curious to learn what it's used for!

ioquatix commented 1 year ago

Async::IO has supported #timeout since inception. I would argue, in short, that it's required for robustness in any non-trivial system. When I first made Async::IO it was designed as a shim - ideally it's thrown away eventually. To use raw IO instances, I need to improve feature parity, i.e. introduce IO#timeout. The internal Ruby IO code already has support almost everywhere for timeouts; it just didn't have any way to specify a value. All I really did was introduce this per-IO and update all the "waiting for things to happen" code to use this value instead of Qnil.
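For reference, here is a minimal stdlib-only demonstration of IO#timeout (Ruby 3.2+): a pipe that never receives data times out instead of blocking forever.

```ruby
r, w = IO.pipe
r.timeout = 0.1 # seconds; applies to every blocking operation on this IO

begin
  r.gets # nothing is ever written, so this raises instead of hanging
rescue IO::TimeoutError
  puts "timed out"
end
```

The value is per-IO and inherited by all blocking reads and writes on that object, which is what makes it attractive as a single knob.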

There is one more thing I'd like to introduce to IO - a high water mark. Essentially, something like this:

io.limit = 1024
io.gets # fail if more than 1024 bytes are read.

This prevents memory exhaustion.

Another feature I'm also kind of interested in is minimum data rates to prevent slowloris style attacks generally, but this might be too specific - and it's probably covered sufficiently by having a timeout.

trevorturk commented 1 year ago

Very interesting -- yeah, I agree that keeping timeouts tight is an important part of system robustness, practically required.

Somewhat related, I recently noticed that my weather data fetching system is both CPU and memory bound, and I've needed to increase my server count if an upstream API is running slowly -- memory exhaustion is the eventual result if capacity is low. (I'm thinking also of https://github.com/socketry/async/pull/271.)

For the idea of a custom timeout system, I've been using custom timeouts per HTTP host. So, if I know a certain host (e.g. Apple Weather) has p99 response times above my default threshold, I give that host a longer timeout. I'm trying to figure out how to build a better system here, versus "just" using a bunch of with_timeout blocks. I'd be curious to see any ideas there!
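One way to sketch the per-host idea is a timeout table with a default, consulted before each request. Everything here is illustrative (host names, values, the `fetch` helper); stdlib Timeout is used only to keep the sketch self-contained, whereas in async code `task.with_timeout` would play this role.

```ruby
require "timeout"

# Per-host timeout table with a default; entries are illustrative.
HOST_TIMEOUTS = Hash.new(5.0)
HOST_TIMEOUTS["weatherkit.apple.com"] = 15.0 # slow-p99 host gets headroom

def fetch(host)
  # In async code this would be task.with_timeout(HOST_TIMEOUTS[host]) { ... }
  Timeout.timeout(HOST_TIMEOUTS[host]) do
    :ok # a real client.get(...) would go here
  end
end
```

This keeps the per-host policy in one place instead of scattering with_timeout values through the call sites.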

zarqman commented 1 year ago

As requested, I've created a pair of PRs (socketry/async-http#130 and socketry/protocol-http1#21) to explore this. It's just a spike and is not complete. I'm hopeful it will further the discussion.

In addition to the hooks, which are minimal, I've included an example server and client.

> Recently I introduced IO#timeout. This model is easy to understand. I think, ideally, everything else is as automatic as possible.

I've only recently come to work with the async gems, but I'm glad this is here. I'm not sure it'd be possible to do what I'm proposing without it.

My primary goal is to dynamically change the value of IO#timeout based on the connection (or request/response) lifecycle. The examples focus on this.
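The shape of that idea can be sketched as a mixin that retunes IO#timeout at lifecycle stages. The hook names below are illustrative only, not actual protocol-http1 API, and the values are arbitrary.

```ruby
# Hypothetical mixin: adjust IO#timeout (Ruby 3.2+) per lifecycle stage.
# Hook names and constants are illustrative, not real protocol-http1 API.
module StageTimeouts
  HEADER_TIMEOUT = 5  # headers should arrive quickly
  BODY_TIMEOUT   = 30 # bodies may legitimately stream slowly

  def before_read_request(io)
    io.timeout = HEADER_TIMEOUT
  end

  def before_read_body(io)
    io.timeout = BODY_TIMEOUT
  end
end
```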

I've also included a couple sketches around metrics, although only for HTTP1 at this point.

> However, it can be hard to maintain as it exposes the internal details of the implementation.

Finding a maintainable path is one of my key goals. Right now I'm using a bunch of monkeypatches locally. It's ugly at best, and likely to be fragile longterm. I'm hopeful we can find a path to something better.

Notes on the PRs

The examples demonstrate that these hooks aren't too bad to configure. Definitely cleaner than monkeypatches. I'm not a huge fan of the non-parallelism of the include for HTTP1 vs HTTP2. Since there are various classes to perform the include against, these could be tweaked a bit.

Though not demonstrated here, I'm still surrounding client requests with an overall timeout via with_timeout. (My use-case is proxy-like, so dealing with both server & client together.) The goal is to tightly limit certain points in the connection or request, in addition to also limiting the overall request.

I've also explored separate :timeout settings. I believe those would have to be added to Async::HTTP::Endpoint. Based on my work, those might be:

Lastly, while preparing these PRs I think it might also be practical to just break apart a couple existing methods to make it easier to extend them. For example, read_request would be easier to extend if read_line? at the top was separate from the rest of the method. This doesn't feel particularly accessible to me, but if you wanted to go this route, I could also create an example for how to use it so at least people would have a reference.
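To illustrate the decomposition described above: splitting the first-line read out of read_request lets a subclass wrap just that step. The method bodies here are stand-ins, not protocol-http1's real implementation.

```ruby
# Illustrative sketch only; bodies are stand-ins for the real parsing code.
class Connection
  def read_request
    line = read_request_line or return nil
    # ... header and body parsing would follow here ...
    line
  end

  def read_request_line
    "GET / HTTP/1.1" # stand-in for reading a line from the socket
  end
end

class TightTimeoutConnection < Connection
  def read_request_line
    # e.g. tighten the socket's timeout only while awaiting the request line
    super
  end
end
```

The override targets one small method, so a monkeypatch (or subclass) stays precise and cheap.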

PS: @trevorturk, nice to run into you here. HelloWeather is my favorite weather app. 😄

trevorturk commented 1 year ago

I think this all seems reasonable -- basically you need more hooks to avoid monkey-patching.

Feel free to ignore this, but it occurred to me to look for "prior art" and I found https://lostisland.github.io/faraday/#/middleware/included/logging which might be an interesting kind of direction to consider. I could see a system where hooks are added to Async::HTTP over time.

I guess I just don't know if Async::HTTP is supposed to be lower-level and this would be outside of the scope, or if you really would need the hooks into this lower level in order to avoid long term maintenance issues. Perhaps worth considering making a "wrapper" that would do the minimal method overrides needed to see how bad it looks?

PS: I'm so happy to hear you're a Hello Weather user! Async::HTTP powers the whole thing! 🥳

zarqman commented 1 year ago

On the http1 side, I've just submitted socketry/protocol-http1#23 which simply breaks out reading of the initial request and response lines, as contemplated above. This makes it easier to precisely target a monkeypatch while adding no particular overhead to protocol-http1.