Timeout redesign for 6.x

tarcieri commented 11 months ago

This is a tracking issue for redesigning the timeout subsystem in http 6.x.

It would be good to come up with a new design first, and then PRs implementing it.

The largest single issue seems to be having a cumulative timeout over the entire request which actually works (originally requested in #562).

Several people have expressed interest in working on this, and I believe some code has already been written, although isn't available in a public PR.

cc @ClearlyClaire @misalcedo

misalcedo commented 11 months ago

I was comparing this HTTP client to others in Ruby and one thing I would like to add for timeouts is that global and per-operation are not mutually exclusive. I may want a request to time out after 1 minute, but spend no more than 150ms on the TCP connection portion or spend no more than 10 seconds idle on a read or write operation.

An alternative interpretation is what HTTPX does, which is to treat read as a timeout reading the response, write as a timeout writing the request and global as a overall timeout.

In my experience, except for streaming, HTTP clients care about total request timeout and maybe connect timeout, but definitely not per-operation timeouts. Streaming typically cares about operation timeouts and only optionally cares about a total timeout.

Often clients measure time to first byte from the server as well as overall latency, so timeouts on both of those could be helpful.

I am happy to help implement this type of behavior for 6.X if that is the target version.

Quoting my thoughts from https://github.com/httprb/http/pull/754#issuecomment-1853966126 here.

misalcedo commented 11 months ago

A couple of questions we probably want to answer with a design:

What aspects of the request-lifecycle do we want to expose timeouts for?
Should any of these timeouts be mutually exclusive?
What guarantees do we want to make for upper and lower bounds on timeouts, if any (i.e. bounds on configured timeout versus when the client actually gives up)?
Can requests override the client's timeouts? Override only a subset? Or override none?

misalcedo commented 11 months ago

The useful stages of the request lifecycle to me are:

connect
write (request)
- write (socket)
read (response)
- first byte
- parse the response
- read (socket)
total time for a single request-response interaction

I don't see a benefit to mutually exclusive timeouts as timeouts can just be a min-heap (or array sorted in descending order) of deadlines before the client gives up.

As for guarantees, this depends on how often we check the current time against the next deadline. If we spend most of our time in socket operations (likely), then checking only on socket operations may be a tight enough bound.

I don't think requests should override the connect timeout as requests don't map one-to-one to connect operations. The rest of the timeouts make sense for a request to want to override. The reason I think this is fine is because a total timeout covers the case where the connection takes too long. If the client is slow to connect, having a tighter bound on just the connect operation is an optimization to fail fast since the client is willing to wait until the total timeout if it was the request or response portion that was slow.

ixti commented 11 months ago

One thing to keep in mind during the design (something that we overlooked in the current implementation) is that read timeout should correctly handle timeouts during both stages:

read headers
read body

tarcieri commented 2 months ago

@ClearlyClaire @misalcedo we're looking at shipping v6.0.0 in #790. It's the last chance to make breaking changes to the timeout subsystem. Would either of you like to open a PR, or find someone else who would?

midnight-wonderer commented 1 week ago

I just made a pull request remotely related to this: https://github.com/httprb/http/pull/793 (Happy Eyeballs)

My opinion (my requests (if I may))

If someone is introducing breaking changes about timeout, I would recommend lumping timeouts as parts of network implementation abstractions. Probably, allow specifying timeouts in the constructor of TCP plugins instead of configuring it as the core part of HTTP.rb. The two are way too coupled.

Real-world example

In my pull request, I swap default_socket_class with another. This may lead to the obsolete of

::Timeout.timeout(@time_left, ConnectTimeoutError) do

implemented there.

The new class also supports DNS resolution timeouts.

I hope

Lumping timeout-related options into the TCP implementation abstractions will leave more options for the future when we want to introduce new improvements without breaking changes. Easily manage different possible configurations for different network backends.

midnight-wonderer commented 1 week ago

requests don't map one-to-one to connect operations.

Exactly my thought. In retrospect, it is actually fine to leave read/write timeout in the core options of HTTP.rb, where the timeout must have the context of HTTP. However, connect timeout is different; the context is about SYN/ACK, which is quite far from the HTTP layer.

My Expression

In response to

Several people have expressed interest in working on this

I could help try extracting the connect timeout into the TCP backend layer. Probably just an upgraded version of #793.
However, I don't have a clue about integrating HTTP/3, so keep in mind that whatever interface I come up with might need another breaking change if HTTP.rb wants to support HTTP/3 in the future.

I don't write a lot of code if you green light this:

Expect it to be small.
You can reject or make a comment scrapping the idea after seeing the abstraction.
I can't help with other aspects of the timeout discussed. Just moving the connect timeout to where it should belong.

httprb / http