socketry / async-http


Connection timeout specification. #89

Closed: souphan-adsk closed this issue 2 years ago

souphan-adsk commented 2 years ago

The README includes an example of handling timeouts using #with_timeout.
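Roughly the pattern I mean (a sketch, not the exact README snippet; the URL is just a placeholder):

require 'async'
require 'async/http/internet'

Async do |task|
  internet = Async::HTTP::Internet.new

  # Abort the whole request if it takes longer than 5 seconds overall.
  task.with_timeout(5) do
    response = internet.get("https://example.com/slow")
    puts response.read
  rescue Async::TimeoutError
    puts "Request timed out"
  ensure
    internet.close
  end
end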

This works well in cases where a response is expected within a certain amount of time, but I'd like to fail much earlier if a connection attempt does not complete in a timely manner. Other libraries such as Faraday and Net::HTTP expose this value as a configurable open_timeout.
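For example, with Net::HTTP the two values are configured separately (a rough sketch; the URL is a placeholder):

require 'net/http'

uri = URI("https://example.com/slow")

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.open_timeout = 2   # fail fast if the TCP connection can't be established
http.read_timeout = 30  # but allow a slow response once connected

response = http.get(uri.request_uri)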

Is there any way Async::HTTP (or anything it's built on) can handle this kind of situation?

ioquatix commented 2 years ago

How do you imagine this working with connection persistence?

ioquatix commented 2 years ago

A couple of additional points:

I'd be interested to know more about whether your use case requires a separate timeout for connect vs read, and in what cases it matters.

souphan-adsk commented 2 years ago

Thank you for your comments. We are currently managing a request-dispatch queue system whose URLs come from data entered by multiple users. A typical issue with such a system is that a particular endpoint may not be reachable, or may be very slow to respond. When this happens, the queue naturally suffers.

With a configurable connection timeout, we're looking to handle cases where we could fail/reject early (e.g. allow 2 seconds to connect, but 30 seconds to respond) when the supplied endpoint is unreachable because of a firewall, because it's a non-routable IP address (10.255.255.1), etc.

As for the comment about connection persistence, I'll admit that our scope doesn't include that usage. But in my naive view, that value would perhaps apply only to initially establishing the connection to the host (and not to idle connections).

ioquatix commented 2 years ago

Async::HTTP::Client and Async::HTTP::Internet both maintain connection pools internally. For maximum performance you definitely want to use connection pooling. However, if you are talking to many thousands of hosts, you will want to be more restrictive about how your pooling works.

# The 2-second endpoint timeout applies to each socket operation (connect, read, write).
endpoint = Async::HTTP::Endpoint.parse('https://example.com', timeout: 2)
client = Async::HTTP::Client.new(endpoint)

task.with_timeout(30) do
  response = client.get("...") # 2 second timeout for open and read.
  body = response.read
ensure
  client.close
end

We could look at introducing a specific operation which pre-seeds the client with an open connection, e.g.

task.with_timeout(2) do
  client.open # open a single connection to the remote server and return it to the pool (active) to be used later.
end 

If you think this solves your problem, we can do that. You'd then configure the endpoint timeout to be 30 seconds, and use with_timeout(2) around the initial connection.

souphan-adsk commented 2 years ago

That is indeed how we're looking to use the library. If that feature were introduced as you describe, it would solve our problem.

ioquatix commented 2 years ago

After reviewing the code, you can do the following right now:

task.with_timeout(2) do
  # Acquire a connection (establishing it if necessary) and immediately return it to the pool.
  client.pool.acquire{}
end

This will make a single connection and warm up the pool.
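Putting it together with the endpoint timeout discussed above (a sketch based on the snippets in this thread; example.com is a placeholder): the endpoint timeout bounds individual socket operations, while with_timeout(2) bounds the initial connection attempt.

# 30 second timeout for socket operations once connected:
endpoint = Async::HTTP::Endpoint.parse('https://example.com', timeout: 30)
client = Async::HTTP::Client.new(endpoint)

# But allow only 2 seconds to establish the initial connection:
task.with_timeout(2) do
  client.pool.acquire{}
end

response = client.get("...")
body = response.read
client.close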

I've also proposed #90, which provides client.open and does exactly the same as the above.

Not sure if this interface is worth the overhead. WDYT?

souphan-adsk commented 2 years ago

I can confirm that the lines provided above do the trick quite well. As for the interface, I personally don't mind either one, though I'll admit I'm more familiar with the term open in this domain. In any case, thank you for the assistance on this matter.

ioquatix commented 2 years ago

Exposing the underlying connection pool either way seems like a leaky abstraction. I don't think an HTTP client, as an interface, should be manipulating underlying connections: it might not even be connection-based in some test scenario, or maybe you are replacing HTTP with something entirely different while trying to preserve the same interface.

I'll think about whether merging this PR makes sense, but there is already a solution, which is talking directly to the connection pool, and it's a valid approach which won't break in the future.

ioquatix commented 2 years ago

I think the direction we are going to go in here is exposing more of the connection pool details. The use case here is best served by using the connection pool directly, simply because there are many different permutations of this use case and I don't think we want to try to build another layer on top.