judofyr closed this issue 3 years ago
I also tried to add support for it myself, but I couldn't find a good way to write a test for it…
The simplest way to do this is to use a semaphore to limit the number of active connections.
```ruby
require 'async'
require 'async/http/internet'
require 'async/semaphore'

Async do
  internet = Async::HTTP::Internet.new
  semaphore = Async::Semaphore.new(2)
  keywords = ["apples", "oranges", "bananas", "peaches", "apricots"]

  keywords.each do |keyword|
    semaphore.async do |task|
      url = "https://google.com/search?q=#{keyword}"

      task.logger.enable(task)
      task.logger.debug(task) {"Fetching #{url}..."}

      request = internet.get(url)
      count = request.read.scan(keyword).count

      task.logger.info(task) {"Found #{count} #{keyword}"}
    end
  end
ensure
  internet&.close
end
```
There is no guarantee about how the underlying `Async::HTTP::Internet` instance will manage connections: how many it will open if you connect to different hosts, or whether it can use HTTP/2 multiplexing. The pool will try to make as many connections as required to minimise latency and maximise concurrency. It's up to you to decide whether this is appropriate or not.
If you are willing to expend more effort, you can specify the maximum number of connections (`connection_limit:`) here: https://github.com/socketry/async-http/blob/master/lib/async/http/client.rb#L36
We could pass this as an option via `Async::HTTP::Internet`, but it would apply to all connections. The semantics are a bit different for HTTP/1 and HTTP/2: for HTTP/1 we should allow 4-8 connections, but HTTP/2 might be okay with 1 connection multiplexing 100 requests (a typical default).
The following forces the client to use only one HTTP connection, with debug output enabled.
```ruby
require 'async'
require 'async/http/url_endpoint'
require 'async/http/client'

Async do |task|
  task.logger.debug!

  endpoint = Async::HTTP::URLEndpoint.parse("https://google.com")
  client = Async::HTTP::Client.new(endpoint, connection_limit: 1)
  keywords = ["apples", "oranges", "bananas", "peaches", "apricots"]

  keywords.each do |keyword|
    task.async do
      url = "/search?q=#{keyword}"

      task.logger.enable(task)
      task.logger.debug(task) {"Fetching #{url}..."}

      request = client.get(url)
      count = request.read.scan(keyword).count

      task.logger.info(task) {"Found #{count} #{keyword}"}
    end
  end
ensure
  client&.close
end
```
HTTP/2 allows multiplexing so it's actually okay.
```
0.83s: #<Async::HTTP::Pool resources=1/100 limit=1>
       | Wait for resource #<Async::HTTP::Protocol::HTTP2::Client:0x00007fa79b8b8db0>
0.83s: #<Async::HTTP::Pool resources=2/100 limit=1>
       | Wait for resource #<Async::HTTP::Protocol::HTTP2::Client:0x00007fa79b8b8db0>
0.83s: #<Async::HTTP::Pool resources=3/100 limit=1>
       | Wait for resource #<Async::HTTP::Protocol::HTTP2::Client:0x00007fa79b8b8db0>
0.83s: #<Async::HTTP::Pool resources=4/100 limit=1>
       | Wait for resource #<Async::HTTP::Protocol::HTTP2::Client:0x00007fa79b8b8db0>
```
One thing we could probably do is have a default connection limit based on the endpoint/protocol. That means that by default the connection pool would only make one connection for HTTP/2 and I guess 4-8 for HTTP/1.
This is the feature I was thinking about. That way I can start plenty of requests and I'll get maximum concurrency without spamming a single server. If I want to limit the overall request-flow I could then wrap it in a semaphore.
Here is one idea for implementing this: make a sub-class of `Async::HTTP::Internet` and override how it constructs clients.
```ruby
class RateLimitedInternet < Async::HTTP::Internet
  def client_for(endpoint)
    connection_limit = endpoint.protocol.multiplex? ? 1 : 8

    Client.new(endpoint, **@options, connection_limit: connection_limit)
  end
end
```
Something like this will work. Maybe we can do it by default for `Async::HTTP::Internet`. Not sure yet.
I researched the limits.
According to https://tools.ietf.org/html/rfc7230#section-6.4 there is no specific upper bound for the number of connections, although most browsers limit it to 6-8.
We could add something like the following:
```ruby
def multiplexed?
  self.protocol.multiplexed?
end

def default_connection_limit
  if self.multiplexed?
    1
  else
    8
  end
end
```
However, it's tricky to know in advance whether this applies to HTTPS, since it could negotiate either HTTP/1 or HTTP/2... this is non-trivial to solve. It's entirely possible the first connection could be HTTP/1 and the second connection could be HTTP/2.
All things considered, I think the best solution would be something like:
```ruby
internet = Async::HTTP::Internet.new(connection_limit: 8)
```
I think this is reasonable enough to solve your problem without over-complicating the implementation. I'm not sure there is a significant advantage to having protocol specific limits.
Is this the total limit on the number of open connections allowed? I would prefer it per-host (per-protocol is not so important), but I can always accomplish that by creating my own `Async::HTTP::Internet` instances per-host.
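The per-host workaround could be sketched as below. This is a hypothetical helper, not part of async-http: `PerHostPool` and `client_for` are made-up names, and the block passed to the constructor stands in for `Async::HTTP::Internet.new(connection_limit: ...)` (assuming, as proposed above, that `Internet` forwards that option to its clients), so the sketch runs without the gem:

```ruby
require 'uri'

# Hypothetical helper (not part of async-http): memoize one client per host,
# so connection_limit applies per-host rather than across all hosts.
class PerHostPool
  def initialize(connection_limit: 8, &make_client)
    @connection_limit = connection_limit
    @make_client = make_client # stands in for Async::HTTP::Internet.new
    @clients = {}
  end

  # Returns the same client object for every URL on the same host.
  def client_for(url)
    host = URI.parse(url).host
    @clients[host] ||= @make_client.call(host, @connection_limit)
  end
end

# In real code the block would build
# Async::HTTP::Internet.new(connection_limit: limit); here it returns a
# plain string so the sketch is self-contained.
pool = PerHostPool.new(connection_limit: 8) do |host, limit|
  "client for #{host} (limit #{limit})"
end

a = pool.client_for("https://example.com/search?q=apples")
b = pool.client_for("https://example.com/other")
c = pool.client_for("https://example.org/")
```

Requests to the same host reuse the memoized client (and therefore its connection pool), while each new host gets its own client with its own limit.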
It's per-client/host.
There doesn't seem to be a way to specify the client's connection limit setting when you're using `Async::HTTP::Internet`. This is a problem for me as I'm getting throttled (violently) by the server. It would be great if async-http could handle the throttling instead of me adding another pool-abstraction.