Avoid using shared connection pool

matsluni / aws-spi-akka-http

This is an alternative implementation of the new aws java sdk v2 non-blocking async SPI. It provides an alternative to the built-in Netty implementation based on akka-http.

Apache License 2.0

33 stars 15 forks

Avoid using shared connection pool #15

Open matsluni opened 4 years ago

matsluni commented 4 years ago

With the current implementation a shared connection pool per ActorSystem is used for all requests.

It's probably better to have a dedicated connection pool per host. See akka/alpakka#1958 and akka/alpakka#1983 for a similar issue and PR.

gabfssilva commented 4 years ago

Any idea how to deal with multiple regions? I ask because each region uses a different host, so it would have to handle multiple connection pools depending on how the AWS client is used.

matsluni commented 4 years ago

Hi @gabfssilva, thanks for giving this a thought. Yes, we would need multiple pools, one for each AWS service endpoint (and possibly multiple regions per service).

A first naive idea that comes to mind is getting the service URL from httpRequest.uri and building a map/cache of ServiceUrl -> ConnectionPool. But I don't know how feasible this is, since the lookup would be in the hot code path for every request.
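To make the "service URL -> pool" idea concrete, here is a minimal sketch of how a pool key could be derived from a request URI. PoolKey and PoolKeys.fromUri are hypothetical names, and java.net.URI stands in for akka-http's Uri so the snippet has no dependencies; it is not what the library does today.

```scala
import java.net.URI

// Hypothetical key type: one connection pool per (scheme, host, port).
final case class PoolKey(scheme: String, host: String, port: Int)

object PoolKeys {
  // Derive the key from a request URI, filling in the default port
  // for the scheme when the URI does not carry one. A missing scheme
  // is assumed to mean https.
  def fromUri(uri: URI): PoolKey = {
    val scheme = Option(uri.getScheme).getOrElse("https").toLowerCase
    val host = Option(uri.getHost)
      .getOrElse(throw new IllegalArgumentException(s"URI has no host: $uri"))
      .toLowerCase
    val port =
      if (uri.getPort != -1) uri.getPort
      else if (scheme == "http") 80
      else 443
    PoolKey(scheme, host, port)
  }
}
```

With a key like this, the same service in two regions (for example s3.eu-west-1.amazonaws.com and s3.us-east-1.amazonaws.com) maps to two different pools, and a local "fake AWS" endpoint such as http://localhost:4566 gets its own key as well.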

gabfssilva commented 4 years ago

That's what I thought too. A synchronized map should be enough. Well, I'll think of something.
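A rough sketch of the synchronized-map variant, assuming pools are created lazily on first use. Pool and PoolCache are hypothetical stand-ins (a real version would hold whatever handle akka-http returns for a host-level pool), and a ConcurrentHashMap is swapped in for a plain synchronized map so that reads on the hot path do not take a global lock.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for a per-host connection pool handle.
final case class Pool(host: String)

// Caches one pool per host; `create` is whatever builds a new pool.
final class PoolCache(create: String => Pool) {
  private val pools = new ConcurrentHashMap[String, Pool]()

  // Returns the pool for `host`, creating it on first use.
  // computeIfAbsent invokes `create` at most once per key,
  // even when several threads ask for the same host concurrently.
  def poolFor(host: String): Pool =
    pools.computeIfAbsent(host, h => create(h))
}
```

After the first request to a host, each lookup is a single map read, so the per-request cost should be small, although that would need to be measured.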

matsluni commented 4 years ago

I had another idea for how a design for this could look.

What if we extend the builder of the Akka async client with something like withCachedPoolSettings (there may be a more suitable name), where we let the user provide the endpoints and regions used in their code? From this, we construct the map of cachedConnectionPools, and for each request it's just a simple lookup, with no thread synchronization needed.

We can also decide whether to fail (with an exception) if an endpoint is not in the map, or fall back to the sharedPool.

This approach makes it configurable for the user and avoids the potential synchronization performance penalty.
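As a sketch of what the builder-configured variant could look like: the map is built once from the endpoints the user supplies and is immutable afterwards, so lookups need no locking. All names here (Pool, OnMissing, Fail, FallbackTo, StaticPools) are hypothetical and only illustrate the fail-or-fallback choice; none of them exist in the library.

```scala
// Hypothetical stand-in for a connection pool handle.
final case class Pool(name: String)

// What to do when a request targets an endpoint the user did not register.
sealed trait OnMissing
case object Fail extends OnMissing
final case class FallbackTo(shared: Pool) extends OnMissing

// Built once from user-supplied endpoints and never mutated,
// so concurrent lookups are safe without synchronization.
final class StaticPools(endpoints: Set[String], onMissing: OnMissing) {
  private val pools: Map[String, Pool] =
    endpoints.map(e => e -> Pool(e)).toMap

  def poolFor(host: String): Pool =
    pools.get(host) match {
      case Some(pool) => pool
      case None =>
        onMissing match {
          case FallbackTo(shared) => shared
          case Fail =>
            throw new NoSuchElementException(s"No pool configured for $host")
        }
    }
}
```

The trade-off is the one raised in the discussion: the lookup is cheap, but the user has to know every endpoint up front.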

WDYT?

gabfssilva commented 4 years ago

I think it can be done. The only problem is that the user would need to know which domains they need to set up. Each AWS service has a different domain, and using a "fake AWS" also implies different endpoints. I fear it will become complex.

Instead of using a synchronized map, we could use an actor to handle the pools:

// If the pool does not exist yet, the actor creates it here.
val pool = (pools ? Gimme(domain)).mapTo[Pool]

for {
  p <- pool
  r <- p.offer(request, promise)
  // Check `r` to see whether the request was queued.
} yield promise.future

I ran a POC over here and it worked quite well, but it is hard to measure any performance penalty compared with the singleRequest approach. The only issue is that the first request will always be much slower than the following ones, though I'm not sure whether that already happens with singleRequest.
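For readers without the POC at hand, the core of the actor idea is that a single owner serializes all access to the pool map and hands pools back asynchronously. The sketch below imitates that with a single-threaded executor instead of a real Akka actor, so it is an analogy rather than the POC itself; Pool and PoolOwner are hypothetical names.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical stand-in for a per-domain connection pool handle.
final case class Pool(domain: String)

// Plays the role of the pool-managing actor: every lookup runs on one
// thread, so the mutable map needs no locks.
final class PoolOwner(create: String => Pool) {
  // Daemon thread so the owner does not keep the JVM alive.
  private val threadFactory: ThreadFactory = (r: Runnable) => {
    val t = new Thread(r, "pool-owner")
    t.setDaemon(true)
    t
  }
  private val ec =
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor(threadFactory))
  private val pools = mutable.Map.empty[String, Pool]

  // Analogous to `pools ? Gimme(domain)`: the pool is created on first
  // request for a domain and reused afterwards.
  def poolFor(domain: String): Future[Pool] =
    Future(pools.getOrElseUpdate(domain, create(domain)))(ec)
}
```

As with the actor, every request pays for one asynchronous hop to the owner, and the first request for a domain also pays for creating the pool.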