twitter / finatra

Fast, testable, Scala services built on TwitterServer and Finagle
https://twitter.github.io/finatra/
Apache License 2.0

Unexpected connection-setup time cost during high concurrency. #380

Closed odinliu closed 7 years ago

odinliu commented 7 years ago

Our company ran some load tests against a service implemented with Finatra and found a strange issue. Our RESTful service sits behind nginx as a reverse proxy, and nginx's access log shows that the $upstream_connect_time of some requests was nearly 1 second, while the actual processing time was a few hundred milliseconds. Here is an nginx log example (1.001 is $upstream_connect_time, 1.303 is $request_time and $upstream_response_time):

127.0.0.1|-|29/Dec/2016:19:08:47 +0800|GET /teacher/search?q=Zachary HTTP/1.1|200|2699|-|Go-http-client/1.1|-|1.303|1.303|127.0.0.1:8888|1.001

I reproduced the problem with a simple test case:

import javax.inject.Singleton
import com.twitter.finagle.http.Request
import com.twitter.finatra.http.Controller
import com.twitter.finatra.utils.FuturePools
import com.twitter.inject.Logging

@Singleton
class LoadController extends Controller with Logging {
  val ret = """{"timeout":false,"took":3}"""
  val pool = FuturePools.unboundedPool("RequestPool")

  get("/teacher/search") { req: Request =>
    pool {
      info("req received.")
      // Blocks a pool thread for 300 ms to simulate blocking work.
      Thread.sleep(300)
      response.ok(ret)
    }
  }
}

I reproduced it under 2.1.6 and 2.7.0. Most requests behave as expected, but some spend nearly 1 second making a connection.

Our controller uses an unbounded pool to work with some blocking methods; I use Thread.sleep to simulate this. The QPS is 1600 and CPU usage is low, so I don't think it has reached the performance limit.

I'm not familiar with Finagle, so is there anything I can do to optimize this?

kevinoliver commented 7 years ago

Using Thread.sleep is an anti-pattern in evented systems like Finagle/Netty. Here's more information about it: https://finagle.github.io/blog/2016/09/01/block-party/

To change your test you can use Future.sleep instead.
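
For reference, a non-blocking version of that test route might look roughly like this. This is only a sketch: Future.sleep requires an implicit com.twitter.util.Timer, and the DefaultTimer reference below may differ across Finagle versions.

import com.twitter.conversions.time._
import com.twitter.finagle.http.Request
import com.twitter.finagle.util.DefaultTimer
import com.twitter.util.Future

// Inside the controller: schedule a timer task instead of parking a thread for 300 ms.
implicit val timer = DefaultTimer.twitter

get("/teacher/search") { req: Request =>
  Future.sleep(300.milliseconds).map(_ => response.ok(ret))
}

This keeps the event threads free: the route returns immediately with a Future that the timer satisfies 300 ms later.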

odinliu commented 7 years ago

Hi @kevinoliver, I changed the test case to:

import javax.inject.Singleton
import com.twitter.conversions.time._
import com.twitter.finagle.http.Request
import com.twitter.finatra.http.Controller
import com.twitter.finatra.utils.FuturePools
import com.twitter.inject.Logging
import com.twitter.util.{Future, JavaTimer, Timer}

@Singleton
class LoadController extends Controller with Logging {
  val ret = """{"timeout":false,"took":3}"""
  val pool = FuturePools.unboundedPool("RequestPool")
  implicit val timer: Timer = new JavaTimer() // Future.sleep requires an implicit Timer

  get("/teacher/search") { req: Request =>
    pool {
      info("req received.")
      Future.sleep(300.milliseconds)
    }
  }
}

But it still has some requests that spend nearly 1 s making a connection. The number is very small (784 out of 1,000,000), but CPU usage is very low, so that doesn't make sense to me. I suppose Finagle handles network connections and processes callbacks on the same event loop? So if more and more high-level future callbacks are emitted, is there a chance that low-level network events get delayed?

In our real scenario, we encapsulate blocking operations (such as db/elasticsearch requests) in Scala Promise/Future. The logical layer uses Await to control timeouts and wait for future composition to complete, and Finatra's controller works with our blocking logical layer through a FuturePools.unboundedPool, as sketched below.
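
For illustration, here is a rough sketch of that pattern. The names TeacherSearch, search, and SearchController are made up for this example; the point is that the blocking Await lives in the logical layer and the controller hands that call to the unbounded FuturePool.

import javax.inject.Singleton
import com.twitter.finagle.http.Request
import com.twitter.finatra.http.Controller
import com.twitter.finatra.utils.FuturePools
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await => ScalaAwait, Future => ScalaFuture}

// Hypothetical blocking "logical layer": wraps I/O in Scala futures and
// uses Await to enforce a timeout, as described above.
object TeacherSearch {
  def search(q: String): ScalaFuture[String] =
    ScalaFuture { Thread.sleep(300); s"""{"q":"$q"}""" } // stands in for a db/ES call

  def searchBlocking(q: String): String =
    ScalaAwait.result(search(q), 1.second)
}

@Singleton
class SearchController extends Controller {
  private val pool = FuturePools.unboundedPool("RequestPool")

  get("/teacher/search") { req: Request =>
    // The blocking Await happens on a pool thread, not on a Finagle event thread.
    pool {
      response.ok(TeacherSearch.searchBlocking(req.params.getOrElse("q", "")))
    }
  }
}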

Speaking of futures, many third-party Scala libraries use the standard library's Future. Even though Finatra 2.7 supports Scala futures by converting them to Twitter futures, it is painful to mix two kinds of future in the same project (see the bridging sketch below). Since Scala's Future is the official implementation, does Twitter have any plan to use it instead of the Twitter Future in the future?
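
As a concrete example of that friction, bridging the two types has to be done by hand or through a helper library; a minimal sketch of one direction (the helper name asTwitter is made up):

import com.twitter.{util => tu}
import scala.concurrent.{ExecutionContext, Future => ScalaFuture}
import scala.util.{Failure, Success}

// Complete a Twitter Promise from a Scala Future's result.
def asTwitter[A](sf: ScalaFuture[A])(implicit ec: ExecutionContext): tu.Future[A] = {
  val p = tu.Promise[A]()
  sf.onComplete {
    case Success(v) => p.setValue(v)
    case Failure(e) => p.setException(e)
  }
  p
}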

syamantm commented 7 years ago

Possibly an unrelated question, but what does your upstream proxy configuration look like? Do you have HTTP 1.1 enabled and keepalive configured?

kevinoliver commented 7 years ago

The logical layer uses Await to control timeouts and wait for future composition to complete

Sounds like you may have other explicit blocking happening on the event threads. What do your metrics for scheduler/blocking_ms show? There may also be implicit blocking happening that isn't measured by that metric. The best way to look for that might be to inspect some stack traces while you are running your test (see /admin/threads).

Since Scala's Future is the official implementation, does Twitter have any plan to use it instead of the Twitter Future in the future?

No plans for this. Over time, Scala's future implementation has gotten closer to what ours has, which is great. I believe there is still functionality missing around interruptions (see the sketch below).
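
For example, a Twitter Future can be interrupted by its consumer, and a Promise can react to that interrupt, which standard-library futures don't offer out of the box. A small sketch:

import com.twitter.util.{Future, Promise, Throw}

val p = Promise[String]()
// Runs when a consumer raises an interrupt, e.g. to cancel an outstanding RPC.
p.setInterruptHandler {
  case cause => p.updateIfEmpty(Throw(cause))
}

val f: Future[String] = p
f.raise(new java.util.concurrent.CancellationException("caller gave up"))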

taylorleese commented 7 years ago

@odinliu You can find more information on the differences between Scala and Twitter futures and why Twitter futures exist here.