Registry API performance issues: host connection pool & blocking DB IO
When the database is under very high load, the registry API experiences performance problems, mainly due to incorrect implementations in the following two areas:
HTTP client host connection pool handling
We create a new queue for every request, which is an anti-pattern for the host connection pool. Why?
A per-request queue bypasses the pool's backpressure, so the number of parallel requests keeps growing until it hits the max-open-requests limit whenever the server can't process requests quickly enough.
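For reference, the usual fix is a single application-wide queue feeding the cached host connection pool (the pattern described in the Akka HTTP host-level API documentation). A sketch, where the host name and buffer size are illustrative placeholders:

```scala
import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import akka.stream.scaladsl.{Keep, Sink, Source}

object PooledClient {
  implicit val system: ActorSystem = ActorSystem("pooled-client")
  import system.dispatcher

  // One shared pool flow for the target host.
  private val poolFlow =
    Http().cachedHostConnectionPool[Promise[HttpResponse]]("example.com")

  // One shared queue materialized once, instead of a new queue per request.
  private val queue =
    Source.queue[(HttpRequest, Promise[HttpResponse])](256, OverflowStrategy.backpressure)
      .via(poolFlow)
      .toMat(Sink.foreach {
        case (Success(resp), promise) => promise.success(resp)
        case (Failure(err), promise)  => promise.failure(err)
      })(Keep.left)
      .run()

  def request(req: HttpRequest): Future[HttpResponse] = {
    val promise = Promise[HttpResponse]()
    queue.offer(req -> promise).flatMap {
      case QueueOfferResult.Enqueued => promise.future
      case other => Future.failed(new RuntimeException(s"Queue rejected request: $other"))
    }
  }
}
```

Because the queue is bounded and shared, callers are backpressured before the pool's max-open-requests limit is reached.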
Akka HTTP server REST API handles blocking DB IO on the routing thread pool
When DB IO takes a long time, the routing thread pool can be starved and unable to handle new requests.
This causes even worse problems: e.g. the k8s pod liveness probe endpoint /v0/status/live stops responding, so k8s keeps restarting the pod.
Solution:
Use a separate blocking-IO thread pool to handle all blocking DB IO by default, instead of the routing threads.
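As a sketch, such a pool could be declared as a dedicated Akka dispatcher in application.conf (the dispatcher name and pool size below are illustrative, not final):

```hocon
# application.conf: a dedicated dispatcher for blocking DB IO
blocking-io-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    # roughly match the DB connection pool size
    fixed-pool-size = 16
  }
  throughput = 1
}
```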
We can implement a custom route directive completeBlockingTask to complete blocking IO on a different pool, i.e.
leveraging the extractActorSystem directive to extract the actor system, using it to look up the blocking dispatcher, and completing the inner route with the complete route directive;
or we can implement a custom directive withBlockingTask that switches the inner route's execution context to the blocking thread pool via the withExecutionContext directive.
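A sketch of both variants, assuming a dispatcher registered under the (illustrative) name "blocking-io-dispatcher"; the directive names come from the proposal above:

```scala
import scala.concurrent.Future
import akka.http.scaladsl.marshalling.ToResponseMarshallable
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.{Directive0, Route}

// Variant 1: run the blocking task as a Future on the blocking dispatcher
// and complete the route with its result.
def completeBlockingTask(task: => ToResponseMarshallable): Route =
  extractActorSystem { system =>
    val blockingEc = system.dispatchers.lookup("blocking-io-dispatcher")
    complete(Future(task)(blockingEc))
  }

// Variant 2: swap the execution context seen by the inner route, so code
// that uses the extracted context (e.g. via extractExecutionContext or
// onComplete) runs on the blocking pool. Note this only affects futures
// built on the extracted context, not ones created on a hard-coded pool.
def withBlockingTask: Directive0 =
  extractActorSystem.flatMap { system =>
    withExecutionContext(system.dispatchers.lookup("blocking-io-dispatcher"))
  }

// Usage:
// path("records") {
//   completeBlockingTask(runBlockingJdbcQuery())  // hypothetical blocking DB call
// }
```

Either way, routing threads stay free to accept new connections (including the liveness probe) while DB work runs on the dedicated pool.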