Registry API performance issues: host connection pool & blocking DB IO
When the database is under very high load, the registry API experiences performance problems, mainly due to incorrect implementations in the following two areas:
HTTP client host connection pool handling
We create a new queue for every request, which is an anti-pattern for the host connection pool. Why?
A per-request queue bypasses the pool's backpressure, so the number of parallel requests keeps growing until it hits the max-open-requests limit whenever the server can't process requests quickly enough.
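For reference, the usual fix is a single application-wide queue feeding the cached host connection pool (the pattern described in the Akka HTTP host-level API documentation). A sketch, where the host name and buffer size are illustrative placeholders:

```scala
import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success}
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import akka.stream.scaladsl.{Keep, Sink, Source}

object PooledClient {
  implicit val system: ActorSystem = ActorSystem("pooled-client")
  import system.dispatcher

  // One shared pool flow for the target host.
  private val poolFlow =
    Http().cachedHostConnectionPool[Promise[HttpResponse]]("example.com")

  // One shared queue materialized once, instead of a new queue per request.
  private val queue =
    Source.queue[(HttpRequest, Promise[HttpResponse])](256, OverflowStrategy.backpressure)
      .via(poolFlow)
      .toMat(Sink.foreach {
        case (Success(resp), promise) => promise.success(resp)
        case (Failure(err), promise)  => promise.failure(err)
      })(Keep.left)
      .run()

  def request(req: HttpRequest): Future[HttpResponse] = {
    val promise = Promise[HttpResponse]()
    queue.offer(req -> promise).flatMap {
      case QueueOfferResult.Enqueued => promise.future
      case other => Future.failed(new RuntimeException(s"Queue rejected request: $other"))
    }
  }
}
```

Because the queue is bounded and shared, callers are backpressured before the pool's max-open-requests limit is reached.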
Akka HTTP server REST API handles blocking DB IO on the routing thread pool
When DB IO takes a long time, the routing thread pool can be starved and unable to handle new requests.
This causes even worse problems: e.g. the k8s pod liveness probe endpoint /v0/status/live stops responding, so k8s keeps restarting the pod.
Solution:
Use a separate blocking-IO thread pool to handle all blocking DB IO by default, instead of the routing threads.
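As a sketch, such a pool could be declared as a dedicated Akka dispatcher in application.conf (the dispatcher name and pool size below are illustrative, not final):

```hocon
# application.conf: a dedicated dispatcher for blocking DB IO
blocking-io-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    # roughly match the DB connection pool size
    fixed-pool-size = 16
  }
  throughput = 1
}
```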
We can implement a custom route directive completeBlockingTask to complete blocking IO on a different pool, i.e.
leveraging the extractActorSystem directive to extract the actor system, using it to look up the blocking dispatcher, and completing the inner route with the complete route directive;
or we can implement a custom directive withBlockingTask that switches the inner route's execution context to the blocking thread pool via the withExecutionContext directive.
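A sketch of both variants, assuming a dispatcher registered under the (illustrative) name "blocking-io-dispatcher"; the directive names come from the proposal above:

```scala
import scala.concurrent.Future
import akka.http.scaladsl.marshalling.ToResponseMarshallable
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.{Directive0, Route}

// Variant 1: run the blocking task as a Future on the blocking dispatcher
// and complete the route with its result.
def completeBlockingTask(task: => ToResponseMarshallable): Route =
  extractActorSystem { system =>
    val blockingEc = system.dispatchers.lookup("blocking-io-dispatcher")
    complete(Future(task)(blockingEc))
  }

// Variant 2: swap the execution context seen by the inner route, so code
// that uses the extracted context (e.g. via extractExecutionContext or
// onComplete) runs on the blocking pool. Note this only affects futures
// built on the extracted context, not ones created on a hard-coded pool.
def withBlockingTask: Directive0 =
  extractActorSystem.flatMap { system =>
    withExecutionContext(system.dispatchers.lookup("blocking-io-dispatcher"))
  }

// Usage:
// path("records") {
//   completeBlockingTask(runBlockingJdbcQuery())  // hypothetical blocking DB call
// }
```

Either way, routing threads stay free to accept new connections (including the liveness probe) while DB work runs on the dedicated pool.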