basavaraj29 opened this issue 2 weeks ago
If we end up using the rpc threadpool, ensure that there are metrics to surface the following: if a large number of rpc threadpool workers are busy resuming waiters, that can negatively impact new reads/writes, resulting in higher read/write latencies.
Jira Link: DB-13389
Description
We resume waiters from the wait-queue serially. When resuming a waiter, we first try to acquire the shared in-memory locks with a deadline of 1s. If this fails, we schedule the resumption on the scheduler, which then tries to acquire the shared in-memory locks with the deadline set to the rpc request's own deadline.
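The two-phase flow above can be sketched as follows. This is an illustrative reconstruction, not the actual YugabyteDB code: `Waiter`, `TryAcquireSharedLocks`, and `ResumeWaiter` are hypothetical names, and `std::timed_mutex` stands in for the shared in-memory locks.

```cpp
#include <chrono>
#include <functional>
#include <mutex>

// Hypothetical stand-in for the waiter being resumed from the wait-queue.
struct Waiter {
  std::chrono::steady_clock::time_point rpc_deadline;
};

// Stand-in for acquiring the shared in-memory locks with a deadline.
bool TryAcquireSharedLocks(std::timed_mutex& locks,
                           std::chrono::steady_clock::time_point deadline) {
  return locks.try_lock_until(deadline);
}

void ResumeWaiter(Waiter& w, std::timed_mutex& locks,
                  const std::function<void(std::function<void()>)>& schedule) {
  using namespace std::chrono;
  // First attempt: short 1s deadline, done inline on the resuming thread.
  if (TryAcquireSharedLocks(locks, steady_clock::now() + seconds(1))) {
    // ... resume the waiter's request under the locks ...
    locks.unlock();
    return;
  }
  // Contended path: reschedule, this time waiting up to the rpc request's
  // own deadline. The scheduled task runs on whichever thread pool
  // `schedule` targets -- which is exactly where the wrong pool matters.
  schedule([&w, &locks] {
    if (TryAcquireSharedLocks(locks, w.rpc_deadline)) {
      // ... resume ...
      locks.unlock();
    }  // else: the request times out.
  });
}
```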
We seem to be using the wrong thread to resume such contended waiters (the ones that couldn't acquire the shared in-memory locks within 1s). In particular, we are currently doing
which uses
and the ioservice points to a threadpool of size 4.
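To see why a size-4 pool is a problem when resumption can block, here is a self-contained sketch (a minimal fixed-size pool, not YugabyteDB's actual io service): once all four workers are stuck waiting on locks, every queued task, including fresh reads/writes, waits behind them.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool used only to demonstrate starvation.
class FixedPool {
 public:
  explicit FixedPool(size_t n) {
    for (size_t i = 0; i < n; ++i) workers_.emplace_back([this] { Run(); });
  }
  ~FixedPool() {
    {
      std::lock_guard<std::mutex> g(mu_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // drains remaining tasks first
  }
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> g(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // done_ set and queue drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::vector<std::thread> workers_;
};
```

With a pool of 4, submitting four tasks that block (standing in for contended lock acquisition) delays a fifth, quick task by roughly the blocking duration, which is the latency impact described above.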
We should instead have used Messenger's
normal_thread_pool_
(size 1024), which is used to execute all rpc requests. The problematic change was part of this commit. Prior to that, the contended waiter requests would simply have failed with the message "unable to acquire locks".
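On the metrics asked for at the top of the issue (visibility into how many rpc threadpool workers are busy resuming waiters), a minimal sketch of such a gauge might look like this. `ResumptionGauge` and `Scope` are illustrative names, not the actual YugabyteDB metrics API:

```cpp
#include <atomic>

// Hypothetical gauge: counts how many workers are currently inside the
// waiter-resumption path, so starvation of new reads/writes is visible.
class ResumptionGauge {
 public:
  // RAII scope: increments on entry to the resumption path, decrements
  // on exit, so the gauge stays correct on every return path.
  class Scope {
   public:
    explicit Scope(ResumptionGauge& g) : g_(g) { g_.active_.fetch_add(1); }
    ~Scope() { g_.active_.fetch_sub(1); }

   private:
    ResumptionGauge& g_;
  };

  int active() const { return active_.load(); }

 private:
  std::atomic<int> active_{0};
};

// Usage inside the resumption path:
//   ResumptionGauge::Scope s(gauge);
//   ... acquire shared in-memory locks and resume the waiter ...
```

Exporting `active()` (and perhaps its high-water mark) would make it easy to alert when resumption work is monopolizing the pool.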
Issue Type
kind/bug