paritytech / polkadot-sdk

JSON-RPC: performance problem with `chainHead_v1_storage` queries using `descendantValues` #5589

Open · josepot opened this issue 1 week ago

josepot commented 1 week ago

We’ve encountered a performance issue when executing `chainHead_v1_storage` queries with the `descendantValues` option in the new JSON-RPC API.

When performing such a query, the RPC node sends an `operationStorageItems` notification containing only 5 items. This is immediately followed by a `waitingForContinue` notification. Upon receiving this, we immediately respond with a `chainHead_v1_continue` request, and this cycle repeats.

This results in certain queries taking an excessively long time to resolve. For example, requesting the descendant values of `NominationPools.PoolMembers` on the Polkadot relay chain can take 6 to 10 minutes to complete using the new JSON-RPC API, while the same request takes only a few seconds with the legacy RPC API.

Expected Behavior

The node should efficiently return a larger number of items per `operationStorageItems` notification, especially when there is no sign of backpressure (e.g., if the `chainHead_v1_continue` response is received promptly). Ideally, the node could send hundreds of items at once and dynamically adjust the number of items sent based on the system's responsiveness.
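
For illustration, a rough sketch of what that dynamic adjustment could look like (the names, thresholds, and bounds below are invented for the example, not taken from polkadot-sdk):

```rust
use std::time::Duration;

// Hypothetical batch sizer: grow the batch while the client keeps up,
// shrink it when `chainHead_v1_continue` responses start arriving slowly.
struct BatchSizer {
    size: usize,
}

impl BatchSizer {
    fn new() -> Self {
        // Start from today's fixed batch size.
        Self { size: 5 }
    }

    /// Adjust the next batch size from the observed continue latency.
    fn adjust(&mut self, continue_latency: Duration) {
        if continue_latency < Duration::from_millis(50) {
            self.size = (self.size * 2).min(1024); // no backpressure: ramp up
        } else {
            self.size = (self.size / 2).max(5); // client is slow: back off
        }
    }
}

fn main() {
    let mut sizer = BatchSizer::new();
    for latency_ms in [5u64, 5, 5, 200, 5] {
        sizer.adjust(Duration::from_millis(latency_ms));
        println!("next batch: {} items", sizer.size);
    }
}
```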

Current Behavior

The node currently sends only 5 items per `operationStorageItems` notification, always following each batch with a `waitingForContinue` notification, which significantly slows down the resolution of large storage queries.

Proposed Solution

Logs

slowOperationStorageItems.log

jsdw commented 1 week ago

Just to copy in my thoughts on the approach too:

Given that we have backpressure, we could also just never emit `waitingForContinue`: the node could internally put storage messages into a queue, draining it as fast as the client can accept them, and if the queue fills up, the node won't try to fetch more storage values until it has space again. This would hopefully allow it to be pretty quick!
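
Something along the lines of this bounded-channel sketch (a minimal illustration assuming a tokio runtime; the types and names are invented here and are not the actual chainHead implementation):

```rust
use tokio::sync::mpsc;

// Stand-in for one decoded storage entry.
#[derive(Debug)]
struct StorageItem {
    key: Vec<u8>,
    value: Vec<u8>,
}

// Iterate descendant values into a bounded queue. `send` only resolves when
// the queue has room, so iteration pauses automatically while the client is
// slow to drain messages: that is the backpressure, with no
// `waitingForContinue` round-trips needed.
async fn stream_descendant_values(
    mut next_item: impl FnMut() -> Option<StorageItem>,
    tx: mpsc::Sender<StorageItem>,
) {
    while let Some(item) = next_item() {
        if tx.send(item).await.is_err() {
            break; // client went away; stop iterating the trie
        }
    }
}

#[tokio::main]
async fn main() {
    // At most 256 in-flight items between trie iteration and the RPC sink.
    let (tx, mut rx) = mpsc::channel::<StorageItem>(256);

    // Fake "trie iterator" producing 255 items.
    let mut n: u8 = 0;
    tokio::spawn(stream_descendant_values(
        move || {
            n = n.checked_add(1)?;
            Some(StorageItem { key: vec![n], value: vec![n] })
        },
        tx,
    ));

    // The drain side stands in for serializing `operationStorageItems`
    // notifications onto the websocket.
    while let Some(item) = rx.recv().await {
        let _ = item;
    }
}
```

The point being that `send` only completes once the queue has space, so trie iteration is throttled by the very same mechanism that drains messages to the client.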

josepot commented 1 week ago

> Just to copy in my thoughts on the approach too:
>
> Given that we have backpressure, we could also just never emit `waitingForContinue`: the node could internally put storage messages into a queue, draining it as fast as the client can accept them, and if the queue fills up, the node won't try to fetch more storage values until it has space again. This would hopefully allow it to be pretty quick!

works for me!!