Description
When we start a node with connections to external RPC servers (as a minimal node), we have no metrics for how many individual calls it makes to the remote RPC servers or how long they take. This PR adds metrics that measure the duration of each RPC call made by a minimal node and, implicitly, how many calls there are.
Closes #5409 Closes #5689
Integration
Node operators should be able to track minimal node metrics and decide on appropriate actions based on how they interpret them. The added metrics can be observed by curl'ing the Prometheus metrics endpoint of the ~relaychain~ parachain (it was changed based on the review). The metrics live under the ~polkadot_parachain_relay_chain_rpc_interface~ `relay_chain_rpc_interface` namespace (I realized that lining up `parachain_relay_chain` in the same metric might be confusing :). Excerpt from the curl:

Review Notes
Durations and hits are measured with a `HistogramVec` struct, which lets us collect timings for each RPC client method called from the minimal node. It can be extended to measure the RPCs against other dimensions too (status codes, response sizes, etc.). The timing is done at the level of `relay-chain-rpc-interface`, in the `request_tracing` method of the `RelayChainRpcClient` struct, which is the single entry point for all RPC requests made through `relay-chain-rpc-interface`. Request durations fall into exponential buckets described by start `0.001`, factor `4` and count `9`.
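To make the bucket layout and the per-method timing concrete, here is a minimal std-only sketch. It is not the code from this PR: `MethodHistograms`, its methods, and the local `exponential_buckets` helper are hypothetical stand-ins for a `HistogramVec` labelled by RPC method, and the method name `chain_getHeader` is only an illustrative label. With start `0.001`, factor `4` and count `9`, the upper bounds work out to 0.001, 0.004, 0.016, 0.064, 0.256, 1.024, 4.096, 16.384 and 65.536 (assuming the unit is seconds, as is conventional for Prometheus durations).

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Upper bounds of `count` exponential buckets: start, start * factor, ...
fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Hypothetical stand-in for a histogram family keyed by RPC method name.
/// Unlike Prometheus exposition, the per-bucket counts here are not cumulative.
struct MethodHistograms {
    bounds: Vec<f64>,
    // Per method: one counter per bucket, plus a final overflow slot.
    counts: HashMap<String, Vec<u64>>,
}

impl MethodHistograms {
    fn new(bounds: Vec<f64>) -> Self {
        Self { bounds, counts: HashMap::new() }
    }

    /// Record one observation (assumed to be in seconds) for `method`.
    fn observe(&mut self, method: &str, seconds: f64) {
        let slots = self.bounds.len() + 1;
        // First bucket whose upper bound covers the observation, else overflow.
        let idx = self
            .bounds
            .iter()
            .position(|bound| seconds <= *bound)
            .unwrap_or(self.bounds.len());
        let per_method = self
            .counts
            .entry(method.to_string())
            .or_insert_with(|| vec![0; slots]);
        per_method[idx] += 1;
    }

    /// Total number of calls recorded for `method` (the implicit hit count).
    fn hits(&self, method: &str) -> u64 {
        self.counts
            .get(method)
            .map(|c| c.iter().sum::<u64>())
            .unwrap_or(0)
    }

    /// Run `call`, time it, and record the duration under `method`.
    fn time<T>(&mut self, method: &str, call: impl FnOnce() -> T) -> T {
        let started = Instant::now();
        let out = call();
        self.observe(method, started.elapsed().as_secs_f64());
        out
    }
}

fn main() {
    let bounds = exponential_buckets(0.001, 4.0, 9);
    println!("bucket upper bounds: {:?}", bounds);

    let mut metrics = MethodHistograms::new(bounds);
    // The closure stands in for a real RPC request.
    let _response = metrics.time("chain_getHeader", || "header");
    println!("chain_getHeader hits: {}", metrics.hits("chain_getHeader"));
}
```

Routing every request through a single timing wrapper, as `request_tracing` does according to the notes above, means the call count falls out of the histogram for free: summing a method's buckets gives the number of calls for that method.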