river-build / river

https://river-sample-app.vercel.app
MIT License

Mainnet nodes are making >4mm RPC calls / day to Base. Why? #1583

Open jterzis opened 5 days ago

jterzis commented 5 days ago

An operator is noticing >4mm RPC calls / day to Base (non-transaction calls). Questions follow below.

How do Base RPC calls scale in the stream node, and which system constrains them? There is a chain monitor in chain_monitor.go that runs for xchain against Base (chainId 8453) with a poll loop that calls eth_getFilterLogs to process callbacks, but it's unclear whether the ChainMonitor polling loop bounds the rate of eth_call (non-transactional) RPC calls to Base from the stream node, or whether another part of the system is responsible for this level of throughput.

Also, which Prometheus metrics should be used to isolate this network activity by chainId 8453? Is it:

Do we need to add lower-level counters for eth_calls made to Base?
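
If no per-chain, per-method counter exists yet, one low-level option is to count every outgoing JSON-RPC call labelled by chain ID and method. A minimal sketch, assuming go-ethereum's rpc.Client and the standard Prometheus Go client (the metric and type names here are illustrative, not existing River code):

```go
// Hypothetical sketch: count every JSON-RPC call by chain and method.
// Names (rpcCallCounter, instrumentedClient) are illustrative only.
package metrics

import (
	"context"

	"github.com/ethereum/go-ethereum/rpc"
	"github.com/prometheus/client_golang/prometheus"
)

var rpcCallCounter = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "eth_rpc_calls_total",
		Help: "Number of JSON-RPC calls issued, by chain and method.",
	},
	[]string{"chain_id", "method"},
)

func init() {
	prometheus.MustRegister(rpcCallCounter)
}

// instrumentedClient wraps a raw rpc.Client and counts every call it makes.
type instrumentedClient struct {
	inner   *rpc.Client
	chainID string
}

func (c *instrumentedClient) CallContext(ctx context.Context, result interface{}, method string, args ...interface{}) error {
	rpcCallCounter.WithLabelValues(c.chainID, method).Inc()
	return c.inner.CallContext(ctx, result, method, args...)
}
```

With something like this in place, a query such as sum by (method) (rate(eth_rpc_calls_total{chain_id="8453"}[5m])) would show which call type dominates the volume.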

bas-vk commented 4 days ago

The chain monitor performs eth_getBlockByNumber and eth_getLogs; it doesn't issue eth_call. chain_monitor_pollcounter is plotted in a DD dashboard, which shows the chain monitor making the expected number of calls (~0.55 per second) on the chains we have metrics access to. I expect this is also the case for that node.
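
For scale, >4mm calls/day is roughly 46 calls per second, so a monitor polling at ~0.55 polls/second (at most a couple of RPCs per poll) cannot account for it on its own. For readers unfamiliar with the monitor, the loop in question has roughly this shape; this is an illustrative sketch using go-ethereum's ethclient, not the actual chain_monitor.go code:

```go
// Illustrative head-polling monitor: each tick issues one
// eth_getBlockByNumber (HeaderByNumber) and, when the head has advanced,
// one eth_getLogs (FilterLogs). Not the actual chain_monitor.go.
package monitor

import (
	"context"
	"log"
	"math/big"
	"time"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

func pollLoop(ctx context.Context, client *ethclient.Client, from *big.Int, onLogs func([]types.Log)) {
	ticker := time.NewTicker(2 * time.Second) // ~0.5 polls per second
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// eth_getBlockByNumber: fetch the current head.
			head, err := client.HeaderByNumber(ctx, nil)
			if err != nil {
				log.Printf("header: %v", err)
				continue
			}
			if head.Number.Cmp(from) < 0 {
				continue // no new blocks yet
			}
			// eth_getLogs over the new block range.
			logs, err := client.FilterLogs(ctx, ethereum.FilterQuery{
				FromBlock: new(big.Int).Set(from),
				ToBlock:   head.Number,
			})
			if err != nil {
				log.Printf("logs: %v", err)
				continue
			}
			onLogs(logs)
			from = new(big.Int).Add(head.Number, big.NewInt(1))
		}
	}
}
```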

Recently stream scrubbing was introduced. Each scrub task requires multiple eth_calls for each channel member. There is caching logic that prevents doing the same check too often, but maybe there is some misconfiguration, a bug, or there are simply a lot of channels and members. I suspect that stream scrubbing is the major contributor to the high volume of calls.
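
To make that shape concrete, a scrub task per channel presumably looks something like the sketch below; the names and the cache hook are hypothetical, not the actual scrubber code:

```go
// Illustrative shape of a scrub task: one entitlement check per channel
// member, each check costing a few eth_calls unless a fresh cached
// result exists. Hypothetical names, not River's actual scrubber.
package scrub

import "context"

// scrubChannel returns the members who are no longer entitled.
// cacheLookup returns (entitled, found); checkEntitlement stands in for
// the per-member evaluation that actually issues the eth_calls.
func scrubChannel(
	ctx context.Context,
	channelID string,
	members []string,
	cacheLookup func(userID string) (bool, bool),
	checkEntitlement func(ctx context.Context, channelID, userID string) (bool, error),
) ([]string, error) {
	var notEntitled []string
	for _, member := range members {
		// Cache hit: skip the eth_calls for this member entirely.
		if entitled, found := cacheLookup(member); found {
			if !entitled {
				notEntitled = append(notEntitled, member)
			}
			continue
		}
		// Cache miss: this is where the 2-3 eth_calls per member happen.
		entitled, err := checkEntitlement(ctx, channelID, member)
		if err != nil {
			return nil, err
		}
		if !entitled {
			notEntitled = append(notEntitled, member)
		}
	}
	return notEntitled, nil
}
```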

Here is a trace that makes 1246 eth_calls in a single scrub task.

clemire commented 4 days ago

> Here is a trace that makes 1246 eth_calls in a single scrub task.

There are 415 people in this channel. Therefore 1246 calls is 3 * # of people + 1.

Every linked-wallet calculation is a maximum of two calls, and for a user with no linked wallets the space membership check is another call, so 3x the number of users is expected when each user has no linked wallets. The extra call is probably fetching the channel entitlements for the READ permission.
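
As a rough sketch of that per-member pattern (the interfaces and method names below are hypothetical stand-ins for the real contract bindings, not River's actual code; each method call represents one eth_call):

```go
// Hypothetical illustration of the ~3 eth_calls per channel member
// described above. The interfaces stand in for contract bindings.
package scrub

import (
	"context"

	"github.com/ethereum/go-ethereum/common"
)

type WalletLinks interface {
	// Up to two eth_calls to resolve a user's linked wallets.
	GetRootKey(ctx context.Context, wallet common.Address) (common.Address, error)
	GetWallets(ctx context.Context, rootKey common.Address) ([]common.Address, error)
}

type Membership interface {
	// One eth_call to check space membership for a wallet.
	IsMember(ctx context.Context, space, wallet common.Address) (bool, error)
}

// checkMember costs up to three eth_calls for a user with no linked
// wallets, matching the 3 * members (+1) pattern seen in the trace.
func checkMember(ctx context.Context, wl WalletLinks, m Membership, space, user common.Address) (bool, error) {
	root, err := wl.GetRootKey(ctx, user) // eth_call #1
	if err != nil {
		return false, err
	}
	wallets, err := wl.GetWallets(ctx, root) // eth_call #2
	if err != nil {
		return false, err
	}
	if len(wallets) == 0 {
		wallets = []common.Address{user}
	}
	for _, w := range wallets {
		ok, err := m.IsMember(ctx, space, w) // eth_call #3 (per wallet)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}
```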

I checked the space itself and all roles are user entitlement roles, so once the entitlement is fetched, there should be no additional contract calls required to evaluate it.

jterzis commented 3 days ago

eth_call volume is likely originating from the scrubber, as per @clemire. Will explore a positive/negative cache, which should eliminate ~1/3 of call volume (excluding linked wallets), as well as increasing the scrubber interval.
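
A minimal sketch of such a positive/negative cache, assuming a simple TTL map keyed per (space, user) check; names and TTL handling are illustrative only, not the proposed implementation:

```go
// Sketch of a positive/negative result cache for scrub checks.
// Illustrative only; keying and eviction details are left open.
package scrub

import (
	"sync"
	"time"
)

type cacheEntry struct {
	entitled  bool // positive or negative result is cached either way
	expiresAt time.Time
}

// EntitlementCache remembers both positive and negative results so
// repeat scrubs of the same key skip the eth_calls until expiry.
type EntitlementCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

func NewEntitlementCache(ttl time.Duration) *EntitlementCache {
	return &EntitlementCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Get returns (entitled, found); expired entries count as not found.
func (c *EntitlementCache) Get(key string) (bool, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Now().After(e.expiresAt) {
		return false, false
	}
	return e.entitled, true
}

func (c *EntitlementCache) Put(key string, entitled bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cacheEntry{entitled: entitled, expiresAt: time.Now().Add(c.ttl)}
}
```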