indexsupply / shovel

An Ethereum to Postgres indexer
https://indexsupply.com/shovel
MIT License

shovel: feature request - optimize rpc call #276

Open AlstonChan opened 2 weeks ago

AlstonChan commented 2 weeks ago


I noticed that the number of RPC calls to the node is higher than I expected. I created 5 integrations to index my data, and the number of RPC calls made to the node is proportional to the number of integrations I have. For context, all 5 of my integrations use only eth_getLogs and eth_getBlockByNumber (no txs), and I only index the Base Sepolia blockchain.

I ran the Shovel instance for about 55 minutes (I made some errors and thus started the instance late) and captured the number of times each RPC method was called; the two relevant methods accounted for 8264 and 12426 calls.

Given that the block time of the Base Sepolia blockchain is 2 seconds per block (30 blocks per minute), that works out to 8264 / 55 / 30 ≈ 5 calls per block and 12426 / 55 / 30 ≈ 7.53 calls per block.

This means that for every integration table, a separate rpc call is made to get the logs.

Proposal

Would it be possible to make only one RPC call, or at least fewer RPC calls, and then decode and process the events to identify which table each event's data should be stored in?

I am not familiar with Go, but assuming that all integration data is inserted into the db in a single transaction, the logic would require widening the address constraint of the eth_getLogs call, mapping the decoded event data into an array / array of structs / some data structure, and then storing the data in the db.
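For illustration, here is a rough sketch of what a combined call could look like, written against go-ethereum's ethclient rather than Shovel's own code (the contracts, events, block range, and table names are all made up):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("https://sepolia.base.org")
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical contracts from two different integrations.
	contractA := common.HexToAddress("0x1111111111111111111111111111111111111111")
	contractB := common.HexToAddress("0x2222222222222222222222222222222222222222")

	// One filter covering both integrations, so a single
	// eth_getLogs call serves both of them.
	q := ethereum.FilterQuery{
		FromBlock: big.NewInt(14_000_000), // made-up block range
		ToBlock:   big.NewInt(14_000_100),
		Addresses: []common.Address{contractA, contractB},
		// Position 0 is the event signature; listing several
		// signatures here ORs them together within one call.
		Topics: [][]common.Hash{{
			common.HexToHash("0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"), // Transfer(address,address,uint256)
			common.HexToHash("0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925"), // Approval(address,address,uint256)
		}},
	}

	logs, err := client.FilterLogs(context.Background(), q)
	if err != nil {
		log.Fatal(err)
	}

	// Route each log to the table its integration writes to.
	for _, l := range logs {
		switch l.Address {
		case contractA:
			fmt.Println("decode and insert into table_a:", l.TxHash)
		case contractB:
			fmt.Println("decode and insert into table_b:", l.TxHash)
		}
	}
}
```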

I can't help with the implementation myself, but I figured I would give you some ideas, as I have found this to be a very useful program.

ryandotsmith commented 2 weeks ago

Firstly, thanks for the idea! I appreciate you taking the time to outline the scenario. Super helpful. Also, thanks for using Shovel. Glad to know you are getting some utility from it.

Here are some initial thoughts on this proposal:

Each integration in Shovel maps to a task. A task has a source (rpc url) and a destination (pg table). Each task builds a filter based on the data requirements of the integration. When the task runs, it uses the filter to determine what sort of API calls to make. This process is somewhat described in the docs. In the case of logs, the task uses the filter to request only the logs that are relevant to the integration; downloading all logs would be slow when your integration only cares about a small subset of the block's logs.
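Roughly, the current shape is something like this; the names are invented for illustration and are not Shovel's actual types:

```go
package main

import "fmt"

// Invented names for illustration, not Shovel's actual types.
// Each integration becomes a task with a source, a destination,
// and a filter derived from the integration's config.
type Filter struct {
	Addresses []string // contracts the integration cares about
	Topic0s   []string // event signatures it indexes
}

type Task struct {
	SourceURL string // rpc url
	Table     string // destination pg table
	Filter    Filter
}

// Each task currently fetches logs on its own, which is why RPC
// usage grows linearly with the number of integrations per chain.
func (t Task) Run(from, to uint64) {
	fmt.Printf("eth_getLogs on %s, blocks %d-%d, addresses %v -> %s\n",
		t.SourceURL, from, to, t.Filter.Addresses, t.Table)
}

func main() {
	tasks := []Task{
		{"https://sepolia.base.org", "table_a", Filter{Addresses: []string{"0xaaaa"}}},
		{"https://sepolia.base.org", "table_b", Filter{Addresses: []string{"0xbbbb"}}},
	}
	for _, t := range tasks {
		t.Run(100, 200) // two integrations -> two separate getLogs calls
	}
}
```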

Supporting a system where you can reduce the amount of log data you download and coalesce log requests is possible, but it is not a change that I will be making in the short term.
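To give a sense of what coalescing would involve, here is a minimal sketch (again with invented types, not Shovel's code): group integrations by source chain, then union the filters within each group so one call can serve all of them.

```go
package main

import "fmt"

// Invented type for illustration, not Shovel's actual one.
type Integration struct {
	Name      string
	ChainURL  string   // rpc url of the source chain
	Addresses []string // contracts this integration filters on
}

// groupByChain decides which integrations may share one
// eth_getLogs call: only those that target the same chain.
func groupByChain(ints []Integration) map[string][]Integration {
	groups := make(map[string][]Integration)
	for _, in := range ints {
		groups[in.ChainURL] = append(groups[in.ChainURL], in)
	}
	return groups
}

// mergeAddresses unions the address constraints of one group
// so a single call can serve every integration in it.
func mergeAddresses(group []Integration) []string {
	seen := make(map[string]bool)
	var merged []string
	for _, in := range group {
		for _, a := range in.Addresses {
			if !seen[a] {
				seen[a] = true
				merged = append(merged, a)
			}
		}
	}
	return merged
}

func main() {
	ints := []Integration{
		{"a", "https://sepolia.base.org", []string{"0x1111"}},
		{"b", "https://sepolia.base.org", []string{"0x2222"}},
		{"c", "https://mainnet.example", []string{"0x3333"}},
	}
	for url, group := range groupByChain(ints) {
		fmt.Printf("%s: one eth_getLogs with addresses %v\n",
			url, mergeAddresses(group))
	}
}
```

The grouping step matters because a single Shovel instance can index multiple chains, and only integrations that share a source can share a call.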

I'm not fundamentally opposed to this idea, and I would welcome a patch if you wanted to make one. If you do, please let me know so that we can discuss the approach prior to writing code.

Finally, if you are interested in a more powerful and more efficient way of downloading logs, please take a look at the new Index Supply API: https://www.indexsupply.com

I am working on a sync tool that uses the Live Query API, and this could be a good alternative to Shovel/RPC.

https://gist.github.com/ryandotsmith/c123ef9fd5f92bf300c8614964d043b9

AlstonChan commented 2 weeks ago

I understand the bigger picture of the program now; it seems this isn't an intuitive feature to implement, since multiple chains need to be supported. My initial thought was that, since the eth_getLogs RPC calls are made to the same blockchain, multiple address and event constraints could be combined in a single call. However, when multiple blockchains are configured for a single Shovel instance, knowing which integrations on the same chain can be combined/optimized requires more planning.

I can't contribute to Shovel yet, at least not now, because I am not familiar with Go 😅. I will look into the Live Query API. Thanks for the input!