ponder-sh / ponder

A backend framework for crypto apps
https://ponder.sh
MIT License

[Feature] Supporting historical sync source alternative to RPC #888

Open moose-code opened 5 months ago

moose-code commented 5 months ago

Hi there!

I've been experimenting with an alternative data source (as opposed to the RPC) for the default friend-tech ponder indexer created by the `pnpm create ponder` flow.

It's an interesting case, as there are more than 6 million events to be indexed, and from what I can tell it will take more than 15 hours to sync using a paid RPC service. My goal for the experiment was to populate the ponder_sync.db >100x faster than with an RPC, then start the ponder service and let it index from the inserted data.

So far I've found some pretty interesting results:

1. I am able to fetch all the required log, block and transaction data for the ponder_sync.db in this case (6m+ events, blocks and txs) in around 12 minutes from scratch.
2. Writing to the sqlite db is currently the bottleneck, and seems to take on the order of 24 minutes for the fetched data to be written and persisted to the db.

The experimental approach essentially opens a connection to the ponder_sync.db and continually batch-inserts the required data into the blocks, logs, logFilterIntervals and transactions tables. It runs completely independently of the ponder core code, and no ponder core code is modified. After this alternative historical sync completes, running the main ponder process successfully starts indexing from this data.
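To illustrate the batch-insert idea described above, here is a minimal sketch in Python using the standard `sqlite3` module. It is not the code from the example repo. The `logs` table here is a simplified, hypothetical schema (the real ponder_sync.db columns are not reproduced), the `insert_logs` and `batched` helpers are made-up names, and an in-memory database stands in for ponder_sync.db.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_logs(conn, rows, batch_size=10_000):
    """Insert rows in batches, one transaction per batch.

    Returns the number of rows submitted (duplicates are silently ignored).
    """
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits the batch on success, rolls back on error
            conn.executemany(
                # Hypothetical, simplified columns; not ponder's real schema.
                "INSERT OR IGNORE INTO logs (id, block_number, address, data) "
                "VALUES (?, ?, ?, ?)",
                chunk,
            )
        total += len(chunk)
    return total


# In-memory database standing in for ponder_sync.db (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs ("
    "id TEXT PRIMARY KEY, block_number INTEGER, address TEXT, data TEXT)"
)

# Synthetic rows standing in for fetched log data.
rows = ((f"log-{i}", i // 10, "0xabc", "0x") for i in range(25_000))
inserted = insert_logs(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

Grouping many rows into a single transaction avoids paying the commit cost on every insert, which is usually the main lever for sqlite write throughput.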

The alternative to the RPC that enables this speed-up is hypersync (disclaimer: I am part of the team). It's a fast, flexible alternative to the RPC, built for data-heavy use cases such as indexing.

A deeper integration of hypersync as an opt-in alternative to RPC in ponder core's historical sync service might allow Ponder users to achieve much quicker (>100x) historical sync times. This could be useful for Ponder users who value or require this performance for their specific use case. I completely understand that this increases code complexity and maintenance. I'm wondering whether it might be worth it given the advantages it unlocks, and I'm curious to hear your thoughts and considerations.

Here is an example repo for you to try it out too: https://github.com/enviodev/friendtech-ponder-hypersync/tree/main (keep in mind it's currently bottlenecked by being a single-threaded process that is frequently blocked by sqlite inserts; this would change)

derekbar90 commented 4 months ago

This is super interesting!