Add new RpcDataIngestSettings for controlling split size

dvli2007 commented 2 weeks ago

When fetching blocks from RPC, each iteration processes only 10 blocks. This limit is hard-coded as a magic number here.

This PR replaces the splitSize magic number with an RpcDataIngestSettings value called maxBlocksPerIteration. Users can now pass in any non-negative value. If no value is set, the default remains 10 blocks.
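
For illustration only, here is a minimal sketch of how a configurable split size could behave. It is not the actual squid-sdk implementation: the `RpcDataIngestSettingsSketch` interface, `BlockRange` type, and `splitBlockRange` helper are hypothetical names, and only the option name `maxBlocksPerIteration` and its default of 10 come from the description above. The sketch also assumes that a value of 0 should be rejected, since a zero-sized chunk would never make progress; the description does not say how 0 is treated.

```typescript
// Hypothetical settings shape; only `maxBlocksPerIteration` and its
// default of 10 are taken from the PR description.
interface RpcDataIngestSettingsSketch {
    maxBlocksPerIteration?: number
}

// Inclusive block range (hypothetical helper type).
interface BlockRange {
    from: number
    to: number
}

const DEFAULT_MAX_BLOCKS_PER_ITERATION = 10

// Split an inclusive block range into chunks of at most
// `maxBlocksPerIteration` blocks each.
function splitBlockRange(
    range: BlockRange,
    settings: RpcDataIngestSettingsSketch = {}
): BlockRange[] {
    const size = settings.maxBlocksPerIteration ?? DEFAULT_MAX_BLOCKS_PER_ITERATION
    // Assumption of this sketch: 0 is rejected because it cannot make progress.
    if (!Number.isInteger(size) || size <= 0) {
        throw new Error(`maxBlocksPerIteration must be a positive integer, got ${size}`)
    }
    const chunks: BlockRange[] = []
    for (let from = range.from; from <= range.to; from += size) {
        chunks.push({from, to: Math.min(from + size - 1, range.to)})
    }
    return chunks
}
```

With the default, `splitBlockRange({from: 0, to: 24})` yields three chunks (0-9, 10-19, 20-24); passing `{maxBlocksPerIteration: 100}` yields a single chunk covering the whole range.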

eldargab commented 1 week ago

What's your use case for tweaking this parameter?

To increase data ingestion speed one most likely wants to increase concurrency.

If you are falling behind the head due to non-trivial per-batch processing cost, then you had better optimize your mapping code so that a single-block batch is handled faster than blocks are produced.

dvli2007 commented 1 week ago

> What's your use case for tweaking this parameter?
>
> To increase data ingestion speed one most likely wants to increase concurrency.
>
> If you are falling behind the head due to non-trivial per-batch processing cost, then you had better optimize your mapping code so that a single-block batch is handled faster than blocks are produced.

We have no mapping logic in our code. This is just running the SQD indexer with the @subsquid/bigquery-store store on a chain with fast block production -- no storage other than SQD processor internals updating the block height. The BigQuery height updates are not streamed, so there's crazy latency, which makes the current configuration not really usable.

subsquid / squid-sdk

Add new RpcDataIngestSettings for controlling split size #354