paradigmxyz / cryo

cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes
Apache License 2.0

Setting re-org buffer clamps the max block to the last full chunk boundary #193

Open BowTiedDevil opened 3 months ago

BowTiedDevil commented 3 months ago

Version: 0.3.2

Platform: Linux dev 6.8.11-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024 x86_64 GNU/Linux

Description: Specifying --reorg-buffer with a non-zero value clamps the max block to the last full chunk boundary when "latest" is the upper end of the block range (as with the open-ended range "20M:" used below). I would expect only the buffered blocks at the tip to be held back.

Here is an example using a basic fetch:

btd@dev:/tmp$ cryo blocks -b "20M:" --rpc http://localhost:8543
cryo parameters
───────────────
- version: 0.3.2
- data: 
    - datatypes: blocks
    - blocks: n=30,788 min=20,000,000 max=20,030,787 align=no reorg_buffer=0
- source: 
    - network: ethereum
    - rpc url: http://localhost:8543
    - max requests per second: unlimited
    - max concurrent requests: unlimited
    - max concurrent chunks: 4
- output: 
    - chunk size: 1,000
    - chunks to collect: 31 / 31
    - output format: parquet
    - output dir: /tmp
    - report file: $OUTPUT_DIR/.cryo/reports/2024-06-05_22-46-08.023601.json

[...]

And again with the reorg buffer specified:

btd@dev:/tmp$ cryo blocks -b "20M:" --rpc http://localhost:8543 --reorg-buffer 8
cryo parameters
───────────────
- version: 0.3.2
- data: 
    - datatypes: blocks
    - blocks: n=30,000 min=20,000,000 max=20,029,999 align=no reorg_buffer=8
- source: 
    - network: ethereum
    - rpc url: http://localhost:8543
    - max requests per second: unlimited
    - max concurrent requests: unlimited
    - max concurrent chunks: 4
- output: 
    - chunk size: 1,000
    - chunks to collect: 30 / 30
    - output format: parquet
    - output dir: /tmp
    - report file: $OUTPUT_DIR/.cryo/reports/2024-06-05_22-47-01.635999.json

[...]

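With the buffer set, n drops from 30,788 to 30,000 and max falls from 20,030,787 to 20,029,999: all 788 blocks following the last full chunk (chunk size 1,000) are ignored, not just the 8 most recent. The responsible code is likely the filter_map in apply_reorg_buffer.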
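
To make the arithmetic concrete, here is a minimal Python sketch of the two behaviors. It is not cryo's Rust code and I have not confirmed this is what apply_reorg_buffer does; the function names are hypothetical, and it assumes the latest block stayed at 20,030,787 between the two runs. Dropping every chunk that reaches into the buffer reproduces the numbers in the second run, while trimming the last chunk would keep 780 of the 788 trailing blocks.

```python
def make_chunks(start, end, chunk_size):
    """Split the inclusive block range [start, end] into inclusive (lo, hi) chunks."""
    return [
        (lo, min(lo + chunk_size - 1, end))
        for lo in range(start, end + 1, chunk_size)
    ]


def drop_unsafe_chunks(chunks, latest, reorg_buffer):
    """Hypothetical model of the observed behavior: discard any chunk
    whose upper end falls inside the reorg buffer."""
    safe_max = latest - reorg_buffer
    return [(lo, hi) for lo, hi in chunks if hi <= safe_max]


def clamp_unsafe_chunks(chunks, latest, reorg_buffer):
    """Hypothetical alternative: trim a chunk to the last safe block
    instead of discarding it entirely."""
    safe_max = latest - reorg_buffer
    return [(lo, min(hi, safe_max)) for lo, hi in chunks if lo <= safe_max]


def count_blocks(chunks):
    """Total number of blocks covered by a list of inclusive chunks."""
    return sum(hi - lo + 1 for lo, hi in chunks)


latest = 20_030_787  # assumed unchanged between the two runs
chunks = make_chunks(20_000_000, latest, 1_000)
dropped = drop_unsafe_chunks(chunks, latest, 8)
clamped = clamp_unsafe_chunks(chunks, latest, 8)

print(len(chunks), count_blocks(chunks))    # 31 30788 (first run)
print(len(dropped), count_blocks(dropped))  # 30 30000 (second run, as observed)
print(len(clamped), count_blocks(clamped))  # 31 30780 (only the 8 buffered blocks held back)
```

If the filter_map does behave like drop_unsafe_chunks, a fix along the lines of clamp_unsafe_chunks would keep the partial final chunk with its upper bound lowered to latest minus the buffer.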