streamingfast / substreams

Powerful Blockchain streaming data engine, based on StreamingFast Firehose technology.
Apache License 2.0

Missing full-store in cache results in "opening file: not found" error #222

Closed: sduchesneau closed this issue 1 year ago

sduchesneau commented 1 year ago

This happens when the backend cache is missing a full-kv file in its sequence (e.g. 10, 20, 40, where the file ending at 30 is absent).

This should not happen during normal operations, but if someone moves kv files around, the engine does not handle it well.

Here's an example log entry:

An error occurred while streaming blocks: status: Internal, message: "error during init_stores_and_backprocess: failed run_parallel_process: parallel processing run: scheduler run: process job result for module \"unknown\": job ended in error: receiving stream resp: rpc error: code = Internal desc = error building pipeline: failed to setup subrequest stores: load full store: load full store erc721:store_address at 0006052000-0000462663.kv: opening file: not found", details: [], metadata: MetadataMap { headers: {} }, provider: mainnet-substreams-sf, deployment:

A workaround is to delete the whole cache, but this is inefficient because all of the cached stores then have to be recomputed.

Desired behavior: the substreams engine should schedule jobs for the missing ranges and tell the "squasher" that it needs to process those missing file segments.
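To make the desired behavior concrete, here is a minimal sketch of the detection step: given the end blocks of the full-kv files actually present in the cache, find the boundaries that are missing and the block segment that would have to be reprocessed for each. The helper `missingSegments`, the `blockRange` type, and the assumption that full-kv boundaries fall on multiples of a fixed snapshot interval are all hypothetical and not taken from the substreams codebase.

```go
package main

import "fmt"

// blockRange is a segment of blocks from Start up to (not including) End.
type blockRange struct {
	Start, End uint64
}

// missingSegments is a hypothetical helper, not substreams code. Given the
// end blocks of the full-kv files present in the cache, the store's initial
// block, the snapshot interval (assumed > 0) and the highest boundary a
// request needs, it returns one segment per missing full-kv boundary.
func missingSegments(present []uint64, initialBlock, interval, upTo uint64) []blockRange {
	have := make(map[uint64]bool, len(present))
	for _, b := range present {
		have[b] = true
	}
	var missing []blockRange
	// Assumption: boundaries are the multiples of interval after initialBlock.
	first := (initialBlock/interval + 1) * interval
	for end := first; end <= upTo; end += interval {
		if have[end] {
			continue
		}
		start := end - interval
		if start < initialBlock {
			// The first segment starts at the module's initial block.
			start = initialBlock
		}
		missing = append(missing, blockRange{Start: start, End: end})
	}
	return missing
}

func main() {
	// The issue's example sequence: 10, 20, 40 (the file ending at 30 is missing).
	fmt.Println(missingSegments([]uint64{10, 20, 40}, 0, 10, 40))
}
```

Under these assumptions, each returned segment would become a job producing a partial file, and the squasher would then merge that partial into the full kv at the segment's start to rebuild the missing boundary.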

How to reproduce locally

  1. In one terminal, from the firehose-ethereum devel folder, run:

    DEBUG=.* ./runtier1.sh
  2. In another terminal run:

    DEBUG=.* ./runtier2.sh
  3. Then, in a third terminal, run any substreams you want, for example:

    substreams run map_eth_stats --plaintext -e localhost:9000 -s 1000 -t +1

    This creates 10 files on disk for store_eth_stats.

  4. Manually delete any one of the full kv files.

  5. Run your substreams again, but with a different start block (production mode or not):

    substreams run map_eth_stats --plaintext -e localhost:9000 -s 400 -t +1 --production-mode

    The run then hangs and never completes, because the full kv file you deleted is missing.

sduchesneau commented 1 year ago

This branch contains work in progress, but it does not yet address how the squasher should fill in the holes and then resume normally: https://github.com/streamingfast/substreams/tree/feature/missing-ranges

Eduard-Voiculescu commented 1 year ago

With the latest changes to feature/missing-ranges, if you delete, say, the file 0-200.kv and then run (production mode or not):

substreams run map_eth_stats --plaintext -e localhost:9000 -s 200 -t +1 --production-mode

This works. However, if you run it with a start block that does not exactly match the height of the deleted kv file, it produces this kind of output locally:

0000000100-0000000000.54e06f8b1e0a145cf525ccf966e15bda.partial.zst
0000000100-0000000000.8a8a1fbabfe985ce9adad621dfa15da0.partial.zst
0000000100-0000000000.kv.zst
0000000200-0000000000.kv.zst
0000000200-0000000100.54e06f8b1e0a145cf525ccf966e15bda.partial.zst
0000000200-0000000100.8a8a1fbabfe985ce9adad621dfa15da0.partial.zst
0000000300-0000000000.kv.zst
0000000400-0000000000.kv.zst
0000000500-0000000000.kv.zst
0000000600-0000000000.kv.zst
0000000700-0000000000.kv.zst
0000000800-0000000000.kv.zst
0000000900-0000000000.kv.zst
0000001000-0000000000.kv.zst
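To read a listing like this one, it helps to split each name into its block bounds and kind. The sketch below assumes the layout `<end>-<start>[.<hash>].(kv|partial)[.zst]`, which is inferred from the listing and from the `0006052000-0000462663.kv` name in the error log, not from the substreams source; the `storeFile` type and `parseStoreFile` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// storeFile is one cached store file. The naming layout
// "<end>-<start>[.<hash>].(kv|partial)[.zst]" is an assumption inferred
// from the listing, not the substreams source of truth.
type storeFile struct {
	End, Start uint64
	Partial    bool
	Hash       string // set only for partial files
}

// parseStoreFile is a hypothetical helper that splits a cache file name
// into its block bounds and kind.
func parseStoreFile(name string) (storeFile, error) {
	var f storeFile
	parts := strings.Split(strings.TrimSuffix(name, ".zst"), ".")
	if len(parts) < 2 {
		return f, fmt.Errorf("unexpected file name %q", name)
	}
	// parts[0] holds "<end>-<start>", with the end block written first.
	bounds := strings.SplitN(parts[0], "-", 2)
	if len(bounds) != 2 {
		return f, fmt.Errorf("missing block range in %q", name)
	}
	var err error
	if f.End, err = strconv.ParseUint(bounds[0], 10, 64); err != nil {
		return f, err
	}
	if f.Start, err = strconv.ParseUint(bounds[1], 10, 64); err != nil {
		return f, err
	}
	switch kind := parts[len(parts)-1]; {
	case kind == "kv" && len(parts) == 2:
		// Full store snapshot from Start up to End.
	case kind == "partial" && len(parts) == 3:
		f.Partial = true
		f.Hash = parts[1]
	default:
		return f, fmt.Errorf("unknown file kind in %q", name)
	}
	return f, nil
}

func main() {
	for _, name := range []string{
		"0000000100-0000000000.kv.zst",
		"0000000200-0000000100.54e06f8b1e0a145cf525ccf966e15bda.partial.zst",
	} {
		f, err := parseStoreFile(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", f)
	}
}
```

Applied to the listing, this makes it easy to see which ranges have partial files (0-100 and 100-200, each with two different hashes) sitting next to the full kv files.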

To fix this, we would need to relaunch a squasher for that specific range, which means introducing the concept of store squashers keyed by module AND by range.
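A minimal sketch of what "squashers by module AND by range" could look like: a registry whose key includes the block range, so a hole can get its own squasher instead of sharing the module's single one. The `squasherKey`, `squasher`, and `squasherRegistry` types are hypothetical illustrations, not the design that shipped with the rework.

```go
package main

import (
	"fmt"
	"sync"
)

// squasherKey identifies a squasher by store module AND block range,
// rather than by module alone. Hypothetical type for illustration.
type squasherKey struct {
	Module     string
	Start, End uint64
}

// squasher is a placeholder for whatever merges partial segments into full kv.
type squasher struct {
	key squasherKey
}

// squasherRegistry hands out one squasher per (module, range) key.
type squasherRegistry struct {
	mu        sync.Mutex
	squashers map[squasherKey]*squasher
}

func newSquasherRegistry() *squasherRegistry {
	return &squasherRegistry{squashers: make(map[squasherKey]*squasher)}
}

// getOrCreate returns the squasher for (module, start, end), creating it on
// first use. The bool reports whether a new squasher was created.
func (r *squasherRegistry) getOrCreate(module string, start, end uint64) (*squasher, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := squasherKey{Module: module, Start: start, End: end}
	if s, ok := r.squashers[k]; ok {
		return s, false
	}
	s := &squasher{key: k}
	r.squashers[k] = s
	return s, true
}

func main() {
	reg := newSquasherRegistry()
	// A hole at 100-200 gets its own squasher...
	_, created := reg.getOrCreate("store_eth_stats", 100, 200)
	fmt.Println(created)
	// ...and asking again for the same module and range reuses it.
	_, created = reg.getOrCreate("store_eth_stats", 100, 200)
	fmt.Println(created)
}
```

Keying on the range as well as the module is what lets the engine relaunch squashing for one specific segment while the rest of the module's store is left alone.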

sduchesneau commented 1 year ago

This will be addressed by the rework in https://github.com/streamingfast/substreams/issues/226

sduchesneau commented 1 year ago

This issue has been fixed with #226 and deployed to our production endpoint.