streamingfast / merger

Apache License 2.0
4 stars 5 forks source link

On startup merger doesn't find holes (recent regression) #6

Closed matthewdarwin closed 2 years ago

matthewdarwin commented 2 years ago

binaries built before Dec 5 do not have this problem.

I have a BSC setup where mindreaders are generating one-blocks and merger is merging them. There are lots (10000+) blocks piled up to be merged, but there is a section where there are no merged blocks and no one-blocks. So merger should be waiting for the mindreader to generate those blocks.

The following ranges of merged blocks are not available: 12890800 => 12899900 (92); 12900000 => 12964100 We're waiting for mindreader to generate one-block files for 12890800 and on. So merger should be stuck waiting for those files to appear.

This is how it looks on a binary built before Dec 5, all ok.

2021-12-20T18:44:29.257Z (merger) deleting files that are too old or already seen (merger/merger.go:186){"number_of_files": 31, "first_file": "0009564899-20210729T034652.0-3636f5b4-12c76538-dom31", "last_file": "0012775440-20211119T095016.0-598718ca-a5ebdb7a-dom32"}
2021-12-20T18:44:29.259Z (merger) waiting for more files to complete bundle (merger/merger.go:385){"bundle_lowerblock": 12890800, "bundle_length": 0, "bundle_upper_block_id": ""}
2021-12-20T18:44:37.444Z [WARN]  agent: Check is now critical: check=service:bsc-dfuse-mrg-v1
2021-12-20T18:44:39.262Z (s3store) got an error on s3 OpenObject, retrying (dstore/s3store.go:237){"error": "NoSuchKey: \n\tstatus code: 404, request id: tx0000058f8835f631139f6-0061c0cf17-59deac0-default, host id: ", "attempt": 1, "max_attempts": 3, "name": "0012890800", "path": "0012890800.dbin.zst"}
2021-12-20T18:44:39.765Z (s3store) got an error on s3 OpenObject, retrying (dstore/s3store.go:237){"error": "NoSuchKey: \n\tstatus code: 404, request id: tx000006663919e0c4da2f0-0061c0cf17-59deac0-default, host id: ", "attempt": 2, "max_attempts": 3, "name": "0012890800", "path": "0012890800.dbin.zst"}
2021-12-20T18:44:40.638Z (merger) retrieved list of files (merger/merger.go:451){"seenblock_low_boundary": 12880799, "bundle_lower_block": 12890800, "seen_files_count": 0, "too_old_files_count": 38, "good_files_count": 2000, "canonical_good_files_count": 2000}

On latest built, it skips over this hole and is is looking at block 13173400

2021-12-19T00:47:21.408Z (api) registering development exporters from environment variables (dtracing/api.go:139)
2021-12-19T00:47:21.408Z (dfuse) ulimit max open files before adjustment (launcher/setup.go:60){"current_value": 1000000}
2021-12-19T00:47:21.408Z (dfuse) no need to update ulimit as it's already higher than our good enough value (launcher/setup.go:62){"good_enough_value": 1000000}
2021-12-19T00:47:21.408Z (dfuse) sfeth binary started (cli/start.go:47){"data_dir": "/var/lib/dfuse"}
2021-12-19T00:47:21.408Z (dfuse) Starting with config file '/etc/dfuse/config.yml' (cli/start.go:50)
2021-12-19T00:47:21.408Z (dfuse) starting atomic level switcher (launcher/logging.go:130){"listen_addr": "localhost:1065"}
2021-12-19T00:47:21.408Z (dfuse) launcher created (cli/start.go:117)
2021-12-19T00:47:21.408Z (dfuse) Launching applications: merger (cli/start.go:124)
2021-12-19T00:47:21.408Z (dfuse) initialize application (launcher/launcher.go:71){"app": "merger"}
2021-12-19T00:47:21.408Z (dfuse) creating directory and its parent(s) (cli/utils.go:41){"directory": "/var/lib/dfuse/merger/merger.seen.gob"}
2021-12-19T00:47:21.408Z (dfuse) creating application (launcher/launcher.go:83){"app": "merger"}
2021-12-19T00:47:21.408Z (dfuse) app status switching to warning (launcher/launcher.go:249){"app_id": "merger"}
2021-12-19T00:47:21.408Z (dfuse) launching app (launcher/launcher.go:110){"app": "merger"}
2021-12-19T00:47:21.409Z (merger) failed to load bundle  (merger/app.go:88){"file_name": "/var/lib/dfuse/merger/merger.seen.gob"}
2021-12-19T00:47:21.525Z (merger) new bundler (bundle/bundler.go:25){"bundle_size": 100, "first_exclusive_highest_block_limit": 13173400}
2021-12-19T00:47:21.525Z (merger) fetching one block files (bundle/bundler.go:64){"at_low_block_num": 13173300}
2021-12-19T00:47:22.763Z (merger) Processed, already existing merged file (merger/merger_io.go:244){"lower_block": 0, "highest_block": 13173399}
2021-12-19T00:47:22.763Z (merger) found lib to reach (bundle/bundler.go:73){"lib_block_num": 18446744073709551615}
2021-12-19T00:47:22.763Z (merger) processed one block files (bundle/bundler.go:83){"at_low_block_num": 13173300, "lib_num_to_reach": 13173199}
2021-12-19T00:47:22.763Z (merger) fetching one block files (bundle/bundler.go:64){"at_low_block_num": 13173200}
2021-12-19T00:47:23.924Z (merger) Processed, already existing merged file (merger/merger_io.go:244){"lower_block": 0, "highest_block": 13173299}
2021-12-19T00:47:23.924Z (merger) processed one block files (bundle/bundler.go:83){"at_low_block_num": 13173200, "lib_num_to_reach": 13173199}
2021-12-19T00:47:23.924Z (merger) fetching one block files (bundle/bundler.go:64){"at_low_block_num": 13173100}
2021-12-19T00:47:25.083Z (merger) Processed, already existing merged file (merger/merger_io.go:244){"lower_block": 0, "highest_block": 13173199}
2021-12-19T00:47:25.083Z (merger) merger initiated (merger/app.go:130)
2021-12-19T00:47:25.084Z (merger) merger running (merger/app.go:144)
2021-12-19T00:47:25.084Z (merger) starting merger (merger/merger.go:73){"bundle": "bundle_size: 100, last_merge_block_num: 0, inclusive_lower_block_num: 13173300, exclusive_highest_block_limit: 13173400"}
2021-12-19T00:47:25.084Z (merger) grpc server created (merger/server.go:20)
2021-12-19T00:47:25.084Z (merger) tcp listener created (merger/server.go:27)
2021-12-19T00:47:25.084Z (merger) server registered (merger/server.go:31)
2021-12-19T00:47:25.084Z (merger) listening & serving grpc content (merger/server.go:34){"grpc_listen_addr": ":9001"}
2021-12-19T00:47:26.085Z (dfuse) app status switching to running (launcher/launcher.go:243){"app_id": "merger"}
2021-12-19T00:47:26.268Z (merger) Processed, already existing merged file (merger/merger_io.go:244){"lower_block": 0, "highest_block": 13173399}
2021-12-19T00:49:17.095Z (merger) retrieved list of files (merger/merger_io.go:101){"files_count": 2000}
2021-12-19T00:49:17.122Z (merger) retrieved list of files (merger/merger.go:179){"too_old_files_count": 2000, "added_files_count": 0}
2021-12-19T00:49:17.123Z (merger) deleting files that are too old or already seen (merger/merger_io.go:165){"number_of_files": 2000, "first_file": "0013110879-20211201T192435.0-895a7ec6-2faac27f-dom32", "last_file": "0013112878-20211201T210457.0-be591aae-8a007a89-dom32"}

No end of playing with merger-start-block-num will make the latest merger work properly. /var/lib/dfuse/merger/merger.seen.gob was removed as well.

Working config with the old binary:

start:
  args:
  - merger
  flags:
    config-file: ""
    log-format: text
    log-to-file: false
    common-chain-id: 56
    common-network-id: none
    common-blocks-store-url: s3://ceph/....
    common-oneblock-store-url: s3://ceph/....
    common-blockmeta-addr: dns:///bsc-dfuse-bm-v1
    merger-grpc-listen-addr: :9001
    merger-batch-mode: false
    merger-start-block-num: 12890600 
    merger-stop-block-num: 0
billettc commented 2 years ago

Some code have been merged into the develop branch of sf-ethereum and now that branch is unstable...

The following flags should have been remove from the merger cmd.

start-block end-block merger-minimal-block-num merger-max-fixable-fork merger-batch-mode

merger-next-exclusive-highest-block-limit should have been added

Merger is no more supporting batch mode since merge files that need to be reprocessed should be generated be the mindreader

We are currently working to fix this situation. Some other components like mindreader relayer fireshose are also impacted.

sduchesneau commented 2 years ago

Hi, here's a bit of reading on the mechanism of the merger:

Merger mechanism from startup (what is the merger trying to do on boot)

  1. merger wants to figure out the next bundle to merge (either from the state file, or by scanning existing merged bundles)

  2. merger reads the last few merged files to "prepopulate its knowledge of the current chain"

  3. merger polls the new one-block-files (its usual job) -> files below Last Irreversible Block and below last merged bundle will be deleted -> files above Last Irreversible Block but not merged already will be included in next bundle -> files that fit inside the next bundle will be merged once a complete canonical chain can be linked from LIB to the highest block

  4. merger also scans to see if the bundle it wants to create is already present, in which case it loads it (skips bundle production) and continues on to the next

Example Scenarios

1. merger start with [merged blocks 0000,0100,...,0900], [one-block-files 1000-> 1140], no holes, no state file

  1. merger scans merged files starts evaluating bundle 1000
  2. merger reads merged files 700, 800 and 900 (starting at the highest LIB num found in merged file 900) loads those blocks in memory
  3. merger scans one-block-files, evaluates the longest chain from its known LIB to block 1100 and create new bundle containing blocks 1000->1099, then moves its LIB to the LIB of block 1100.
  4. merger keeps scanning one-block-files from 1100 to 1140, etc., until it completes a chain segment from the LIB to 1200, etc., deleting the old merged one-block-files as it goes.

2. merger starts, with holes in merged files [merged blocks 0200,0800,1300] and one-block-files being produced [1400,1401,1402...]

  1. merger scans merged files, will either: a) detect properly that it needs to start at 1400 (then you're lucky) b) detect incorrectly the start block and starts c) detects a hole and fail

^ this is not a wanted scenario...

If you want to start the merger anyway to start producing merged bundles at block 1400, do the following: a) ensure that you have a mindreader producing block 1400 and the following b) delete any one-block-file that is below 1400 (to prevent them being put in the next bundle...) c) start the merger with the flag --merger-next-exclusive-highest-block-limit=1500 (remove that flag next time, the state file will be sufficient to continue)

Then, the merger will produce the merged files 1400, 1500, etc.

Note that you will need to create the missing merged files (100, 300, 400, etc.) using the mindreader directly with the appropriate flags, don't use the merger for that purpose.

sduchesneau commented 2 years ago

I will write a small procedure for you to fix those holes tomorrow.

matthewdarwin commented 2 years ago

In my case, using the old merger merged all the files, so I don't have any holes now. Now if all the merged blocks are correct, I don't know.

matthewdarwin commented 2 years ago

When filling in old blocks, I don't like running mindreader in auto-merge mode because if the mindreader process exits, then I end up with a hole because it can't create a full block bundle.

sduchesneau commented 2 years ago

ok, then make sure that you write those 1-block-files somewhere else than where your real merger is running :) This being said, and the new behavior being written here, do we still need this issue ?

matthewdarwin commented 2 years ago

With the latest mindreader with the merger code embedded in it, I got a hole in the local one-block files. I sent the logs to Matthieu

sduchesneau commented 2 years ago

I believe this is unrelated to the issue ->

Mar  8 11:13:09 goerli (codec) mindreader block stats {"block_num": 6014239, "duration": 19020325, "stats": {"add_log":11,"balance_change":14,"begin_apply_trx":5,"code_change":6,"created_account":6,"end_apply_trx":5,"evm_end_call":51,"evm_keccak":48,"evm_param":51,"evm_run_call":51,"finalize_block":1,"gas_change":196,"gas_event":92,"nonce_change":16,"storage_change":50,"suicide_change":1,"trx_from":5}}
Mar  8 11:13:09 goerli (mindreader) got one block {"block_num": 6014239}
Mar  8 11:13:09 goerli (mindreader) waiting to consume next block
Mar  8 11:13:09 goerli (mindreader) waiting to consume next block
Mar  8 11:13:09 goerli (codec) mindreader block stats {"block_num": 6014242, "duration": 1606321, "stats": {"add_log":6,"balance_change":9,"begin_apply_trx":3,"end_apply_trx":3,"evm_end_call":10,"evm_keccak":40,"evm_param":10,"evm_run_call":10,"finalize_block":1,"gas_change":34,"gas_event":14,"nonce_change":3,"storage_change":13,"trx_from":3}}

the mindreader completely skipped producing this block. Maybe the deep-mind patch has an issue with these specific blocks in Görli ? We are not indexing this testnet so we are not aware of any issue..

could you bring your görli node up to block 6014238 (or below), then start it with --mindreader-node-debug-deep-mind up to block 6014242 (or above) ?

It will produce lots of logs but may give us an insight on what happened, and if those blocks are completely unprocessable with the current deepmind/mindreader combination ?

(Also, since this is unrelated, would you mind if I close this issue and open a new one for this ?)

sduchesneau commented 2 years ago

stale issue, if the deepmind+mindreader skips some things in görli, it is unrelated to merger.