streamingfast / firehose-ethereum

Ethereum on StreamingFast
Apache License 2.0
77 stars 24 forks source link

Firehose dies with cannot link new blocks to chain after a reconnection #49

Open let4be opened 1 year ago

let4be commented 1 year ago

Network: BSC Test net Firehose version: v1.3.1 Geth: bsc-v1.1.18-fh2.1-2

Steps to repeat: start syncing, sync some blocks and press ctrl-c or terminate process abruptly(I'm running merger, firehose, reader-node, relayer, combined-index-builder) all in one process. Restart ./fireeth -c config.yaml start(maybe several times) and watch it die with cannot link new blocks to chain after a reconnection sometimes it also enters infinite loop spamming console with millions of 2023-01-18T09:41:07.186+0400 INFO (reader.sub.relayer) not pushing block to a closed subscription

I think firehose should be able to restart gracefully even after abrupt termination(and it fails to do so even when doing graceful shutdown via ctrl+c)

edit: on yet another restart with the same state it started to complain on this 2023-01-18T09:43:46.926+0400 WARN (bstream) too many consecutive unlinkable blocks {"block": "#136834 (06ce4bb2b6c54d9c)", "consecutive_unlinkable_blocks": 24300, "last_block_sent": "#112408 (444c547f0115596d)"}

in the result it seems that local state is corrupted and I cannot continue syncing data... is there any way to repair it?

config file

start:
  args:
  # Define Firehose components that will be used for this setup
  - merger
  - firehose
  - reader-node
  - relayer
  - combined-index-builder
  flags:
    # Prevent creation of file for logging (logs from previous sessions are not visible with this setting)
    log-to-file: false

    # Ethereum is the default chain, its id is 1. Update these flags for different chains such as Binance
    common-chain-id: "97"
    common-network-id: "97"
    # Update the reader-node-path to reference the geth binary for the chain and OS being targeted
    reader-node-path: ../geth/geth_linux
    # Update the reader-code-arguments for the chain being targeted
    # Find specific values in the Firehose Ethereum Setup Documentation
    reader-node-arguments: "+--maxpeers 500 --cache 4096 --snapshot=false  --bootnodes=enode://69a90b35164ef862185d9f4d2c5eff79b92acd1360574c0edf36044055dc766d87285a820233ae5700e11c9ba06ce1cf23c1c68a4556121109776ce2a3990bba@52.199.214.252:30311,enode://330d768f6de90e7825f0ea6fe59611ce9d50712e73547306846a9304663f9912bf1611037f7f90f21606242ded7fb476c7285cb7cd792836b8c0c5ef0365855c@18.181.52.189:30311,enode://df1e8eb59e42cad3c4551b2a53e31a7e55a2fdde1287babd1e94b0836550b489ba16c40932e4dacb16cba346bd442c432265a299c4aca63ee7bb0f832b9f45eb@52.51.80.128:30311,enode://0bd566a7fd136ecd19414a601bfdc530d5de161e3014033951dd603e72b1a8959eb5b70b06c87a5a75cbf45e4055c387d2a842bd6b1bd8b5041b3a61bab615cf@34.242.33.165:30311,enode://604ed87d813c2b884ff1dc3095afeab18331f3cc361e8fb604159e844905dfa6e4c627306231d48f46a2edceffcd069264a89d99cdbf861a04e8d3d8d7282e8a@3.209.122.123:30311,enode://4d358eca87c44230a49ceaca123c89c7e77232aeae745c3a4917e607b909c4a14034b3a742960a378c3f250e0e67391276e80c7beee7770071e13f33a5b0606a@52.72.123.113:30311"
    reader-node-log-to-zap: false
    reader-node-bootstrap-data-url: ../geth/testnet/genesis.json

    firehose-grpc-listen-addr: ":9000"
    substreams-enabled: true
    substreams-sub-request-parallel-jobs: 10
    substreams-partial-mode-enabled: true
    substreams-rpc-endpoints: "$BSC_SUBSTREAMS_RPC_ENDPOINT"
    substreams-sub-request-block-range-size: 100
    substreams-stores-save-interval: 100

example log:https://pastebin.com/w3Rn2k8r

nasbasarin commented 1 year ago

I am facing same issue

matthewdarwin commented 1 year ago

Maybe upgrade to latest?

joeypunzel commented 1 year ago

I'm having the same issue with firehose-eth

In my case, running sync for some time, up to about 1M block, then it suddenly fails but unclear why.

2023-04-03T17:58:54.814Z INFO (reader) reader read statistics {"block_rate": "0 blocks/min (0 total)", "trx_rate": "0 trxs/min (0 total)", "last_block": "Block <empty>", "block_average_parse_time": "0ms/block (over 1min)"}
INFO [04-03|17:58:55.418] Downloader queue stats                   receiptTasks=0 blockTasks=362 itemSize=231.00B throttle=8192
INFO [04-03|17:58:55.490] Imported new chain segment               blocks=1 txs=7 mgas=0.187 elapsed=72.482ms mgasps=2.580 number=956,745 hash=6e5ab3..d27e48 age=7y3mo4d dirty=30.65KiB
INFO [04-03|17:58:55.490] Unindexed transactions                   blocks=1 txs=0 tail=955,746 elapsed="112.742µs"
2023-04-03T17:58:55.491Z INFO (relayer.src.13010) realtime gate passed {"delta": "62745h1m34.491879003s"}
2023-04-03T17:58:55.491Z INFO (bstream) loading blocks in ForkableHub from one-block-files {"start_block": 956500, "head_block": "#956745 (6e5ab310d4cf79b38f086b029ccc69589974d6bb07f82afd1117d7ed25d27e48)"}
INFO [04-03|17:58:55.894] Imported new chain segment               blocks=21 txs=638 mgas=14.810 elapsed=253.664ms   mgasps=58.385 number=956,766 hash=e18bf6..d26fab age=7y3mo4d dirty=1.17MiB
INFO [04-03|17:58:55.895] Unindexed transactions                   blocks=21 txs=25  tail=955,767 elapsed="703.993µs"
INFO [04-03|17:58:56.662] Imported new chain segment               blocks=172 txs=595 mgas=24.863 elapsed=764.909ms   mgasps=32.505 number=956,938 hash=1af45a..29ddbe age=7y3mo4d dirty=2.80MiB
INFO [04-03|17:58:56.697] Unindexed transactions                   blocks=172 txs=342 tail=955,939 elapsed=34.972ms
2023-04-03T17:58:56.844Z INFO (index-builder) reading from blocks store: file does not (yet?) exist, retrying in {"filename": "/mnt/ebs/firehome/sf-data/storage/merged-blocks/0000956500.dbin.zst", "base_filename": "0000956500", "retry_delay": "4s"}
2023-04-03T17:58:56.888Z INFO (reader.sub.relayer) reach max buffer size for subcription, closing channel {"capacity": 200}
2023-04-03T17:58:56.890Z INFO (reader.sub.relayer) not pushing block to a closed subscription
2023-04-03T17:58:56.896Z INFO (reader.sub.relayer) not pushing block to a closed subscription
2023-04-03T17:58:56.901Z INFO (reader.sub.relayer) not pushing block to a closed subscription
2023-04-03T17:58:56.904Z INFO (reader.sub.relayer) not pushing block to a closed subscription
.... (repeating logs)....
2023-04-03T17:58:58.635Z INFO (reader.sub.relayer) not pushing block to a closed subscription
2023-04-03T17:58:58.637Z INFO (reader) unsubscribed {"new_length": 0}
2023-04-03T17:58:58.644Z INFO (relayer.src.13010) source shutting down {"endpoint_url": ":13010", "error": "EOF"}
2023-04-03T17:58:58.837Z INFO (relayer) new source factory created
2023-04-03T17:58:58.838Z INFO (relayer.src.13010) starting block source consumption {"endpoint_url": ":13010"}
2023-04-03T17:58:58.838Z INFO (relayer.src.13010) block stream source reading messages {"endpoint_url": ":13010"}
2023-04-03T17:58:58.839Z INFO (reader.sub.relayer) receive block request {"trace_id": "7770c4af8f65cd39e1a028b7ebc439a0", "request": {"requester":"relayer"}}
2023-04-03T17:58:58.839Z INFO (reader.sub.relayer) sending burst {"busrt_size": 0}
2023-04-03T17:58:58.839Z INFO (reader) subscribed {"new_length": 1, "subscriber": "relayer"}
2023-04-03T17:58:58.852Z INFO (relayer.src.13010) realtime gate passed {"delta": "62741h53m19.852322575s"}
2023-04-03T17:58:58.866Z WARN (bstream) too many consecutive unlinkable blocks {"block": "#957418 (db0c8e4907f8a2ffd67b51a0279a14391f7d244157daa07094b3c59f27e9e52e)"}
2023-04-03T17:58:58.866Z ERRO (relayer) unable to process block, shutting down source
2023-04-03T17:58:58.866Z INFO (bstream) reconnecting hub after disconnection. expecting to reconnect and get blocks linking to headnum within delay {"delay": "1m0s", "current_head_block_num": 956954, "error": "too many consecutive unlinkable blocks"}
....
2023-04-03T17:58:58.869Z INFO (relayer) closing block stream server
2023-04-03T17:58:58.870Z ERRO (fireeth)
################################################################
Fatal error in app relayer:

cannot link new blocks to chain after a reconnection
###############################################################
...
2023-04-03T17:58:59.553Z ERRO (derr) dfuse {"error": "cannot link new blocks to chain after a reconnection"}
....

Setup my firehose-eth on AWS using this documentation: https://github.com/streamingfast/firehose-docs/blob/master/firehose-setup/ethereum/local-deployment.md

here are my versions:

[ec2-user@ip-10-0-21-115 firehome]$ ./geth_linux version
INFO [04-03|18:03:13.902] Initializing firehose
INFO [04-03|18:03:13.909] Firehose initialized                     enabled=false sync_instrumentation_enabled=true mining_enabled=false block_progress_enabled=false compaction_disabled=false archive_blocks_to_keep=0 genesis_provenance="Geth Default" firehose_version=2.0 geth_version=1.10.23-fh2 chain_variant=geth
Geth
Version: 1.10.23-fh2
Git Commit: e7f3686b2f8955a0824140c98f88c4e3c94f9741
Architecture: amd64
Go Version: go1.17.12
Operating System: linux
GOPATH=
GOROOT=go

[ec2-user@ip-10-0-21-115 firehome]$ ./fireeth --version
fireeth version refs/tags/v1.0.0 (Commit 42f870c, Built 2022-09-02T17:05Z)/

not really sure what this error is and how to properly handle it. if I try simply restarting the process, it usually fails with same error but about 1/5 times it does not fail and is able to pickup where it left off and continue processing. but in these cases it fails with same error about 30min later

any idea how I can address this issue ?

maoueh commented 1 year ago

Versions you are using for fireeth and Firehose instrumented geth are outdated, please use the latest version for both (so https://github.com/streamingfast/firehose-ethereum/releases/tag/v1.3.7 for fireeth and https://github.com/streamingfast/go-ethereum/releases/tag/geth-v1.11.5-fh2.2 for Firehose instrumented geth).

Note that a Beacon Client is required for Ethereum Mainnet/Goerli to sync properly.

Please retry with latest version and let us know how it goes.

joeypunzel commented 1 year ago

Looking better now @maoueh after updating latest versions, syncing the beacon client, and getting the fireeth instance to sync with the beacon client. Up to 1.4M blocks and counting. Thanks again for your help here and on Discord @maoueh :)

winchell commented 1 year ago
fireeth version 1.4.10 (Commit 5b548fb, Built 2023-07-27T17:49:35Z)
geth_linux version 1.12.0-fh2-dd3c6c2b

Ctrl-C during chain sync always corrupts (with "cannot link new blocks to chain after a reconnection" and "too many consecutive unlinkable blocks" on next run) and I have to purge and start over. Not expected because the docs say:

Tip: To terminate the Firehose processing and connection to the Ethereum network press the Control + C keys.

The Firehose sync process will shut down gracefully and continue where it left off upon the next restart. 
matthewdarwin commented 1 year ago

Why firehose process are you terminating? Only the reader? What is your config?

winchell commented 1 year ago
./fireeth -c config.yaml start
start:
  args:
  - merger
  - firehose
  - reader-node
  - relayer
  - combined-index-builder
  flags:
    log-to-file: false
    common-chain-id: "5"
    common-network-id: "5"
    reader-node-path: /usr/local/bin/geth_linux
    reader-node-arguments: "+--goerli --datadir /home/ethereum-goerli-firehose --cache 8192 --maxpeers 100 --metrics --metrics.port 6061 --port=30303 --http.port=8545 --ws.port=8546 --snapshot=true --txlookuplimit=1000"
    reader-node-log-to-zap: false
maoueh commented 1 year ago

@abourget for planning