graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0
2.85k stars 930 forks source link

[Bug] RPC poller type firehose crash loops the graph-node when syncing subgraphs with trace #5166

Open trader-payne opened 5 months ago

trader-payne commented 5 months ago

Bug report

  1. Be connected to an RPC poller type firehose without trace support
  2. Try to sync subgraphs that are using trace filters
  3. Graph-node crash loops

Expected behavior: should handle those cases gently and not kill itself after it fails.

[chains.${CHAIN_0_NAME}]
shard = "primary"
provider = [
{ label = "firehose", details = { type = "firehose", url = "${CHAIN_0_URL}", features = [ "compression" ], token = "${CHAIN_0_TOKEN}", conn_pool_size = 1 } }
]

Relevant log output

Jan 27 13:52:03.875 INFO Graph Node version: 0.34.0 (2024-01-26)
Jan 27 13:52:03.876 WARN GRAPH_POI_ACCESS_TOKEN not set; might leak POIs to the public via GraphQL
Jan 27 13:52:03.876 INFO Reading configuration file `/root/graph-node-configs/config.toml`
Jan 27 13:52:03.876 WARN No fork base URL specified, subgraph forking is disabled
Jan 27 13:52:03.876 INFO Starting up
Jan 27 13:52:03.876 INFO Trying IPFS node at: https://ipfs.network.thegraph.com/ipfs/
Jan 27 13:52:03.889 DEBG Creating firehose networks [1 chains, ingestor index_node_0]
Jan 27 13:52:03.889 INFO Configuring firehose endpoint, provider: firehose
Jan 27 13:52:03.889 DEBG Creating firehose networks [1 chains, ingestor index_node_0]
Jan 27 13:52:03.889 WARN Expensive queries file not set to a valid file: /etc/graph-node/expensive-queries.txt
Jan 27 13:52:03.889 DEBG Cleaning up large notifications after about 300s, channel: store_events, component: NotificationListener
Jan 27 13:52:03.902 INFO Connecting to Postgres, weight: 1, conn_pool_size: 10, url: postgresql://your_db_user:your_db_password@:5432/your_graphnode_db_name?host=/var/run/postgresql, pool: main, shard: primary
Jan 27 13:52:03.903 INFO Pool successfully connected to Postgres, pool: main, shard: primary, component: Store
Jan 27 13:52:03.903 DEBG Using postgres host order [Main], shard: primary, component: Store
Jan 27 13:52:03.906 INFO Setting up fdw, pool: main, shard: primary, component: ConnectionPool
Jan 27 13:52:03.910 INFO Running migrations, pool: main, shard: primary, component: ConnectionPool
Jan 27 13:52:03.911 INFO Migrations finished, pool: main, shard: primary, component: ConnectionPool
Jan 27 13:52:03.911 DEBG Release migration lock, pool: main, shard: primary, component: ConnectionPool
Jan 27 13:52:03.912 DEBG Setup finished, setup_time_s: 0, pool: main, shard: primary, component: ConnectionPool
Jan 27 13:52:03.912 DEBG Cleaning up large notifications after about 300s, channel: chain_head_updates, component: ChainHeadUpdateListener > NotificationListener
Jan 27 13:52:03.932 INFO Connecting to Firehose to get chain identifier, provider: firehose, provider: firehose
Jan 27 13:52:03.932 INFO Requesting genesis block from firehose, provider: firehose
Jan 27 13:52:03.932 DEBG Connecting to firehose to retrieve block for number 0, provider: firehose
Jan 27 13:52:04.100 INFO Successfully connected to IPFS node at: https://ipfs.network.thegraph.com/ipfs/
Jan 27 13:52:04.424 DEBG Retrieving block(s) from firehose, provider: firehose
Jan 27 13:52:04.424 INFO Connected to Firehose, genesis_block: #0 (f1999522da42d33a241a58a8e0fdfc864c9f9acca3374bad7adb255f31d92b55), provider: firehose, provider: firehose
Jan 27 13:52:04.426 INFO Creating LoadManager in disabled mode, component: LoadManager
Jan 27 13:52:04.427 INFO Started block ingestor, network_name: astar-testnet
Jan 27 13:52:04.427 INFO Starting block ingestor for network, network_name: astar-testnet
Jan 27 13:52:04.427 INFO Starting job runner with 5 jobs, component: JobRunner
Jan 27 13:52:04.427 DEBG Schedule for Vacuum subgraphs.subgraph_deployment, first_run_in_s: 0, interval_s: 60, component: JobRunner
Jan 27 13:52:04.427 DEBG Schedule for Report pg_notification_queue_usage(), first_run_in_s: 0, interval_s: 60, component: JobRunner
Jan 27 13:52:04.427 DEBG Schedule for Reconcile important tables from the primary, first_run_in_s: 9, interval_s: 900, component: JobRunner
Jan 27 13:52:04.427 DEBG Schedule for Record and remove unused deployments, first_run_in_s: 79, interval_s: 7200, component: JobRunner
Jan 27 13:52:04.427 DEBG Schedule for Refresh materialized views, first_run_in_s: 237, interval_s: 21600, component: JobRunner
Jan 27 13:52:04.429 INFO Starting JSON-RPC admin server at: http://localhost:8020, component: JsonRpcServer
Jan 27 13:52:04.429 INFO Starting GraphQL HTTP server at: http://localhost:8000, component: GraphQLServer
Jan 27 13:52:04.429 INFO Starting index node server at: http://localhost:8030, component: IndexNodeServer
Jan 27 13:52:04.429 INFO Starting metrics server at: http://localhost:8040, component: MetricsServer
Jan 27 13:52:04.429 INFO Starting GraphQL WebSocket server at: ws://localhost:8001, component: SubscriptionServer
Jan 27 13:52:04.430 INFO Trying to reconnect the Blockstream after disconnect, cursor: MTNgQADSfPpXm57wA-f5TqWwLpcyAllnUgLmIBFH0Yj-oXvE2JyjUzQmORqFxPj23B3sQ1yt3t7LQit99JEButDqy75tuyE6Rn4oltjmqrS-cfD2MAMadbM0WrmEZtvTXjXUaAipfONR6YfnPPTabkM2McMhJG6z2m5Vo4BVefIU6ycwlzr7cc3Rg_KWoIFBreInRbagnHjyBjUpJEcObM2FZKQ=, endpoint uri: firehose, network_name: astar-testnet, provider: firehose, component: EthereumFirehoseBlockIngestor
Jan 27 13:52:04.736 DEBG Subgraph started, start_ms: 307, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphRegistrar
Jan 27 13:52:04.739 INFO Resolve subgraph files using IPFS, n_templates: 1, n_data_sources: 2, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphInstanceManager
Jan 27 13:52:04.800 DEBG Subgraph started, start_ms: 370, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphRegistrar
Jan 27 13:52:04.800 INFO Started all assigned subgraphs, node_id: index_node_0, count: 2, component: SubgraphRegistrar
Jan 27 13:52:04.803 INFO Resolve subgraph files using IPFS, n_templates: 0, n_data_sources: 3, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.699 INFO Successfully resolved subgraph files using IPFS, features: ø, n_templates: 0, n_data_sources: 3, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.723 INFO Starting subgraph writer, queue_size: 5, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.738 INFO Data source count at start: 3, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.758 INFO Successfully resolved subgraph files using IPFS, features: nonFatalErrors, n_templates: 1, n_data_sources: 2, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphInstanceManager
Jan 27 13:52:05.765 INFO Starting subgraph writer, queue_size: 5, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphInstanceManager
Jan 27 13:52:05.777 INFO Data source count at start: 2, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphInstanceManager
Jan 27 13:52:05.778 DEBG Starting or restarting subgraph, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.779 DEBG Starting block stream, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.779 INFO Blockstream disconnected, connecting, provider_err_count: 0, cursor: c2189eNZNyWNIaZ0jdxXgaWwLpc_DF1tVAvmIBcQhY_z9XHNjsj3U2d8OUzTkv3y3RW_GVyt29zNRnct9pFZtdfixrpn6ic_Tn4rm4jnreXmLfr0MAMfee5kXL-EZdraWz_dZFv7frQF4920b6KKYUtjM8QkeGSzj2sCpIRRc6NDsCUxkjX0dcfX0viR94VJ-uUtRe33xCL2DjN0JUtSaMqBMg==, subgraph: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, start_block: 72104, endpoint_uri: firehose, provider: firehose, deployment: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: FirehoseBlockStream
Jan 27 13:52:05.822 DEBG Starting or restarting subgraph, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphInstanceManager
Jan 27 13:52:05.822 DEBG Checking if subgraph current block is after manifest start block, manifest_start_block_number: 24771, subgraph_current_block_number: 24770, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: FirehoseBlockStream
Jan 27 13:52:05.822 DEBG Starting block stream, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: SubgraphInstanceManager
Jan 27 13:52:05.822 INFO Blockstream disconnected, connecting, provider_err_count: 0, cursor: , subgraph: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, start_block: 24771, endpoint_uri: firehose, provider: firehose, deployment: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, sgd: 1, subgraph_id: QmdcUWvTS15JBzUpaQtt1Tst5rAzAVL9sPXyL5uXANegq8, component: FirehoseBlockStream
Jan 27 13:52:05.896 INFO Blockstream connectedthread 'tokio-runtime-worker, ' panicked at 'Got a transaction trace with status UNKNOWN, datasource is brokenprovider', : chain/ethereum/src/codec.rsfirehose:, 163deployment:: 17QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa
, note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: FirehoseBlockStream
Panic in tokio task, aborting!
Jan 27 13:52:05.897 INFO Stopping subgraph, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.898 DEBG Subgraph stopped, WASM runtime thread terminated, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager
Jan 27 13:52:05.900 DEBG Subgraph stopped, WASM runtime thread terminated, sgd: 2, subgraph_id: QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa, component: SubgraphInstanceManager

IPFS hash

QmdNJTN49nAx6hoiFGQi1v1vQ8fqyjHp4LVcBHcgMUVwSa

Subgraph name or link to explorer

No response

Some information to help us out

OS information

Linux

sduchesneau commented 5 months ago

This issue has been linked to a bug that caused missing status or missing receipts in blockss. https://github.com/streamingfast/firehose-ethereum/issues/84 https://github.com/streamingfast/firehose-ethereum/issues/85

Fixes are in firehose release v2.3.1, will require reprocessing the firehose blocks for the affected chains.

matthewdarwin commented 5 months ago

Is there anything else we need to do for this issue, or are those to changes to firehose-ethereum the full fix?