[Bug] Panic and crashloop

graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL

https://thegraph.com

Apache License 2.0


[Bug] Panic and crashloop #5234

Open paymog opened 6 months ago

paymog commented 6 months ago

Bug report

One of our indexers suddenly started crashing with the following error. We're not sure why.

Relevant log output

thread 'tokio-runtime-worker' panicked at 'failed to parse mappings: Bad magic number (at offset 0)', chain/ethereum/src/capabilities.rs:62:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Feb 22 16:50:34.867 INFO Data source count at start: 2, sgd: 42628, subgraph_id: QmQ4pbd8UFcipKr5z3cYukp6kCj6pYRDYmQv2JYrcLQDBo, component: SubgraphInstanceManager
Panic in tokio task, aborting!

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

[ ] Tick this box if this bug is caused by a regression found in the latest release.
[ ] Tick this box if this bug is specific to the hosted service.
[X] I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

paymog commented 6 months ago

I added RUST_BACKTRACE=1 and now see the following (the actual logs are messier; I tried to clean them up):

tokio-runtime-worker' panicked at 'failed to parse mappings: Bad magic number (at offset 0), ', sgdchain/ethereum/src/capabilities.rs: :4262862, backtrace:

   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: graph_core::subgraph::instance_manager::SubgraphInstanceManager<S>::build_subgraph_runner::{{closure}}
   4: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
   5: tokio::runtime::task::core::Core<T, S>
   6: tokio::runtime::task::raw::poll
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   8: tokio::runtime::scheduler::multi_thread::worker::run
   9: tokio::runtime::task::raw::poll
  10: tokio::runtime::task::UnownedTask<S>::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Panic in tokio task, aborting!

azf20 commented 6 months ago

hey @paymog is this specific to this subgraph QmQ4pbd8UFcipKr5z3cYukp6kCj6pYRDYmQv2JYrcLQDBo or do you see this on multiple deployments?

paymog commented 6 months ago

It turns out this happened because an unbuilt subgraph was deployed to our infrastructure. We mitigated it by removing the invalid subgraph. It would be good if graph-node could kill just the affected instance manager thread instead of the whole process.
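The backtrace in this thread shows the panic coming from `core::result::unwrap_failed` inside `SubgraphInstanceManager::build_subgraph_runner`. The "Panic in tokio task, aborting!" line shows that the whole process goes down with it. The following is a minimal sketch of the behaviour requested above, and it is not graph-node's actual code. It propagates the parse failure as a `Result`, so that only the bad deployment fails. `MappingError`, `parse_mappings` and `start_all` are hypothetical names, and the "parser" only checks the WASM magic bytes.

```rust
use std::fmt;

/// Hypothetical error type standing in for whatever a real mapping parser returns.
#[derive(Debug, PartialEq)]
struct MappingError {
    deployment: String,
    reason: String,
}

impl fmt::Display for MappingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "deployment {}: {}", self.deployment, self.reason)
    }
}

/// Every WebAssembly binary module starts with these four bytes.
const WASM_MAGIC: &[u8] = b"\0asm";

/// Hypothetical parser: only checks the magic number, returning an error
/// instead of panicking when the bytes are not a WASM module.
fn parse_mappings(deployment: &str, bytes: &[u8]) -> Result<usize, MappingError> {
    if !bytes.starts_with(WASM_MAGIC) {
        return Err(MappingError {
            deployment: deployment.to_string(),
            reason: "Bad magic number (at offset 0)".to_string(),
        });
    }
    Ok(bytes.len())
}

/// Starts every deployment. A bad one is logged and collected instead of
/// being unwrapped, so the remaining deployments still start.
fn start_all(deployments: &[(&str, &[u8])]) -> (Vec<String>, Vec<MappingError>) {
    let mut started = Vec::new();
    let mut failed = Vec::new();
    for (id, bytes) in deployments {
        match parse_mappings(id, bytes) {
            Ok(_) => started.push(id.to_string()),
            Err(e) => {
                eprintln!("failed to start subgraph: {e}");
                failed.push(e);
            }
        }
    }
    (started, failed)
}

fn main() {
    let good: &[u8] = b"\0asm\x01\0\0\0";
    // An uncompiled AssemblyScript mapping, as in this issue.
    let bad: &[u8] = b"import { BigInt } from \"@graphprotocol/graph-ts\"";
    let (started, failed) = start_all(&[("QmGood", good), ("QmBad", bad)]);
    println!("started: {:?}, failed: {}", started, failed.len());
}
```

How a real node should record the failed deployment (for example, marking it as failed in its store) is out of scope for this sketch.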

I can't quite remember which IPFS hash was causing the issue. It may have been that one or a different one.

lutter commented 6 months ago

What does 'unbuilt subgraph' mean here? Does that mean AssemblyScript was deployed instead of WASM blobs?

paymog commented 6 months ago

Yup! The subgraph was uploaded without first running graph build, so it contained AssemblyScript (.ts files) and not WASM.
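This explains the "Bad magic number (at offset 0)" message. A WebAssembly binary module must begin with the magic bytes `\0asm` (0x00 0x61 0x73 0x6D), followed by a 4-byte version field (currently 1). An AssemblyScript `.ts` source file therefore fails the check at its very first byte.

The following is a sketch of a pre-upload check that a custom deploy tool could run on each mapping file before pushing it. `is_wasm_module` and `check_mapping_file` are hypothetical helpers; they are not part of graph-cli or graph-node.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// WebAssembly binary header: magic "\0asm" then version 1 (little-endian u32).
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];
const WASM_VERSION_1: [u8; 4] = [0x01, 0x00, 0x00, 0x00];

/// Returns true if `bytes` starts with a WASM version-1 module header.
fn is_wasm_module(bytes: &[u8]) -> bool {
    bytes.len() >= 8 && bytes[..4] == WASM_MAGIC && bytes[4..8] == WASM_VERSION_1
}

/// Hypothetical pre-upload check: refuse a mapping file that is not a
/// compiled WASM module (e.g. raw AssemblyScript source).
fn check_mapping_file(path: &Path) -> io::Result<()> {
    let bytes = fs::read(path)?;
    if is_wasm_module(&bytes) {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} is not a WASM module; run graph build first", path.display()),
        ))
    }
}

fn main() {
    // Usage: pass mapping file paths as arguments; each one is checked.
    for arg in std::env::args().skip(1) {
        match check_mapping_file(Path::new(&arg)) {
            Ok(()) => println!("{arg}: ok"),
            Err(e) => eprintln!("{arg}: {e}"),
        }
    }
}
```

Checking the header is cheap and catches this specific mistake. It does not validate the rest of the module, which graph-node would still parse on startup.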

lutter commented 6 months ago

Wild. Is there maybe a bug in graph-cli that deployed AssemblyScript sources? Also, I don't recognize the sgdchain/ethereum/src/capabilities.rs file name in the graph-node sources. Is this crashloop happening in vanilla graph-node?

paymog commented 6 months ago

The subgraph wasn't uploaded using graph-cli; it was uploaded using a custom CLI tool. Whoops, I probably didn't clean up the logs perfectly. I think the sgd prefix on the path is incorrect and the right path is chain/ethereum/src/capabilities.rs. Yup, the crashloop is happening in vanilla v0.34.0.

github-actions[bot] commented 3 days ago

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.