Open DDeAlmeida opened 4 months ago
Another one while syncing
2024-07-03T07:32:45.898017Z INFO stats: #119240171 Downloading blocks 1.65% (63481 left; at 119240171) 69 peers ⬇ 1.37 MB/s ⬆ 206 kB/s 0.00 bps 0 gas/s CPU: 220%, Mem: 13.1 GB
2024-07-03T07:32:45.898028Z INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply finalizing
Next blocks to catch up: done
2024-07-03T07:32:46.391961Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.396258Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.485735Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.491494Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.551772Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:47.250573Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:47.268179Z WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
thread 'actix-rt|system:0|arbiter:5' panicked at chain/chain/src/runtime/mod.rs:412:21:
RuntimeError::UnexpectedIntegerOverflow remove_delayed_receipt_gas
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: near_chain::runtime::NightshadeRuntime::process_state_update::{{closure}}
3: <near_chain::runtime::NightshadeRuntime as near_chain::types::RuntimeAdapter>::apply_chunk
4: near_chain::chain::Chain::set_state_finalize
5: near_client::sync::state::StateSync::sync_shards_status
6: near_client::sync::state::StateSync::run
7: near_client::client::Client::run_catchup
8: near_client::client_actor::ClientActorInner::catchup
9: core::ops::function::FnOnce::call_once{{vtable.shim}}
10: <actix::utils::TimerFunc<A> as actix::fut::future::ActorFuture<A>>::poll
11: <actix::contextimpl::ContextFut<A,C> as core::future::future::Future>::poll
12: tokio::runtime::task::raw::poll
13: tokio::task::local::LocalSet::tick
14: tokio::task::local::LocalSet::run_until::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)
Not sure if it is related, but I was running a traffic load test around that time. I caused some shard congestion and, according to the Grafana dashboard, got some delayed receipts too. Look at the 12:30 mark below.
I suggest we look into the delayed receipts handling. I don't see other loads causing this. Here is my test, btw: https://github.com/jaswinder6991/nearcore/blob/statelessnet_latest/pytest/tests/loadtest/locust/locustfiles/linkdrop.py
/cc @wacban @jakmeier looks like a congestion control problem
Looks like this is triggered in catchup, so probably what you just fixed @wacban in https://github.com/near/nearcore/pull/11712, right?
Yes, the fix should be released tomorrow.
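For anyone reading the panic message above without context, here is a minimal sketch of how an error of this shape can arise. It is not nearcore's actual code: the `DelayedReceiptGas` counter and its `add`/`remove` methods are hypothetical, and only the error name and the `remove_delayed_receipt_gas` label are taken from the log. The assumption is that the gas held by delayed receipts is tracked in an unsigned counter updated with checked arithmetic, so removing gas that was never added (for example, bookkeeping that got out of sync) surfaces as an overflow error instead of silently wrapping.

```rust
/// Hypothetical error type; the variant name mirrors the panic message.
#[derive(Debug, PartialEq)]
enum RuntimeError {
    UnexpectedIntegerOverflow(&'static str),
}

/// Hypothetical counter of gas held by delayed receipts on one shard.
#[derive(Debug, Default)]
struct DelayedReceiptGas {
    total: u128,
}

impl DelayedReceiptGas {
    /// Record gas for a receipt that was pushed to the delayed queue.
    fn add(&mut self, gas: u64) -> Result<(), RuntimeError> {
        self.total = self
            .total
            .checked_add(gas as u128)
            .ok_or(RuntimeError::UnexpectedIntegerOverflow("add_delayed_receipt_gas"))?;
        Ok(())
    }

    /// Remove gas for a receipt that left the delayed queue.
    /// If more gas is removed than was recorded, the checked subtraction
    /// fails and the counter is left unchanged.
    fn remove(&mut self, gas: u64) -> Result<(), RuntimeError> {
        self.total = self
            .total
            .checked_sub(gas as u128)
            .ok_or(RuntimeError::UnexpectedIntegerOverflow("remove_delayed_receipt_gas"))?;
        Ok(())
    }
}
```

With this sketch, adding 100 gas and then removing 60 leaves 40 tracked; a further `remove(50)` returns the `remove_delayed_receipt_gas` overflow error. A caller that treats that error as fatal would abort the way the node does in the log above.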
@DDeAlmeida @jaswinder6991 is it still reproducible?
Latest version: 1.36.1-757-gfe6a736aa