near / stakewars-iv


RuntimeError::UnexpectedIntegerOverflow remove_delayed_receipt_gas #178

Open DDeAlmeida opened 3 days ago

DDeAlmeida commented 3 days ago

Running the latest version: 1.36.1-757-gfe6a736aa

Next blocks to catch up: done
2024-07-02T08:31:37.395289Z  INFO stats: #119239075 5xd9e3QiGjBBaggVZojMWcEPGBaUrHh8uHvnyAwQ1Ps8 118 validators 57 peers ⬇ 1.78 MB/s ⬆ 2.22 MB/s 0.70 bps 55.6 Tgas/s CPU: 461%, Mem: 6.86 GB
2024-07-02T08:31:37.395306Z  INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply in progress
Next blocks to catch up: done
2024-07-02T08:31:47.395484Z  INFO stats: #119239083 7h9wcgtdR8ycLmLVhmsEsRUS7DHikFm2HKKM4sPrDZpt 118 validators 57 peers ⬇ 1.74 MB/s ⬆ 2.22 MB/s 0.80 bps 81.7 Tgas/s CPU: 554%, Mem: 7.94 GB
2024-07-02T08:31:47.395494Z  INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply finalizing
Next blocks to catch up: done
2024-07-02T08:31:57.396697Z  INFO stats: #119239090 irjcpx5LHw33gWcy4CXcCSTE9HDNzjrMH3NLmGiLLhR 118 validators 56 peers ⬇ 1.74 MB/s ⬆ 2.29 MB/s 0.70 bps 159 Tgas/s CPU: 226%, Mem: 10.3 GB
2024-07-02T08:31:57.396707Z  INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply finalizing
Next blocks to catch up: done
2024-07-02T08:32:07.397781Z  INFO stats: #119239098 29Q3J7AUexU3rmrfTGTfHYHHERBF7k62JdLGgUFXV4TA 118 validators 56 peers ⬇ 1.78 MB/s ⬆ 2.27 MB/s 0.80 bps 180 Tgas/s CPU: 225%, Mem: 12.3 GB
2024-07-02T08:32:07.397792Z  INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply finalizing
Next blocks to catch up: done
2024-07-02T08:32:17.398188Z  INFO stats: #119239105 CoFb3BRnn3qrqp4KXpjPcFu1t7dZgSiJkhK44aNayZfb 118 validators 57 peers ⬇ 1.68 MB/s ⬆ 2.13 MB/s 0.60 bps 129 Tgas/s CPU: 186%, Mem: 14.3 GB
2024-07-02T08:32:17.398198Z  INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply finalizing
Next blocks to catch up: done
thread 'actix-rt|system:0|arbiter:5' panicked at chain/chain/src/runtime/mod.rs:412:21:
RuntimeError::UnexpectedIntegerOverflow remove_delayed_receipt_gas
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: near_chain::runtime::NightshadeRuntime::process_state_update::{{closure}}
   3: <near_chain::runtime::NightshadeRuntime as near_chain::types::RuntimeAdapter>::apply_chunk
   4: near_chain::chain::Chain::set_state_finalize
   5: near_client::sync::state::StateSync::sync_shards_status
   6: near_client::sync::state::StateSync::run
   7: near_client::client::Client::run_catchup
   8: near_client::client_actor::ClientActorInner::catchup
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: <actix::utils::TimerFunc<A> as actix::fut::future::ActorFuture<A>>::poll
  11: <actix::contextimpl::ContextFut<A,C> as core::future::future::Future>::poll
  12: tokio::runtime::task::raw::poll
  13: tokio::task::local::LocalSet::tick
  14: tokio::task::local::LocalSet::run_until::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)
DDeAlmeida commented 3 days ago

Another crash while syncing:


2024-07-03T07:32:45.898017Z  INFO stats: #119240171 Downloading blocks 1.65% (63481 left; at 119240171) 69 peers ⬇ 1.37 MB/s ⬆ 206 kB/s 0.00 bps 0 gas/s CPU: 220%, Mem: 13.1 GB
2024-07-03T07:32:45.898028Z  INFO stats: Catchups
Sync block 7GUxrouPwW9WLhpuLqQwVCndTHieez1M9GmRKHheUXnB@119237521
Shard sync status: Shard 2 apply finalizing
Next blocks to catch up: done
2024-07-03T07:32:46.391961Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.396258Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.485735Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.491494Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:46.551772Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:47.250573Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
2024-07-03T07:32:47.268179Z  WARN chunks: Error processing partial encoded chunk: ChainError(InvalidChunkHeight)
thread 'actix-rt|system:0|arbiter:5' panicked at chain/chain/src/runtime/mod.rs:412:21:
RuntimeError::UnexpectedIntegerOverflow remove_delayed_receipt_gas
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: near_chain::runtime::NightshadeRuntime::process_state_update::{{closure}}
   3: <near_chain::runtime::NightshadeRuntime as near_chain::types::RuntimeAdapter>::apply_chunk
   4: near_chain::chain::Chain::set_state_finalize
   5: near_client::sync::state::StateSync::sync_shards_status
   6: near_client::sync::state::StateSync::run
   7: near_client::client::Client::run_catchup
   8: near_client::client_actor::ClientActorInner::catchup
   9: core::ops::function::FnOnce::call_once{{vtable.shim}}
  10: <actix::utils::TimerFunc<A> as actix::fut::future::ActorFuture<A>>::poll
  11: <actix::contextimpl::ContextFut<A,C> as core::future::future::Future>::poll
  12: tokio::runtime::task::raw::poll
  13: tokio::task::local::LocalSet::tick
  14: tokio::task::local::LocalSet::run_until::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted (core dumped)
jaswinder6991 commented 3 days ago

Not sure if it is related, but I was running a traffic load test around that time. I caused some shard congestion and got some delayed receipts too, according to the Grafana dashboard. Look at the 12:30 mark below.

[Screenshot 2024-07-03 at 1:50:14 PM] [Screenshot 2024-07-03 at 1:50:03 PM]

I suggest we look into the delayed receipts handling. I don't see other loads causing this. Here is my test, by the way: https://github.com/jaswinder6991/nearcore/blob/statelessnet_latest/pytest/tests/loadtest/locust/locustfiles/linkdrop.py

jancionear commented 2 days ago

/cc @wacban @jakmeier looks like a congestion control problem
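
For readers unfamiliar with this class of error, here is a minimal sketch of how a message like `UnexpectedIntegerOverflow remove_delayed_receipt_gas` can arise. It assumes a running total of gas held by delayed receipts that is updated with checked arithmetic. The `DelayedReceiptGas` type and its methods are hypothetical and for illustration only; this is not nearcore's actual code, and it does not claim to show the root cause of this issue.

```rust
/// Hypothetical error type, named after the panic message in the logs above.
#[derive(Debug, PartialEq)]
enum RuntimeError {
    UnexpectedIntegerOverflow(String),
}

/// Hypothetical running total of gas held by delayed receipts on one shard.
#[derive(Debug, Default)]
struct DelayedReceiptGas {
    total: u128,
}

impl DelayedReceiptGas {
    /// Record gas for a receipt that was pushed to the delayed queue.
    fn add(&mut self, gas: u64) -> Result<(), RuntimeError> {
        self.total = self
            .total
            .checked_add(u128::from(gas))
            .ok_or_else(|| RuntimeError::UnexpectedIntegerOverflow("add_delayed_receipt_gas".to_string()))?;
        Ok(())
    }

    /// Release gas for a receipt that left the delayed queue.
    /// Fails instead of wrapping if more gas is removed than was recorded.
    fn remove(&mut self, gas: u64) -> Result<(), RuntimeError> {
        self.total = self
            .total
            .checked_sub(u128::from(gas))
            .ok_or_else(|| RuntimeError::UnexpectedIntegerOverflow("remove_delayed_receipt_gas".to_string()))?;
        Ok(())
    }
}
```

In this sketch, calling `remove` on a total that never saw the matching `add` (for example, a counter that starts from zero while the queue already holds receipts) returns the error rather than silently wrapping around. A caller that treats the error as fatal would then abort, much like the panic in the logs.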

jakmeier commented 2 days ago

Looks like this is triggered in catchup, so probably what you just fixed @wacban in https://github.com/near/nearcore/pull/11712, right?

wacban commented 2 days ago

Yes, the fix should be released tomorrow.