near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

The node stopped validating and got stuck on the same block: "Downloading blocks 0.00%" #12027

Open: ama31337 opened this issue 2 months ago

ama31337 commented 2 months ago

The validator node suddenly stopped validating and got stuck on the same block, reporting "Downloading blocks 0.00%...". Key error:

Aug 30 21:33:25 near-lux8 neard[296594]: 2024-08-30T19:33:25.854212Z ERROR client: Banning chunk producer for producing invalid chunk chunk_producer=AccountId("bisontrails2.poolv1.near") epoch_id=EpochId(FqfESvDE4WBUvbChAZu4DrpHnDmjj6TM2GvLQC2Kx3hD) chunk_hash=ChunkHash(CcypJUXGXKfPAXPpb5caeGx6eYCGeydSCNiLKU9VcVQr)
Aug 30 21:33:25 near-lux8 neard[296594]: 2024-08-30T19:33:25.854289Z  WARN client: Receive bad block err=InvalidChunkState(ChunkState { prev_block_header: [3, 6, 242, 228, 5, 103, 240, 129, 188, 13, 162, 123, 68, 196, 211, 121, 70, 225, 83, 255, 49, 58, 62, 125, 39, 126, 237, 142, 243, 29, 54, 107, 160, 127, 159, 144, 7, 0, 0, 0, 0, 220, 122, 185, 133, 12, 137, 166, 14, 144, 245, 164, 195, 248, 156, 95, 220, 211, 60, 215, 176, 83, 93, 185, 154, 160, 146, 155, 184, 134, 244, 241, 252, 70, 221, 154, .....

The full log is attached as near_out.log.

nagisa commented 2 months ago

Can you share the output of neard --version, and tell us whether the node still fails to advance after a restart?

telezhnaya commented 2 months ago

Ama ✨ lux8.net, [3 Sep 2024 at 16:17:37]:

neard (release 2.1.1) (build 2.1.1-modified) (rustc 1.79.0) (protocol 70) (db 40)

on a CPU with SHA-NI support

https://t.me/c/1331540142/21464
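When version reports are gathered from several nodes, a small parser can turn the neard --version line into structured fields. This is a minimal sketch of hypothetical tooling, not part of neard; the "(key value)" layout is assumed from the single sample above. It only describes the binary currently installed, so it cannot reveal whether the node ran an earlier release in the past.

```python
import re

# Each "(key value)" group in the output of `neard --version`. The layout is
# inferred from the single sample quoted in this thread (an assumption), e.g.
# "neard (release 2.1.1) (build 2.1.1-modified) (rustc 1.79.0) (protocol 70) (db 40)".
_FIELD = re.compile(r"\((\w+) ([^)]+)\)")


def parse_neard_version(line: str) -> dict:
    """Map each parenthesised key to its value, keeping all values as strings."""
    return dict(_FIELD.findall(line))


if __name__ == "__main__":
    sample = "neard (release 2.1.1) (build 2.1.1-modified) (rustc 1.79.0) (protocol 70) (db 40)"
    fields = parse_neard_version(sample)
    print(fields["release"], fields["protocol"], fields["db"])  # 2.1.1 70 40
```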

ama31337 commented 1 month ago

Thanks to telezhnaya for posting the version above: neard release 2.1.1. Neither restarts nor waiting helped. About 5 restarts were made over a total of roughly 1 hour of waiting, and the node never changed its state. After that, the data was deleted and a new snapshot was downloaded; the node then resumed validating and is working now.

bowenwang1996 commented 1 month ago

@ama31337 We looked at the block the node was stuck on. It appears to contain a transaction to an Ethereum implicit account:

Receipt 4MZVTEBZJPeyiQgVzn2zvXfmen8rtzrkrMFbtzmySiWw from tellawraen.near to 0x44263bb4f851c784d62bf9477bc105a9998a191a gas cost 2712747476224 burnt balance 271274747622400000000

It is very likely that this node ran version 2.1.0 at some point in the past, which corrupted its state; upgrading to 2.1.1 afterwards does not repair it. You need to recover from a new snapshot.
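The receipt line quoted above can be checked mechanically: whether the receiver has the Ethereum implicit account shape, and what gas price the two numbers imply. The sketch below assumes the NEP-518 account format ("0x" followed by 40 lowercase hex digits) and a regex fitted to this one sample line; both are assumptions, not a documented nearcore log format. With the numbers above, the burnt balance divided by the gas cost comes out to exactly 100,000,000 yoctoNEAR per gas unit.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Layout of the receipt summary line as quoted in this thread; the regex is an
# assumption fitted to that one sample, not a documented nearcore format.
_RECEIPT = re.compile(
    r"Receipt (?P<id>\S+) from (?P<sender>\S+) to (?P<receiver>\S+) "
    r"gas cost (?P<gas>\d+) burnt balance (?P<burnt>\d+)"
)
# Assumed NEP-518 shape of an Ethereum implicit account ID: "0x" + 40 lowercase hex digits.
_ETH_IMPLICIT = re.compile(r"0x[0-9a-f]{40}")

YOCTO_PER_NEAR = 10**24  # 1 NEAR = 10^24 yoctoNEAR


@dataclass
class ReceiptSummary:
    receipt_id: str
    sender: str
    receiver: str
    gas: int          # gas units burnt
    burnt_yocto: int  # burnt balance in yoctoNEAR


def parse_receipt(line: str) -> ReceiptSummary:
    m = _RECEIPT.search(line)
    if m is None:
        raise ValueError(f"unrecognised receipt line: {line!r}")
    return ReceiptSummary(m["id"], m["sender"], m["receiver"], int(m["gas"]), int(m["burnt"]))


def is_eth_implicit(account_id: str) -> bool:
    """True if the whole account ID matches the assumed Ethereum implicit shape."""
    return _ETH_IMPLICIT.fullmatch(account_id) is not None


def implied_gas_price(r: ReceiptSummary) -> int:
    """Burnt balance divided by gas, in yoctoNEAR per gas unit; must divide exactly."""
    price, remainder = divmod(r.burnt_yocto, r.gas)
    if remainder:
        raise ValueError("burnt balance is not an integer multiple of gas")
    return price


if __name__ == "__main__":
    line = ("Receipt 4MZVTEBZJPeyiQgVzn2zvXfmen8rtzrkrMFbtzmySiWw from tellawraen.near "
            "to 0x44263bb4f851c784d62bf9477bc105a9998a191a gas cost 2712747476224 "
            "burnt balance 271274747622400000000")
    r = parse_receipt(line)
    print(is_eth_implicit(r.receiver))              # True
    print(implied_gas_price(r))                     # 100000000
    print(Decimal(r.burnt_yocto) / YOCTO_PER_NEAR)  # burnt balance expressed in NEAR
```

Exact integer and Decimal arithmetic is used because yoctoNEAR amounts exceed the range where floats stay exact.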

telezhnaya commented 1 month ago

Another example: https://discord.com/channels/490367152054992913/611591221474885632/1278370328246095945

Our validator is running version 2.1.1 on mainnet.

After restarting the secondary node with the keys from the primary server, it got stuck downloading blocks:

2024-08-28T14:39:42.108268Z INFO stats: #126754210 Downloading blocks 0.00% (115 left; at 126754210) 32 peers ⬇ 8.00 MB/s ⬆ 6.72 MB/s 0.00 bps 0 gas/s CPU: 165%, Mem: 8.52 GB
...
2024-08-28T14:39:52.108957Z INFO stats: #126754210 Downloading blocks 0.00% (123 left; at 126754210) 32 peers ⬇ 8.05 MB/s ⬆ 6.75 MB/s 0.00 bps 0 gas/s CPU: 89%, Mem: 8.50 GB
...
2024-08-28T14:40:02.109626Z INFO stats: #126754210 Downloading blocks 0.00% (131 left; at 126754210) 32 peers ⬇ 8.12 MB/s ⬆ 6.34 MB/s 0.00 bps 0 gas/s CPU: 71%, Mem: 8.54 GB
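In the stats lines above, the head stays at #126754210 while the backlog grows by 8 blocks per 10-second sample (115, 123, 131): the network keeps producing blocks but the node applies none. A minimal sketch of a watchdog heuristic for that pattern follows; the regex is fitted to these sample lines only, and parse_stats and looks_stuck are hypothetical helper names, not neard features.

```python
import re
from typing import List, NamedTuple

# Fields pulled from the neard "stats" lines quoted above; the pattern is an
# assumption fitted to those samples, not a stable log format.
_STATS = re.compile(r"#(?P<height>\d+) Downloading blocks [\d.]+% \((?P<left>\d+) left; at \d+\)")


class StatsSample(NamedTuple):
    height: int  # the node's head height (the "#N" field)
    left: int    # blocks still to download, as reported by the node


def parse_stats(lines: List[str]) -> List[StatsSample]:
    """Extract (height, left) from every line that matches; other lines are skipped."""
    samples = []
    for line in lines:
        m = _STATS.search(line)
        if m:
            samples.append(StatsSample(int(m["height"]), int(m["left"])))
    return samples


def looks_stuck(samples: List[StatsSample]) -> bool:
    """Heuristic: the head height never moves while the backlog strictly grows."""
    if len(samples) < 2:
        return False
    height_flat = all(s.height == samples[0].height for s in samples)
    backlog_growing = all(b.left > a.left for a, b in zip(samples, samples[1:]))
    return height_flat and backlog_growing


if __name__ == "__main__":
    log = [
        "2024-08-28T14:39:42.108268Z INFO stats: #126754210 Downloading blocks 0.00% (115 left; at 126754210) 32 peers",
        "2024-08-28T14:39:52.108957Z INFO stats: #126754210 Downloading blocks 0.00% (123 left; at 126754210) 32 peers",
        "2024-08-28T14:40:02.109626Z INFO stats: #126754210 Downloading blocks 0.00% (131 left; at 126754210) 32 peers",
    ]
    print(looks_stuck(parse_stats(log)))  # True
```

Requiring a strictly growing backlog, rather than just a flat height, avoids flagging a node that is briefly idle while already caught up.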