opentensor / bittensor

Internet-scale Neural Networks
https://www.bittensor.com/
MIT License

Bittensor Archive node stuck at block 2585474 #1752

Open · AlexZhenWang opened this issue 5 months ago

AlexZhenWang commented 5 months ago

Describe the bug

Hi, I'm currently operating a Bittensor node, and it seems to be stuck at block 2585474.

The logs show the best block stuck at #2585476 and the finalized block stuck at #2585474. Does anyone have suggestions on how to resolve this?
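
One way to confirm the stall without reading logs is to poll the node over JSON-RPC. The sketch below is a minimal illustration, assuming HTTP JSON-RPC on port 9933 (from `--rpc-port=9933` above) and the standard Substrate methods `system_syncState`, `chain_getFinalizedHead` and `chain_getHeader`; the helper names `rpc_call`, `summarize_sync` and `header_number` are hypothetical.

```python
import json
import urllib.request

# Assumption: HTTP JSON-RPC is served on port 9933, as set by --rpc-port=9933
# in this issue. Change RPC_URL if your node exposes RPC elsewhere.
RPC_URL = "http://127.0.0.1:9933"


def rpc_call(method, params=None, url=RPC_URL):
    """Send one JSON-RPC 2.0 request and return its "result" field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def summarize_sync(state):
    """Reduce a system_syncState result to current/highest heights and the gap."""
    current = state["currentBlock"]
    # highestBlock may be missing or null while the node has no useful peers.
    highest = state.get("highestBlock") or current
    return {"current": current, "highest": highest, "behind": highest - current}


def header_number(header):
    """Headers encode their number as hex, e.g. "0x277382" is block 2585474."""
    return int(header["number"], 16)
```

For example, `summarize_sync(rpc_call("system_syncState"))` should show `behind` growing over time if the node is stuck, and `header_number(rpc_call("chain_getHeader", [rpc_call("chain_getFinalizedHead")]))` should keep returning 2585474. Depending on the subtensor/Substrate version, HTTP and WebSocket RPC may share a single port, so the URL may need adjusting.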

I would also greatly appreciate it if someone could share the P2P address of a node that has synced past this block, so I can try adding it as a peer and see whether that helps.
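
Substrate nodes expose a `system_localListenAddresses` RPC method that returns the node's multiaddrs, each ending in `/p2p/<peer id>`, which is the form a peer address is usually shared in. The helper below, `shareable_addresses` (a hypothetical name), is a minimal sketch that drops loopback and private-network entries from that list so only addresses others could actually dial remain. It assumes plain `/ip4/` or `/ip6/` multiaddrs and passes other forms (such as `/dns/`) through unchanged.

```python
import ipaddress


def shareable_addresses(listen_addrs):
    """Keep multiaddrs that carry a peer id and a globally routable IP."""
    result = []
    for addr in listen_addrs:
        parts = addr.split("/")
        # A multiaddr looks like /ip4/<ip>/tcp/<port>/p2p/<peer id>, so
        # parts[1] is the address protocol and parts[2] the address itself.
        if len(parts) < 3 or "p2p" not in parts:
            continue  # without a /p2p/ peer id the address is not shareable
        if parts[1] in ("ip4", "ip6"):
            try:
                if not ipaddress.ip_address(parts[2]).is_global:
                    continue  # loopback, private or otherwise non-public
            except ValueError:
                continue  # malformed IP literal
        result.append(addr)
    return result
```

An operator whose node is past #2585474 would pass it the result of `system_localListenAddresses`; the receiving operator could then add the returned address as a peer, for example through Substrate's `--bootnodes` or `--reserved-nodes` flags. An empty result usually means the node only knows its private address (e.g. behind NAT), and the public IP has to be substituted by hand.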

Thank you in advance for any assistance!

To Reproduce

Run a Bittensor archive node with the following parameters:

--base-path=/chain-data
--rpc-cors=all
--port=20540
--rpc-port=9933
--ws-port=9944
--ws-external
--rpc-external
--node-key=bb86e433fe0f1f662a6fdf93211d21fa1e72537865f2f58ddec8e63d6eab3348
--pruning=archive
--rpc-methods=Unsafe
--in-peers=25
--out-peers=25
--prometheus-external
--chain=/raw_spec.json
--in-peers-light=0
--max-runtime-instances=128
--ws-max-connections=10000

Expected behavior

The node should keep syncing up to the latest block.

Screenshots

No response

Environment

Ubuntu 20.04.6 LTS (Focal Fossa)

Additional context

No response

COLUD4 commented 5 months ago

I have the same problem: my node is stuck at block 2585474 and does not sync forward. The log also contains these errors:

Error while running root epoch: "Not the block to update emission values."
2024-03-21 12:57:42 Accepting new connection 1/10000
2024-03-21 12:57:42 panicked at 'index out of bounds: the len is 6 but the index is 24', /home/ala/.bittensor/subtensor/pallets/subtensor/src/epoch.rs:609:17
2024-03-21 12:57:42 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x15b40c - <unknown>!rust_begin_unwind
    1: 0x368d - <unknown>!core::panicking::panic_fmt::h6e5483b5a3d4ae69
    2: 0x37af - <unknown>!core::panicking::panic_bounds_check::h00851e534fe3a3c6
    3: 0x18b1f - <unknown>!pallet_subtensor::block_step::<impl pallet_subtensor::pallet::Pallet<T>>::generate_emission::he93eacf41c60aff6
    4: 0x426dc - <unknown>!<pallet_subtensor::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<T as frame_system::pallet::Config>::BlockNumber>>::on_initialize::h0c587ec9535d6f22
    5: 0x808d3 - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleElement17,TupleElement18) as frame_support::traits::hooks::OnInitialize<BlockNumber>>::on_initialize::ha022e71db578c7cf
    6: 0x125985 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h7f648fc2398e6e2f
    7: 0x124ce8 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h62e1ba4c76fbe099
    8: 0x69d94 - <unknown>!Core_execute_block
2024-03-21 12:57:42 💔 Error importing block 0x55240159861bf33a60ec91e89929becfccf67ec4224084eb906e7871422a5def: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x15b40c - <unknown>!rust_begin_unwind
    1: 0x368d - <unknown>!core::panicking::panic_fmt::h6e5483b5a3d4ae69
    2: 0x37af - <unknown>!core::panicking::panic_bounds_check::h00851e534fe3a3c6
    3: 0x18b1f - <unknown>!pallet_subtensor::block_step::<impl pallet_subtensor::pallet::Pallet<T>>::generate_emission::he93eacf41c60aff6
    4: 0x426dc - <unknown>!<pallet_subtensor::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<T as frame_system::pallet::Config>::BlockNumber>>::on_initialize::h0c587ec9535d6f22
    5: 0x808d3 - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleElement17,TupleElement18) as frame_support::traits::hooks::OnInitialize<BlockNumber>>::on_initialize::ha022e71db578c7cf
    6: 0x125985 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h7f648fc2398e6e2f
    7: 0x124ce8 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h62e1ba4c76fbe099
    8: 0x69d94 - <unknown>!Core_execute_block

stenreijers commented 5 months ago

I'm hitting the same issue here:

0|dgs:subtensor-finney  | 2024-03-26 20:33:30 ⚙️  Syncing  0.0 bps, target=#2641792 (494 peers), best: #2585476 (0x3a90…a05e), finalized #2585474 (0x0c75…0bd6), ⬇ 163.9kiB/s ⬆ 145.1kiB/s
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 panicked at 'index out of bounds: the len is 6 but the index is 24', /home/ala/.bittensor/subtensor/pallets/subtensor/src/epoch.rs:609:17    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
0|dgs:subtensor-finney  | WASM backtrace:
0|dgs:subtensor-finney  | error while executing at wasm backtrace:
0|dgs:subtensor-finney  |     0: 0x15b40c - <unknown>!rust_begin_unwind
0|dgs:subtensor-finney  |     1: 0x368d - <unknown>!core::panicking::panic_fmt::h6e5483b5a3d4ae69
0|dgs:subtensor-finney  |     2: 0x37af - <unknown>!core::panicking::panic_bounds_check::h00851e534fe3a3c6
0|dgs:subtensor-finney  |     3: 0x18b1f - <unknown>!pallet_subtensor::block_step::<impl pallet_subtensor::pallet::Pallet<T>>::generate_emission::he93eacf41c60aff6
0|dgs:subtensor-finney  |     4: 0x426dc - <unknown>!<pallet_subtensor::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<T as frame_system::pallet::Config>::BlockNumber>>::on_initialize::h0c587ec9535d6f22
0|dgs:subtensor-finney  |     5: 0x808d3 - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleElement17,TupleElement18) as frame_support::traits::hooks::OnInitialize<BlockNumber>>::on_initialize::ha022e71db578c7cf
0|dgs:subtensor-finney  |     6: 0x125985 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h7f648fc2398e6e2f
0|dgs:subtensor-finney  |     7: 0x124ce8 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h62e1ba4c76fbe099
0|dgs:subtensor-finney  |     8: 0x69d94 - <unknown>!Core_execute_block    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 💔 Error importing block 0x55240159861bf33a60ec91e89929becfccf67ec4224084eb906e7871422a5def: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
0|dgs:subtensor-finney  | WASM backtrace:
0|dgs:subtensor-finney  | error while executing at wasm backtrace:
0|dgs:subtensor-finney  |     0: 0x15b40c - <unknown>!rust_begin_unwind
0|dgs:subtensor-finney  |     1: 0x368d - <unknown>!core::panicking::panic_fmt::h6e5483b5a3d4ae69
0|dgs:subtensor-finney  |     2: 0x37af - <unknown>!core::panicking::panic_bounds_check::h00851e534fe3a3c6
0|dgs:subtensor-finney  |     3: 0x18b1f - <unknown>!pallet_subtensor::block_step::<impl pallet_subtensor::pallet::Pallet<T>>::generate_emission::he93eacf41c60aff6
0|dgs:subtensor-finney  |     4: 0x426dc - <unknown>!<pallet_subtensor::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<T as frame_system::pallet::Config>::BlockNumber>>::on_initialize::h0c587ec9535d6f22
0|dgs:subtensor-finney  |     5: 0x808d3 - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleElement17,TupleElement18) as frame_support::traits::hooks::OnInitialize<BlockNumber>>::on_initialize::ha022e71db578c7cf
0|dgs:subtensor-finney  |     6: 0x125985 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h7f648fc2398e6e2f
0|dgs:subtensor-finney  |     7: 0x124ce8 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h62e1ba4c76fbe099
0|dgs:subtensor-finney  |     8: 0x69d94 - <unknown>!Core_execute_block    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 💔 Error importing block 0x6800a3b515c32bcf9a8e7c4be1c58adf90c24764e238e0793cb87e8980284ff0: block has an unknown parent    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 💔 Error importing block 0x9b884a6039eb9c0e27f9a6c0a46b0fc94366027cb3403198c247d0ff4c3aee77: block has an unknown parent    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 💔 Error importing block 0x1e713badff17bb63e13406dfd9d941f36dad27c72dea9636ad6dc92b43af5d06: block has an unknown parent    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 💔 Error importing block 0xccd0e5def42016482364f35fd6e56b34d03b76e6d24e59a3722f8933c518ba88: block has an unknown parent    
0|dgs:subtensor-finney  | 2024-03-26 20:33:34 Error while running root epoch: "Not the block to update emission values."    
0|dgs:subtensor-finney  | 2024-03-26 20:33:35 ⚙️  Syncing  0.0 bps, target=#2641792 (64 peers), best: #2585476 (0x3a90…a05e), finalized #2585474 (0x0c75…0bd6), ⬇ 579.2kiB/s ⬆ 69.2kiB/s
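
The "Syncing" status lines are enough to tell a genuine stall from slow syncing: `best` and `finalized` stay fixed while `target` keeps rising. Below is a minimal sketch, assuming the Substrate informant format shown in these logs (it may differ between versions), that extracts those fields and flags a stall; `parse_sync_line` and `is_stalled` are hypothetical helper names.

```python
import re

# Matches the status line seen in these logs, e.g.
# "Syncing  0.0 bps, target=#2641792 (494 peers), best: #2585476 (0x3a90…a05e),
#  finalized #2585474 (0x0c75…0bd6)". Only the numeric fields are captured.
SYNC_RE = re.compile(
    r"Syncing\s+(?P<bps>[\d.]+) bps, target=#(?P<target>\d+) \((?P<peers>\d+) peers\), "
    r"best: #(?P<best>\d+) \([^)]*\), finalized #(?P<finalized>\d+)"
)


def parse_sync_line(line):
    """Return the numeric fields of a Syncing line, or None for other lines."""
    m = SYNC_RE.search(line)
    if m is None:
        return None
    return {
        "bps": float(m["bps"]),
        "target": int(m["target"]),
        "peers": int(m["peers"]),
        "best": int(m["best"]),
        "finalized": int(m["finalized"]),
    }


def is_stalled(lines, min_samples=3):
    """True if best and finalized are identical across the last min_samples Syncing lines."""
    samples = [s for s in map(parse_sync_line, lines) if s is not None]
    if len(samples) < min_samples:
        return False  # not enough evidence either way
    recent = samples[-min_samples:]
    first = (recent[0]["best"], recent[0]["finalized"])
    return all((s["best"], s["finalized"]) == first for s in recent)
```

Fed the log above, every Syncing line reports best #2585476 and finalized #2585474, so `is_stalled` returns True; on the later log where best advances from #2585462 to #2585463 it returns False.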

stenreijers commented 4 months ago

I managed to remove the last 1000 blocks (including finalized ones) by manually modifying the RocksDB database. This caused the node to re-evaluate the whole chain, starting from finalized=0. It took 2 hours before it caught up and started processing blocks again.

2024-03-28 00:03:18 ⚙️  Syncing  0.0 bps, target=#2649351 (502 peers), best: #2585462 (0x2861…0277), finalized #2585088 (0x4c58…6368), ⬇ 330.8kiB/s ⬆ 98.6kiB/s
2024-03-28 00:03:21 Error while running root epoch: "Not the block to update emission values."
2024-03-28 00:03:23 ⚙️  Syncing  0.2 bps, target=#2649352 (502 peers), best: #2585463 (0x1d82…1170), finalized #2585088 (0x4c58…6368), ⬇ 688.5kiB/s ⬆ 87.3kiB/s
2024-03-28 00:03:27 Successfully ran block step.
2024-03-28 00:03:27 do_set_weights( origin: netuid:28, uids:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57>
2024-03-28 00:03:27 check_version_key( network_version_key:1000, version_key:1002 )
2024-03-28 00:03:27 WeightsSet( netuid:28, neuron_uid:94 )
2024-03-28 00:03:27 do_set_weights( origin: netuid:23, uids:[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, >
2024-03-28 00:03:27 check_version_key( network_version_key:0, version_key:0 )
2024-03-28 00:03:27 WeightsSet( netuid:23, neuron_uid:202 )
2024-03-28 00:03:27 do_set_weights( origin: netuid:2, uids:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,>
2024-03-28 00:03:27 check_version_key( network_version_key:0, version_key:18446744073709551615 )
2024-03-28 00:03:27 do_set_weights( origin: netuid:15, uids:[3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63>
2024-03-28 00:03:27 do_set_weights( origin: netuid:2, uids:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,>
2024-03-28 00:03:27 check_version_key( network_version_key:0, version_key:671 )
2024-03-28 00:03:27 do_set_weights( origin: netuid:26, uids:[1, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 63, 64, 65, 66, >
2024-03-28 00:03:27 check_version_key( network_version_key:0, version_key:12 )
2024-03-28 00:03:27 WeightsSet( netuid:26, neuron_uid:2 )
2024-03-28 00:03:27 do_set_weights( origin: netuid:24, uids:[0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 6>
2024-03-28 00:03:27 check_version_key( network_version_key:0, version_key:10012 )
2024-03-28 00:03:27 do_set_weights( origin: netuid:9, uids:[128, 139, 140, 141, 175, 193, 207, 217, 218, 219, 220, 221, 222, 223], values:[3, 14, 32, 12030, 6, 1, 65535, 75, 174, 406, 948, 2210, 5156, 28075])
2024-03-28 00:03:27 check_version_key( network_version_key:2000, version_key:2000000 )
2024-03-28 00:03:27 WeightsSet( netuid:9, neuron_uid:247 )
2024-03-28 00:03:27 do_remove_stake( origin: hotkey:, stake_to_be_removed:12030480000 )
2024-03-28 00:03:27 StakeRemoved( hotkey:, stake_to_be_removed:12030480000 )
2024-03-28 00:03:27 do_add_stake( origin: hotkey:, stake_to_be_added:915000000 )
2024-03-28 00:03:27 StakeAdded( hotkey:, stake_to_be_added:915000000 )
2024-03-28 00:03:27 do_registration( coldkey: netuid:18 hotkey: )
2024-03-28 00:03:27 do_registration( coldkey: netuid:18 hotkey: )
2024-03-28 00:03:27 do_registration( coldkey: netuid:18 hotkey: )

But once it reached the critical block again, the same errors came back:

2024-03-28 00:13:29 ⚙️  Syncing  0.0 bps, target=#2649383 (49 peers), best: #2585476 (0x3a90…a05e), finalized #2585474 (0x0c75…0bd6), ⬇ 377.4kiB/s ⬆ 115.0kiB/s
2024-03-28 00:13:31 panicked at 'index out of bounds: the len is 6 but the index is 24', /home/ala/.bittensor/subtensor/pallets/subtensor/src/epoch.rs:609:17
2024-03-28 00:13:31 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x15b40c - <unknown>!rust_begin_unwind
    1: 0x368d - <unknown>!core::panicking::panic_fmt::h6e5483b5a3d4ae69
    2: 0x37af - <unknown>!core::panicking::panic_bounds_check::h00851e534fe3a3c6
    3: 0x18b1f - <unknown>!pallet_subtensor::block_step::<impl pallet_subtensor::pallet::Pallet<T>>::generate_emission::he93eacf41c60aff6
    4: 0x426dc - <unknown>!<pallet_subtensor::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<T as frame_system::pallet::Config>::BlockNumber>>::on_initialize::h0c587ec9535d6f22
    5: 0x808d3 - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleEl>
    6: 0x125985 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h7f648fc2398e6e2f
    7: 0x124ce8 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h62e1ba4c76fbe099
    8: 0x69d94 - <unknown>!Core_execute_block
2024-03-28 00:13:31 💔 Error importing block 0x55240159861bf33a60ec91e89929becfccf67ec4224084eb906e7871422a5def: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction >
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x15b40c - <unknown>!rust_begin_unwind
    1: 0x368d - <unknown>!core::panicking::panic_fmt::h6e5483b5a3d4ae69
    2: 0x37af - <unknown>!core::panicking::panic_bounds_check::h00851e534fe3a3c6
    3: 0x18b1f - <unknown>!pallet_subtensor::block_step::<impl pallet_subtensor::pallet::Pallet<T>>::generate_emission::he93eacf41c60aff6
    4: 0x426dc - <unknown>!<pallet_subtensor::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<T as frame_system::pallet::Config>::BlockNumber>>::on_initialize::h0c587ec9535d6f22
    5: 0x808d3 - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleEl>
    6: 0x125985 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h7f648fc2398e6e2f
    7: 0x124ce8 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h62e1ba4c76fbe099
    8: 0x69d94 - <unknown>!Core_execute_block
2024-03-28 00:13:31 💔 Error importing block 0x90a66c212fd64a305b320af8a4f7435f2c9b99b20f4b965f46cf58c2d5694785: block has an unknown parent
2024-03-28 00:13:31 💔 Error importing block 0x10df2b02226562e273c1193244983447d1baed17dd9d9f2dc24c4c7910d173c5: block has an unknown parent
2024-03-28 00:13:31 💔 Error importing block 0x5df5ee463c58d2014ad57e98deeff46350becac765d7aecbfd9bb1698b816f2b: block has an unknown parent
2024-03-28 00:13:31 💔 Error importing block 0x13d3e0d61f9f71599914a251853e15f0fafd369e517fda4ed9349a00ee7e9480: block has an unknown parent
2024-03-28 00:13:31 💔 Error importing block 0x88674e6d170e51eb0ebc3c0ed83d2771295e1ab565f1ed5bdefbc236630e99b4: block has an unknown parent
2024-03-28 00:13:31 💔 Error importing block 0xac7e8b5b3058766720fcbdfb96595a6c298934ae13fec7e786b8f252bafff5a2: block has an unknown parent
2024-03-28 00:13:31 💔 Error importing block 0x305bccfe05d807440717e26dc9e4f639d49e52f3d4ed3ef0e60a40c4f7621046: block has an unknown parent

mikeletux-cube commented 4 months ago

Any updates on this? A snapshot to work around the issue was promised in the Discord channel, but as far as I know nothing has been published yet.

bobbychen commented 4 months ago

@ifrit98 Could you help us? I've seen many people run into this problem when running a Bittensor node.

Motjam commented 4 months ago

I was looking into running a Bittensor node, but checked GitHub first and found this issue. Has it been resolved? If not, could someone let me know?

Thanks.