Mihajlo-Pavlovic closed this issue 3 months ago.
Hi,
I investigated this issue a bit today. From the collator side everything seems to be okay. However, for some reason the submitted blocks fail to validate on the Rococo validators.
I extracted this from one validator's log:
```
2024-08-06 18:09:12.752 INFO tokio-runtime-worker parachain::candidate-validation: Failed to validate candidate para_id=Id(2043) error=Invalid(WorkerReportedInvalid("execute: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x620fb6 - neuroweb_runtime.wasm!rust_begin_unwind
    1: 0xc76a - neuroweb_runtime.wasm!core::panicking::panic_fmt::h8b4135fb81936008
    2: 0x3466d3 - neuroweb_runtime.wasm!core::panicking::panic_display::h287c324f965c61f8
    3: 0x346147 - neuroweb_runtime.wasm!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::apply_extrinsics::{{closure}}::panic_cold_display::h87e29365917e1814
    4: 0x34410c - neuroweb_runtime.wasm!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h307560923d65cd10
    5: 0x594ddd - neuroweb_runtime.wasm!<cumulus_pallet_aura_ext::BlockExecutor<T,I> as frame_support::traits::misc::ExecuteBlock<Block>>::execute_block::h17a9647f9efaccce
    6: 0x1a0440 - neuroweb_runtime.wasm!environmental::local_key::LocalKey<T>::with::hce9d98b3bbf43917
    7: 0x29dbcf - neuroweb_runtime.wasm!cumulus_pallet_parachain_system::validate_block::implementation::validate_block::h69a5a40d817377d2
    8: 0x4a6723 - neuroweb_runtime.wasm!validate_block
    9: 0x6d94dd - neuroweb_runtime.wasm!<wasm function 8475>"))
```
I am not sure exactly what the reason for this validation failure is. Maybe something went wrong with the runtime upgrade? When was it deployed?
The runtime was deployed on 2024-08-05 at 10:25:12 (UTC); this is the block in which the runtime upgrade was applied: https://neuroweb-testnet.subscan.io/block/4238101
You can see that 30 blocks were created after the runtime upgrade was applied: https://neuroweb-testnet.subscan.io/block/4238136, so the upgrade itself had no issue, as block production continued after the upgrade.
We are considering changing the runtime to the previous version using sudo on Rococo to unblock the network, do you think this would help us?
This seems to be the reason the parachain block fails to validate:
```
2024-08-05 11:07:48.598 DEBUG tokio-runtime-worker parachain::pvf: starting execute for /chain-data/chains/rococo_v2_2/paritydb/full/pvf-artifacts/0x60d3f38c17a14de886f39484e258e3e277d4e2b9392a7c20f050dfe73678925eab1395170e9320bdc5bf7f51164c17244852168e1da4b722918b52505e366264.pvf worker_pid=95 worker_dir=WorkerDir { tempdir: TempDir { path: "/chain-data/chains/rococo_v2_2/paritydb/full/pvf-artifacts/worker-dir-execute-wY8kO1" } } validation_code_hash=0x102932e694d206e3f68f2b9e84e1f3a619f482965d22f4ae61ba98f9b065e2ae
2024-08-05T11:07:48.660380Z ERROR runtime: panicked at /Users/nikolatodorovic/.cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/5641e18/substrate/frame/executive/src/lib.rs:693:17:
InvalidTransaction custom error
```
Maybe it helps you figure out why the blocks are invalid.
> The runtime was deployed on 2024-08-05 at 10:25:12 (UTC); this is the block in which the runtime upgrade was applied: https://neuroweb-testnet.subscan.io/block/4238101
> You can see that 30 blocks were created after the runtime upgrade was applied: https://neuroweb-testnet.subscan.io/block/4238136, so the upgrade itself had no issue, as block production continued after the upgrade.
As far as I can tell, everything was fine until https://neuroweb-testnet.subscan.io/block/4238137, which appears unfinalized in Subscan; the latest para head on the Rococo relay chain is that of block https://neuroweb-testnet.subscan.io/block/4238136: 0xf1f71e7d498dbecd671c4a985199398f9d87fb380944184aa594251094b516cb
It seems to have worked fine as long as all the para blocks after the upgrade were empty, until 4238137, which contains a bunch of ethereum:transaction:TransactionV2 extrinsics: https://neuroweb-testnet.subscan.io/extrinsic/4238137-2
> We are considering changing the runtime to the previous version using sudo on Rococo to unblock the network, do you think this would help us?
Yeah setting the old code and head should work.
One more thing to try is to make sure that you are not using the native executor to build your blocks. Recently we had reports of nodes stalling because there was some encoding difference between wasm and native: https://github.com/paritytech/polkadot-sdk/issues/4808
The solution in that case was to switch to full wasm execution, which you should also do.
If that does not help, the next step would be to reproduce this error locally. Recently, mechanisms were introduced to write PoVs to disk and then execute them locally. It has two parts: dumping the PoVs on the collator side, and the `pov-validator` tool in polkadot-sdk, installed with `cargo install --path ./cumulus/bin/pov-validator --locked`. The debugging flow would be to export the failing PoV and then run `pov-validator` with your runtime.

Thank you all for your swift replies!
@skunert to your point, I believe the testnet version is still using the native executor. I can also point to this PR on HydraDX that changed that as well.
Still, given that the fix will take a bit of time and the team needs to swiftly get their parachain back to production, I'd suggest we go with the suggestion of setting the old code, which @sandreim said should work. The questions that I have, given that the chain managed to produce ~30 blocks with the new code, are:
@SBalaguer You should be able to just pass `--execution=wasm`; that should achieve the same for now, without any changes.
Ah missed your questions the first time. I have only limited experience in resetting chains. But from my mental model:
> @SBalaguer You should be able to just pass `--execution=wasm`; that should achieve the same for now, without any changes.
When setting the parameter we get `CLI parameter --execution has no effect anymore and will be removed in the future`, and when trying to use an older binary we cannot start the node because of the following error:
```
2024-08-07 12:36:04 [Parachain] Cannot create a runtime error=Other("runtime requires function imports which are not present on the host: 'env:ext_storage_proof_size_storage_proof_size_version_1'")
2024-08-07 12:36:04 [Parachain] Essential task `transaction-pool-task-0` failed. Shutting down service.
2024-08-07 12:36:04 [Parachain] Essential task `transaction-pool-task-1` failed. Shutting down service.
2024-08-07 12:36:04 [Parachain] Essential task `txpool-background` failed. Shutting down service.
Error: Service(Client(RuntimeApiError(Application(VersionInvalid("Other error happened while constructing the runtime: runtime requires function imports which are not present on the host: 'env:ext_storage_proof_size_storage_proof_size_version_1'")))))
```
So I guess the only option is to change the code.
Should we set the head of the chain to the point before the upgrade happened? Upgrade happened at block #4238062.
When you do this, all the nodes will need to resync, because the blocks after the runtime upgrade are already finalized. We don't support reverting the finalized chain.
Without any more details, I also assume that there is a mismatch between the native runtime and the runtime registered on the relay chain. Just remove the native executor to test this theory; it should be the fastest option. If that doesn't work, you can still reset.
There should also be a `.disable_use_native()` method on `NativeExecutor`. But yeah, best is to not use it at all.
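For reference, a wasm-only setup in the node service looks roughly like the fragment below. This is a sketch, not drop-in code: the exact executor type, host-function generics, and helper names vary between polkadot-sdk versions, so check your template's `service.rs` (the HydraDX PR mentioned above is a concrete instance of this change):

```rust
// Sketch (service.rs fragment, not self-contained): replace the
// native-capable executor type, e.g.
//
//     NativeElseWasmExecutor<ExecutorDispatch>
//
// with a wasm-only executor:
type ParachainExecutor = sc_executor::WasmExecutor<sp_io::SubstrateHostFunctions>;

// ...and construct it from the node Configuration instead of the
// native-else-wasm builder (helper name may differ by SDK version):
// let executor = sc_service::new_wasm_executor(&config);
```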
Just to confirm, we should remove the native executor in the same way as it is done in the HydraDX PR, and just change the binaries to run the new ones and check if that resolves the issue.
> Just to confirm, we should remove the native executor in the same way as it is done in the HydraDX PR, and just change the binaries to run the new ones and check if that resolves the issue.
That is correct.
After you have switched to the new binaries, the collators should start building blocks again. This time, however, WASM was used to build, so there should be no difference from the validation code the validators are running. Blocks should be backed by the relay chain again and your chain will progress.
Thanks for the help guys, this was indeed the solution and collators are now producing blocks on NeuroWeb Testnet!
That is great to hear! Will close here, if they get stuck again (which I don't expect) feel free to reopen.
Description of bug
At NeuroWeb we are experiencing an issue with NeuroWeb Testnet connected to the Rococo relay chain.
As per @skunert's request, we are opening a new issue. Initially, we reported this in issue #1202.
After a runtime upgrade in which we changed dependencies from v0.9.40 to v1.9.0 (https://github.com/OriginTrail/neuroweb/pull/86), the upgrade itself was successful, but after 30+ blocks the chain could not produce a new block.
All collators have block 4238138 as their best block and 4238137 as finalized, but they are constantly trying to create block 4238138 again and are stuck in a loop, with the following logs (log targets `aura::cumulus=trace,parachain=debug,txpool=debug`):

We assume that it is caused by forks on the relay chain (Rococo) which are not handled properly by collators on NeuroWeb Testnet. But we would like to get confirmation and a better understanding of this issue, so we can be sure how to avoid it on NeuroWeb Mainnet, alongside finding a way to unblock NeuroWeb Testnet.
Steps to reproduce
No response