Open ancibanci opened 1 year ago
This validator was issuing quite a few disputes; we can see two reasons in the logs:
Failed to validate candidate para_id=Id(2000) error=InvalidCandidate(PrepareError("panic: called `Option::unwrap()` on a `None` value"))
Location: /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-codegen-0.88.0/src/isa/x64/encoding/rex.rs:478
Failed to validate candidate para_id=Id(2090) error=InvalidCandidate(AmbiguousWorkerDeath)
The first error is really weird: Prepare means it happened during artifact preparation. So how can there be an `Option::unwrap()` error that is only happening on this validator? Do we have any clue what option that might be? @mrcnski?
AmbiguousWorkerDeath is also kind of unexpected; with 64 GiB of RAM, OOM seems an unlikely reason.
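For reference, that message is the stock panic Rust emits whenever `Option::unwrap()` is called on a `None`, so it says nothing about which option was involved; a two-line program reproduces it exactly:

```rust
fn main() {
    // Panics with: called `Option::unwrap()` on a `None` value
    let missing: Option<u32> = None;
    missing.unwrap();
}
```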
> Do we have any clue what option that might be?
I assume the backtrace points to that unwrap:
Dec 08 23:28:52 nd21b22 polkadot[397866]: The application panicked (crashed).
Dec 08 23:28:52 nd21b22 polkadot[397866]: Message: called `Option::unwrap()` on a `None` value
Dec 08 23:28:52 nd21b22 polkadot[397866]: Location: /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/cranelift-codegen-0.88.0/src/isa/x64/encoding/rex.rs:478
Following the links:
> The first error is really weird: Prepare means it happened during artifact preparation. So how can there be an `Option::unwrap()` error that is only happening on this validator? Do we have any clue what option that might be? @mrcnski?
I saw this but didn't have any ideas; I'd have to sit down and do a deep dive.
Edit: it's a panic error, which most likely happened during the compilation itself.
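To illustrate the mechanism (a simplified sketch, not the actual PVF host code; `prepare_artifact` is a made-up name): a panic raised while compiling can be caught on the host side and its payload stringified, which is how it would surface as a `PrepareError("panic: ...")` like the one in the logs:

```rust
use std::panic::{self, AssertUnwindSafe};

// Simplified sketch: run a compilation closure, catch any panic, and turn
// the panic payload into a string error. The real host code differs; this
// only shows how a panic message can end up inside a prepare error.
fn prepare_artifact(compile: impl FnOnce()) -> Result<(), String> {
    panic::catch_unwind(AssertUnwindSafe(compile)).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        let msg = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic".to_string());
        format!("panic: {msg}")
    })
}

fn main() {
    let res = prepare_artifact(|| panic!("called `Option::unwrap()` on a `None` value"));
    println!("{res:?}"); // Err("panic: called `Option::unwrap()` on a `None` value")
}
```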
> AmbiguousWorkerDeath is also kind of unexpected; with 64 GiB of RAM, OOM seems an unlikely reason.
Is the "retry on AWD" (AmbiguousWorkerDeath) change in this version? I noticed in the code that we don't log when we retry, but that would be nice to have.
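Something like the following would be enough (a hypothetical sketch: `ValidationError` and `validate_with_retry` are stand-in names, not the actual candidate-validation code):

```rust
use log::warn;

// Hypothetical sketch of logging before retrying on AmbiguousWorkerDeath.
// The types, names, and single-retry policy are stand-ins for whatever the
// candidate-validation code actually does.
#[derive(Debug)]
enum ValidationError {
    AmbiguousWorkerDeath,
    Other(String), // all other failure modes, not retried here
}

fn validate_with_retry(
    mut execute_candidate: impl FnMut() -> Result<(), ValidationError>,
) -> Result<(), ValidationError> {
    match execute_candidate() {
        Err(ValidationError::AmbiguousWorkerDeath) => {
            // The log line this thread is asking for.
            warn!("candidate validation failed with AmbiguousWorkerDeath, retrying once");
            execute_candidate()
        }
        other => other,
    }
}
```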
Do you want me to perform some tests maybe? Should I rebuild my validator or just restart and monitor?
Do you use ECC memory in your setup?
`reg.to_real_reg()` ... some configuration mismatch in cranelift for that particular processor? In general, from my perspective this is most likely either a cranelift bug or some weird hardware fault, the former being more realistic.
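To make that failure mode concrete (a toy illustration with made-up types; only the `to_real_reg().unwrap()` shape mirrors the cranelift call site): the x64 encoder assumes register allocation has already replaced every virtual register with a physical one, so anything else reaching it, a leftover virtual register or a bit-flipped value, turns the `Option` into `None` right at that `unwrap()`:

```rust
// Toy types mimicking the pattern behind the panic; cranelift's real types
// are different, only the `to_real_reg().unwrap()` shape is the same.
#[derive(Clone, Copy)]
struct RealReg(u8);

#[derive(Clone, Copy)]
enum Reg {
    Real(RealReg),
    // Virtual registers must not survive past register allocation.
    Virtual(u32),
}

impl Reg {
    fn to_real_reg(self) -> Option<RealReg> {
        match self {
            Reg::Real(r) => Some(r),
            Reg::Virtual(_) => None,
        }
    }
}

fn encode_reg(reg: Reg) -> u8 {
    // If a virtual (or corrupted) register reaches encoding, this line
    // panics with: called `Option::unwrap()` on a `None` value
    reg.to_real_reg().unwrap().0
}
```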
@ancibanci what version of Polkadot are you running? And yes, rebuilding or using a differently provided binary and restarting might shed some more light.
From the logs: version 0.9.36-dc25abc712e.
From the specs it looks like https://www.hetzner.com/dedicated-rootserver/ax41-nvme/ probably with Non-ECC RAM, is that correct?
From the previous reports it does feel like either corrupted storage or memory, but I'm hesitant to say that just because I don't have any other explanation.
@vstakhov No, I don't use ECC memory. @eskimor The version is 0.9.36. @ordian The server is not at Hetzner but at Mevspace, in a datacenter in Warsaw.
I will try to rebuild and see what happens, and I will write an update tomorrow.
Thank you!
Using non-ECC memory to run server software is a risky choice, since bit errors are bound to occur eventually and non-ECC memory cannot detect or correct them.
After downloading the binary and restarting, I again started getting "buffer is full" errors:
Jan 04 18:46:25 nd21b22 polkadot[1187609]: 2023-01-04 18:46:25 💤 Idle (40 peers), best: #16047168 (0xb747…6257), finalized #16047165 (0x769d…0f01), ⬇ 2.2MiB/s ⬆ 2.0M>
Jan 04 18:46:30 nd21b22 polkadot[1187609]: 2023-01-04 18:46:30 ✨ Imported #16047169 (0xd50c…7bfa)
Jan 04 18:46:30 nd21b22 polkadot[1187609]: 2023-01-04 18:46:30 ♻️ Reorg on #16047169,0xd50c…7bfa to #16047169,0xcb01…d34c, common ancestor #16047168,0xb747…6257
Jan 04 18:46:30 nd21b22 polkadot[1187609]: 2023-01-04 18:46:30 ✨ Imported #16047169 (0xcb01…d34c)
Jan 04 18:46:30 nd21b22 polkadot[1187609]: 2023-01-04 18:46:30 💤 Idle (40 peers), best: #16047169 (0xcb01…d34c), finalized #16047165 (0x769d…0f01), ⬇ 1.6MiB/s ⬆ 1.4M>
Jan 04 18:46:34 nd21b22 polkadot[1187609]: 2023-01-04 18:46:34 dropping (Stream 643314fa/67) because buffer is full
Jan 04 18:46:34 nd21b22 polkadot[1187609]: 2023-01-04 18:46:34 dropping (Stream 643314fa/69) because buffer is full
Jan 04 18:46:34 nd21b22 polkadot[1187609]: 2023-01-04 18:46:34 dropping (Stream 643314fa/71) because buffer is full
Jan 04 18:46:34 nd21b22 polkadot[1187609]: 2023-01-04 18:46:34 dropping (Stream 643314fa/73) because buffer is full
7 hours ago I got "bad assignment from peer":
Jan 05 05:11:40 nd21b22 polkadot[1221292]: 2023-01-05 05:11:40 💤 Idle (40 peers), best: #16053402 (0xaae7…2024), finalized #16053398 (0x84a4…5a6a), ⬇ 2.0MiB/s ⬆ 1.2M>
Jan 05 05:11:42 nd21b22 polkadot[1221292]: 2023-01-05 05:11:42 ✨ Imported #16053403 (0x97ca…5806)
Jan 05 05:11:45 nd21b22 polkadot[1221292]: 2023-01-05 05:11:45 💤 Idle (40 peers), best: #16053403 (0x97ca…5806), finalized #16053400 (0xbdd7…01ec), ⬇ 2.0MiB/s ⬆ 1.1M>
Jan 05 05:11:45 nd21b22 polkadot[1221292]: 2023-01-05 05:11:45 Got a bad assignment from peer hash=0x443636cf57f02b647a77718472b34d2ddf68ac0636322f97455d877fb6c2f26e >
Jan 05 05:11:45 nd21b22 polkadot[1221292]: 2023-01-05 05:11:45 Chain between (0x948c…be16, 16053401) and 16053398 not fully known. Forcing vote on 16053398 unknown_nu>
Jan 05 05:11:45 nd21b22 polkadot[1221292]: 2023-01-05 05:11:45 Got a bad assignment from peer hash=0xc0a46dd9573bfba881d1b11b19b4b2abf20616ac599e429c63e171a7b36c03a3 >
Jan 05 05:11:45 nd21b22 polkadot[1221292]: 2023-01-05 05:11:45 Got a bad assignment from peer hash=0xc0a46dd9573bfba881d1b11b19b4b2abf20616ac599e429c63e171a7b36c03a3 >
And my node got stuck at block 16055683 and couldn't produce any blocks. It was chilled again very soon after that.
Jan 05 12:09:11 nd21b22 polkadot[1230308]: 2023-01-05 12:09:11 ♻️ Reorg on #16055683,0x6d7b…8a7f to #16055684,0x01a1…001d, common ancestor #16055682,0x255a…0396
Jan 05 12:09:11 nd21b22 polkadot[1230308]: 2023-01-05 12:09:11 ✨ Imported #16055684 (0x01a1…001d)
Jan 05 12:09:11 nd21b22 polkadot[1230308]: 2023-01-05 12:09:11 Candidate included without being backed? candidate_hash=0xe3a77990cde0032b0ca70dcfc10c62c87159809550e18>
Jan 05 12:09:11 nd21b22 polkadot[1230308]: 2023-01-05 12:09:11 Candidate included without being backed? candidate_hash=0x1556ed7038a7502d69107ccb5f36217ead41fd6185180>
Jan 05 12:09:11 nd21b22 polkadot[1230308]: 2023-01-05 12:09:11 Candidate included without being backed? candidate_hash=0xd7f1b4db5731f8d216f8023dbd3b074d748d0356a0063>
Jan 05 12:09:11 nd21b22 polkadot[1230308]: 2023-01-05 12:09:11 Candidate included without being backed? candidate_hash=0xb23583dcdcf82a651c375a44eb2362f37ab1745542742>
The logs themselves tell us very little, unfortunately. None of the logs after the restart should result in chilling. Could it be that the validator was already chilled?
Hi, I am running a validator on Kusama and I got some weird errors in the last few weeks, appearing occasionally (three times in the last three weeks).
My configuration: AMD Ryzen 5 3600, 6c/12t, 3.60 GHz, 480 GB NVMe, 64 GB RAM, 1 Gbps.
Then today I got a different error, which resulted in my node being chilled.