zigbee-alliance / distributed-compliance-ledger

DCL is a public permissioned ledger framework for certification of device models. The ledger is based on Cosmos SDK and CometBFT (Tendermint).
Apache License 2.0

DCL panic and crash with invalid block in MainNet #503

Closed j2fong closed 1 year ago

j2fong commented 1 year ago

When launching a fresh node on MainNet, DCL crashes right after committing block 46415, while processing block 46416, with the following error:

Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: 6:39PM INF committed state app_hash=863536F4B5AC194020C5A568E9CB1C5241B418E2753823E40088B8A7AF74FD0B height=46415 module=state num_txs=1
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: 6:39PM INF indexed block height=46415 module=txindex
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: panic: Failed to process committed block (46416:0781D5F18FCCC2D3160ADFE1068C0E090EA25541256A07B7CDD62DAD934630B4): wrong Block.Header.AppHash. Expected 863536F4B5AC194020C5A568E9CB1C5241B418E2753823E40088B8A7AF74FD0B, got 480D25B8475DA40AC0E318A5D3F0A64A0D4C0BA92E3F281C8BB37863375758AB
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: goroutine 62 [running]:
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc0000e1c00, 0x0)
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: github.com/tendermint/tendermint@v0.34.14/blockchain/v0/reactor.go:401 +0x123a
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
Aug 08 18:39:33 ip-10-245-104-142 cosmovisor[14980]: github.com/tendermint/tendermint@v0.34.14/blockchain/v0/reactor.go:110 +0x7a
Aug 08 18:39:34 ip-10-245-104-142 cosmovisor[14975]: 6:39PM ERR error="exit status 2" module=cosmovisor
Aug 08 18:39:34 ip-10-245-104-142 systemd[1]: cosmovisor.service: Main process exited, code=exited, status=1/FAILURE
Aug 08 18:39:34 ip-10-245-104-142 systemd[1]: cosmovisor.service: Failed with result 'exit-code'.
Aug 08 18:39:35 ip-10-245-104-142 systemd[1]: cosmovisor.service: Scheduled restart job, restart counter is at 1.
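
For anyone triaging a similar log, here is a minimal sketch of pulling the relevant values out of it. It only relies on the two message formats visible in the log above; the function name `summarize` and the regular expressions are illustrative assumptions, not part of DCL or Tendermint. It shows that the "Expected" hash in the panic is the app hash this node logged after committing block 46415, while "got" is the value taken from block 46416.

```python
import re

# Patterns modelled on the two log messages quoted in this issue (assumed stable).
COMMIT_RE = re.compile(
    r"committed state app_hash=(?P<app_hash>[0-9A-F]+) height=(?P<height>\d+)"
)
PANIC_RE = re.compile(
    r"Failed to process committed block \((?P<height>\d+):(?P<block_hash>[0-9A-F]+)\): "
    r"wrong Block\.Header\.AppHash\. Expected (?P<expected>[0-9A-F]+), got (?P<got>[0-9A-F]+)"
)


def summarize(log_lines):
    """Return the last committed (height, app_hash) and the parsed panic, if any."""
    committed, panic = None, None
    for line in log_lines:
        m = COMMIT_RE.search(line)
        if m:
            committed = (int(m["height"]), m["app_hash"])
        m = PANIC_RE.search(line)
        if m:
            panic = {
                "height": int(m["height"]),
                "expected": m["expected"],
                "got": m["got"],
            }
    return committed, panic


# Message parts of two lines from the log above (syslog prefixes dropped).
LOG = [
    "6:39PM INF committed state "
    "app_hash=863536F4B5AC194020C5A568E9CB1C5241B418E2753823E40088B8A7AF74FD0B "
    "height=46415 module=state num_txs=1",
    "panic: Failed to process committed block "
    "(46416:0781D5F18FCCC2D3160ADFE1068C0E090EA25541256A07B7CDD62DAD934630B4): "
    "wrong Block.Header.AppHash. "
    "Expected 863536F4B5AC194020C5A568E9CB1C5241B418E2753823E40088B8A7AF74FD0B, "
    "got 480D25B8475DA40AC0E318A5D3F0A64A0D4C0BA92E3F281C8BB37863375758AB",
]

committed, panic = summarize(LOG)
# The locally computed hash after 46415 is the "Expected" value of the panic at 46416.
print(committed[0], panic["height"], panic["expected"] == committed[1])
```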

I was able to reproduce this twice.

ashcherbakov commented 1 year ago

https://on.dcl.csa-iot.org:26657/block?height=46415

....
"txs": [
          "CpQGCpEGCkkvemlnYmVlYWxsaWFuY2UuZGlzdHJpYnV0ZWRjb21wbGlhbmNlbGVkZ2VyLnBraS5Nc2dQcm9wb3NlQWRkWDUwOVJvb3RDZXJ0EsMFCi1jb3Ntb3MxOG5xcDQ4Y2U1azdqOHF2dXdrN3Ruem1ja3dzZDhrMnRsbmo4NTASiwUtLS0tLUJFR0lOIENFUlRJRklDQVRFLS0tLS0KTUlJQnRUQ0NBVnFnQXdJQkFnSUlEUFFYTEx3VmhsUXdDZ1lJS29aSXpqMEVBd0l3S3pFVE1CRUdBMVVFQXd3SwpSRlZESUZCQlFTQkRUakVVTUJJR0Npc0dBUVFCZ3FKOEFnRU1CREUwTVVZd0lCY05Nak13TlRFd01UUXpNREF3CldoZ1BPVGs1T1RFeU16RXlNelU1TlRsYU1Dc3hFekFSQmdOVkJBTU1Da1JWUXlCUVFVRWdRMDR4RkRBU0Jnb3IKQmdFRUFZS2lmQUlCREFReE5ERkdNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVjeEdxeExMVwpkakNuTWNwWDJzZW5UTU5RY0lXdVVKNnUwN0p2WU84L1N0L05KTW12bThrOWF2VE5mSkwwVmU5dm1lbWR5U21rCm1ySDQ5MWZuNkNGSEFLTm1NR1F3RWdZRFZSMFRBUUgvQkFnd0JnRUIvd0lCQVRBT0JnTlZIUThCQWY4RUJBTUMKQVFZd0hRWURWUjBPQkJZRUZFTmcrU2NsWlkyYWVIbkNTTFQrdzg1NnpXZUpNQjhHQTFVZEl3UVlNQmFBRkVOZworU2NsWlkyYWVIbkNTTFQrdzg1NnpXZUpNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUNJUURMYWswdkU4WUU4OWtmCkQ4SUxYMlZVQTFFdlVnZG11T0RwMWVZbjM0Qk83d0loQUpXM0xQekowWFZOWnNPb0xveVIwZDllQTl3SVIxa3cKZ001L0dzc09iYW0xCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0gnLvtogYSWApQCkYKHy9jb3Ntb3MuY3J5cHRvLnNlY3AyNTZrMS5QdWJLZXkSIwohA07CCLJD1CBcIW9QfUcMu1qY2Wq4YEQkV7YDpwHcEQFnEgQKAggBGAwSBBDAmgwaQOfcNvBqGDIoTJ3oAkj3EdeM7jT4LOwX8wGervzvw1zSZr0ToQgIhePGuygGW+elZhFpmzAuEvn7+q6eVuTAREo="
        ]
      },
....
dcld tx decode CpQGCpEGCkkvemlnYmVlYWxsaWFuY2UuZGlzdHJpYnV0ZWRjb21wbGlhbmNlbGVkZ2VyLnBraS5Nc2dQcm9wb3NlQWRkWDUwOVJvb3RDZXJ0EsMFCi1jb3Ntb3MxOG5xcDQ4Y2U1azdqOHF2dXdrN3Ruem1ja3dzZDhrMnRsbmo4NTASiwUtLS0tLUJFR0lOIENFUlRJRklDQVRFLS0tLS0KTUlJQnRUQ0NBVnFnQXdJQkFnSUlEUFFYTEx3VmhsUXdDZ1lJS29aSXpqMEVBd0l3S3pFVE1CRUdBMVVFQXd3SwpSRlZESUZCQlFTQkRUakVVTUJJR0Npc0dBUVFCZ3FKOEFnRU1CREUwTVVZd0lCY05Nak13TlRFd01UUXpNREF3CldoZ1BPVGs1T1RFeU16RXlNelU1TlRsYU1Dc3hFekFSQmdOVkJBTU1Da1JWUXlCUVFVRWdRMDR4RkRBU0Jnb3IKQmdFRUFZS2lmQUlCREFReE5ERkdNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVjeEdxeExMVwpkakNuTWNwWDJzZW5UTU5RY0lXdVVKNnUwN0p2WU84L1N0L05KTW12bThrOWF2VE5mSkwwVmU5dm1lbWR5U21rCm1ySDQ5MWZuNkNGSEFLTm1NR1F3RWdZRFZSMFRBUUgvQkFnd0JnRUIvd0lCQVRBT0JnTlZIUThCQWY4RUJBTUMKQVFZd0hRWURWUjBPQkJZRUZFTmcrU2NsWlkyYWVIbkNTTFQrdzg1NnpXZUpNQjhHQTFVZEl3UVlNQmFBRkVOZworU2NsWlkyYWVIbkNTTFQrdzg1NnpXZUpNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUNJUURMYWswdkU4WUU4OWtmCkQ4SUxYMlZVQTFFdlVnZG11T0RwMWVZbjM0Qk83d0loQUpXM0xQekowWFZOWnNPb0xveVIwZDllQTl3SVIxa3cKZ001L0dzc09iYW0xCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0gnLvtogYSWApQCkYKHy9jb3Ntb3MuY3J5cHRvLnNlY3AyNTZrMS5QdWJLZXkSIwohA07CCLJD1CBcIW9QfUcMu1qY2Wq4YEQkV7YDpwHcEQFnEgQKAggBGAwSBBDAmgwaQOfcNvBqGDIoTJ3oAkj3EdeM7jT4LOwX8wGervzvw1zSZr0ToQgIhePGuygGW+elZhFpmzAuEvn7+q6eVuTAREo=
{"body":{"messages":[{"@type":"/zigbeealliance.distributedcomplianceledger.pki.MsgProposeAddX509RootCert","signer":"cosmos18nqp48ce5k7j8qvuwk7tnzmckwsd8k2tlnj850","cert":"-----BEGIN CERTIFICATE-----\nMIIBtTCCAVqgAwIBAgIIDPQXLLwVhlQwCgYIKoZIzj0EAwIwKzETMBEGA1UEAwwK\nRFVDIFBBQSBDTjEUMBIGCisGAQQBgqJ8AgEMBDE0MUYwIBcNMjMwNTEwMTQzMDAw\nWhgPOTk5OTEyMzEyMzU5NTlaMCsxEzARBgNVBAMMCkRVQyBQQUEgQ04xFDASBgor\nBgEEAYKifAIBDAQxNDFGMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEcxGqxLLW\ndjCnMcpX2senTMNQcIWuUJ6u07JvYO8/St/NJMmvm8k9avTNfJL0Ve9vmemdySmk\nmrH491fn6CFHAKNmMGQwEgYDVR0TAQH/BAgwBgEB/wIBATAOBgNVHQ8BAf8EBAMC\nAQYwHQYDVR0OBBYEFENg+SclZY2aeHnCSLT+w856zWeJMB8GA1UdIwQYMBaAFENg\n+SclZY2aeHnCSLT+w856zWeJMAoGCCqGSM49BAMCA0kAMEYCIQDLak0vE8YE89kf\nD8ILX2VUA1EvUgdmuODp1eYn34BO7wIhAJW3LPzJ0XVNZsOoLoyR0d9eA9wIR1kw\ngM5/GssObam1\n-----END CERTIFICATE-----","info":"","time":"1683709340"],"memo":"","timeout_height":"0","extension_options":[],"non_critical_extension_options":[]},"auth_info":{"signer_infos":[{"public_key":{"@type":"/cosmos.crypto.secp256k1.PubKey","key":"A07CCLJD1CBcIW9QfUcMu1qY2Wq4YEQkV7YDpwHcEQFn"},"mode_info":{"single":{"mode":"SIGN_MODE_DIRECT"}},"sequence":"12"}],"fee":{"amount":[],"gas_limit":"200000","payer":"","granter":""}},"signatures":["59w28GoYMihMnegCSPcR14zuNPgs7BfzAZ6u/O/DXNJmvROhCAiF48a7KAZb56VmEWmbMC4S+fv6rp5W5MBESg=="]}
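
When `dcld` is not at hand, the message type can also be read straight from the base64 string. The sketch below assumes the standard Cosmos SDK transaction layout (`Tx.body` is field 1, `TxBody.messages` is field 1, `Any.type_url` is field 1) and walks only those three length-delimited fields; `first_msg_type_url` and `read_varint` are hypothetical helper names. It is fed just the leading part of the transaction shown above, which is enough to reach the type URL.

```python
import base64


def read_varint(buf, pos):
    """Decode a protobuf base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def first_msg_type_url(tx_b64):
    """Return the type URL of the first message in a base64-encoded Cosmos SDK tx."""
    # Trim to a multiple of 4 characters so a truncated prefix still decodes.
    usable = tx_b64[: len(tx_b64) // 4 * 4]
    raw = base64.b64decode(usable)
    pos = 0
    length = 0
    # Descend Tx.body -> TxBody.messages[0] -> Any.type_url (each is field 1).
    for _ in range(3):
        if raw[pos] != 0x0A:  # tag byte: field 1, wire type 2 (length-delimited)
            raise ValueError("unexpected protobuf tag at offset %d" % pos)
        length, pos = read_varint(raw, pos + 1)
    return raw[pos:pos + length].decode("ascii")


# Leading characters of the transaction from block 46415 quoted above.
TX_PREFIX = (
    "CpQGCpEGCkkvemlnYmVlYWxsaWFuY2UuZGlzdHJpYnV0ZWRjb21wbGlhbmNl"
    "bGVkZ2VyLnBraS5Nc2dQcm9wb3NlQWRkWDUwOVJvb3RDZXJ0EsMF"
)

print(first_msg_type_url(TX_PREFIX))
```

This only identifies the message type; the full decode above remains the way to inspect the signer, certificate, and signatures.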

So it was a MsgProposeAddX509RootCert sent by a Vendor (not a Trustee). The expected result is an error about a wrong role, and the transaction was indeed not added to DCL. We need to investigate why catching up from the beginning leads to a wrong AppHash.

It seems some non-deterministic behavior (a bug?) occurred, which led to a different AppHash. A fresh node re-applies all historical transactions the same way new transactions are applied, so it must arrive at the same result (AppHash) for every block.
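
To make the determinism requirement concrete, here is a toy sketch. It is not DCL code and is not claimed to be the cause of this issue. It assumes an invented `state_hash` function over a plain dictionary and shows how two nodes holding the same logical state can still produce different hashes if the hash depends on something incidental, here the order in which keys were written.

```python
import hashlib


def state_hash(state, sort_keys):
    """Hash a key/value state; with sort_keys=False the result depends on insertion order."""
    digest = hashlib.sha256()
    keys = sorted(state) if sort_keys else list(state)
    for key in keys:
        digest.update(key.encode())
        digest.update(b"=")
        digest.update(state[key].encode())
        digest.update(b";")
    return digest.hexdigest()


# Same logical state, written in a different order on two hypothetical nodes.
node_a = {"cert/1": "approved", "model/7": "certified"}
node_b = {"model/7": "certified", "cert/1": "approved"}

# Order-independent hashing agrees; order-dependent hashing does not.
print(state_hash(node_a, True) == state_hash(node_b, True))
print(state_hash(node_a, False) == state_hash(node_b, False))
```

Any input that can differ between the original execution and a replay (ordering, local time, software version, leftover writes from a failed transaction) has the same effect on the AppHash as the insertion order in this toy.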

ashcherbakov commented 1 year ago

https://on.dcl.csa-iot.org:26657/block_results?height=46415

The transaction was invalid (invalid cert). We need to check whether this can lead to the issue, i.e. how Tendermint handles invalid transactions and how they affect the AppHash.
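
A small sketch of how failed transactions can be picked out of a `block_results` response. It assumes the Tendermint RPC shape in which per-transaction results sit under `result.txs_results`, each with `code`, `codespace`, and `log` fields, and a non-zero `code` means failure. The helper name `failed_txs` is hypothetical, and the sample payload below is invented for illustration; it is not the actual response for height 46415.

```python
import json


def failed_txs(block_results):
    """Return (index, code, codespace, log) for every tx result with a non-zero code."""
    result = block_results.get("result", block_results)
    failures = []
    # txs_results may be null for blocks without transactions.
    for index, tx in enumerate(result.get("txs_results") or []):
        code = tx.get("code", 0)
        if code != 0:
            failures.append((index, code, tx.get("codespace", ""), tx.get("log", "")))
    return failures


# Invented sample payload: one successful and one failed transaction.
SAMPLE = json.loads("""
{
  "result": {
    "height": "1",
    "txs_results": [
      {"code": 0, "log": "", "codespace": ""},
      {"code": 7, "log": "placeholder error text", "codespace": "example"}
    ]
  }
}
""")

for index, code, codespace, log in failed_txs(SAMPLE):
    print(index, code, codespace, log)
```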

ashcherbakov commented 1 year ago