kaspanet / kaspad

Kaspad was the reference full node Kaspa implementation written in Go (golang), now rewritten in Rust: https://github.com/kaspanet/rusty-kaspa
ISC License

Kaspad crashes with "Fatal error in goroutine 'flow-HandleRelayInvs 150': direct parent ... is missing" message #1888

Open michaelsutton opened 2 years ago

michaelsutton commented 2 years ago

A few users have reported this crash. It usually hits nodes that are running normally and fully synced, and it does not recur once kaspad is restarted after the crash.

Here are example logs:

2021-12-14 10:38:47.407 [CRT] PROT: Exiting: Fatal error in goroutine `flow-HandleRelayInvs 17648`: direct parent 29bc57e2ff8d8486e4e930a375c37dc786fcb04289b2d18bcc894dc67075b0ea is missing: only block with prefilled information can have some missing parents
github.com/kaspanet/kaspad/domain/consensus/processes/blockvalidator.(*blockValidator).setParents
        /home/runner/work/kaspad/kaspad/domain/consensus/processes/blockvalidator/pruning_violation_proof_of_work_and_difficulty.go:83
github.com/kaspanet/kaspad/domain/consensus/processes/blockvalidator.(*blockValidator).ValidatePruningPointViolationAndProofOfWorkAndDifficulty
        /home/runner/work/kaspad/kaspad/domain/consensus/processes/blockvalidator/pruning_violation_proof_of_work_and_difficulty.go:35 

2021-12-13 00:29:35.806 [CRT] PROT: Exiting: Fatal error in goroutine flow-HandleRelayInvs 150: direct parent 8e0690f87b9f3c76947a93d68e868ff4417a6e966fa68844189e9e611768a6f2 is missing: only block with prefilled information can have some missing parents

2021-12-13 12:00:13.899 [CRT] PROT: Exiting: Fatal error in goroutine flow-HandleRelayInvs 162: direct parent e2dd0e20e26a06e0602e0526882d1f365e9d9134149c4a95d72b2aa410968877 is missing: only block with prefilled information can have some missing parents
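
For anyone collecting more reports, here is a minimal sketch of pulling the goroutine number and the missing parent hash out of crash lines like the ones above. The names `fatalRe` and `parseFatal` are made up for illustration and are not part of kaspad. The pattern is inferred only from the log lines quoted in this issue, so it may need adjusting for other versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// fatalRe matches the crash line quoted in this issue. The goroutine name is
// wrapped in backticks in some reports and bare in others, so both forms are
// accepted. (Hypothetical helper, not kaspad code.)
var fatalRe = regexp.MustCompile(
	"Fatal error in goroutine `?([\\w-]+) (\\d+)`?: direct parent ([0-9a-f]+) is missing")

// parseFatal returns the flow name, the goroutine number and the hash of the
// missing direct parent. ok is false when the line is not a matching crash line.
func parseFatal(line string) (flow, id, parent string, ok bool) {
	m := fatalRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	line := "2021-12-13 00:29:35.806 [CRT] PROT: Exiting: Fatal error in goroutine flow-HandleRelayInvs 150: " +
		"direct parent 8e0690f87b9f3c76947a93d68e868ff4417a6e966fa68844189e9e611768a6f2 is missing: " +
		"only block with prefilled information can have some missing parents"
	if flow, id, parent, ok := parseFatal(line); ok {
		fmt.Println(flow, id, parent)
	}
}
```

A log line that was wrapped in the middle of the hash has to be joined back into one line before it will match.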

michaelsutton commented 2 years ago

At a glance, the code suggests an inconsistency in parent state between parentsManager and ghostdagDataStores at level 0.

someone235 commented 2 years ago

Looks like the orphan check somehow failed to notice the missing block, and now this assertion panics.

cbytensky commented 2 years ago

I ran into this error as well. Log:

2021-12-21 22:55:08.434 [INF] PROT: Accepted block cd3f21aa6197c8b5dd0583f9b77d6349e9c187cc7c319ecc1118b9165bd0f9dd via relay
2021-12-21 22:55:09.741 [INF] PROT: Accepted block c425a6e0a1f0ca6113604dc36109aaafac5f5d97d1ea3af119ee9a28b2c093e8 via relay
2021-12-21 22:55:09.769 [INF] PROT: Accepted block 631ab743d3fd8b075ac6e615e4b9d274184d8456828ff10ac84ac1b1306defcd via relay
2021-12-21 22:55:09.890 [INF] PROT: Accepted block 872c0752318de49c6f0a4d1bc5cbc908034e63267c900c8a69812a662aa442be via relay
2021-12-21 22:55:10.990 [INF] PROT: Accepted block 09f6a9e9e51ef98526f0554eb16e3778161751f391433c01d0b8fefea041c531 via relay
2021-12-21 22:55:13.669 [INF] BDAG: Processed 14 blocks and 0 headers in the last 11.72s (14 transactions, 2021-12-21 22:55:12.715 +0200 IST)
2021-12-21 22:55:13.669 [INF] PROT: Accepted block 7fd98f02e377e79224198839d59fc730f17678a192ddc8df526907a3bda4549c via relay
2021-12-21 22:55:14.962 [INF] PROT: Accepted block fb85fec553769bd8e0639003bc6ca32b765b2525693e1a6a71614aba852efe0f via relay
2021-12-21 22:55:16.759 [INF] PROT: Accepted block aeaa2c1c11404b04c432ecea9a173cf5ec3645ee52facda66d44ecec9608c904 via relay
2021-12-21 22:55:17.333 [INF] CMGR: Connecting to 45.204.2.108:16111
2021-12-21 22:55:18.333 [INF] CMGR: Connecting to [2a00:a040:19a:1d2:495c:c976:4672:4c1b]:16111
2021-12-21 22:55:19.334 [INF] CMGR: Connecting to [2a02:908:1477:c960:dabb:c1ff:fe39:f46c]:16111
2021-12-21 22:55:19.462 [INF] PROT: Accepted block 7744dca7884f0bfadb259306c870051d0eab3e781020f68537dd19f5d64ef73c via relay
2021-12-21 22:55:20.335 [INF] CMGR: Connecting to 212.41.9.187:16111
2021-12-21 22:55:20.696 [INF] TXMP: P2P Connected to 212.41.9.187:16111
2021-12-21 22:55:21.242 [INF] PROT: Accepted block 420c09599366121cd65a77dc76f5996cabbf0f21b17bb59fb35ac384358a7761 via relay
2021-12-21 22:55:22.296 [INF] PROT: Accepted block b07d47bf52449dcfa5bf710a237af0992303f5af728983553fdcfa9cb8ca1d96 via relay
2021-12-21 22:55:24.108 [INF] BDAG: Processed 6 blocks and 0 headers in the last 10.44s (6 transactions, 2021-12-21 22:55:22.238 +0200 IST)
2021-12-21 22:55:24.108 [INF] PROT: Accepted block 960cd49bb72ae84c147494a95f9a7b85e89639a8d4b680e731a7640cc22fc6c0 via relay
2021-12-21 22:55:24.157 [INF] PROT: Accepted block cb097c5424cfc41b2fd548e74dec634e8854de166aed50957a68f5531f322774 via relay
2021-12-21 22:55:25.538 [INF] PROT: Accepted block f34dcfb6557a568330059ee0a3570100d013480911a24dee3a6260a4c89728e1 via relay
2021-12-21 22:55:25.924 [INF] PROT: Accepted block c1c9f70f88224adfda456124eeaac1b9e6ed161648dd565bb89b2a7b324ed992 via relay
2021-12-21 22:55:26.933 [INF] PROT: Accepted block 69dfdb6daae406b437fbc0112619dca7b6cb499929a2ff032d59d74fa1a51a3d via relay
2021-12-21 22:55:28.405 [INF] PROT: Accepted block c994b497382c68035bed774e03b03c54709434fdf4c8d95859ffb121422e21ff via relay
2021-12-21 22:55:30.715 [INF] PROT: Accepted block ad80b9c861ae801e8d05797e4c040ed3be07cafdbdcfe22f1e2c2474230e36a8 via relay
2021-12-21 22:55:30.739 [INF] PROT: Accepted block db4d92f5281365b389ed300f48214a061c592cb7b10e98d25c30a7affd54b8e4 via relay
2021-12-21 22:55:33.186 [CRT] PROT: Exiting: Fatal error in goroutine `flow-HandleRelayInvs 467`: direct parent 29bc57e2ff8d8486e4e930a375c37dc786fcb04289b2d18bcc894dc67075b0ea is missing: only block with prefilled information can have some missing parents
github.com/kaspanet/kaspad/domain/consensus/processes/blockvalidator.(*blockValidator).setParents
    /daglabs/go/src/github.com/cbytensky/kaspad/domain/consensus/processes/blockvalidator/pruning_violation_proof_of_work_and_difficulty.go:83
github.com/kaspanet/kaspad/domain/consensus/processes/blockvalidator.(*blockValidator).ValidatePruningPointViolationAndProofOfWorkAndDifficulty
    /daglabs/go/src/github.com/cbytensky/kaspad/domain/consensus/processes/blockvalidator/pruning_violation_proof_of_work_and_difficulty.go:35
github.com/kaspanet/kaspad/domain/consensus/processes/blockprocessor.(*blockProcessor).validateBlock
    /daglabs/go/src/github.com/cbytensky/kaspad/domain/consensus/processes/blockprocessor/validate_block.go:48
github.com/kaspanet/kaspad/domain/consensus/processes/blockprocessor.(*blockProcessor).validateAndInsertBlock
    /daglabs/go/src/github.com/cbytensky/kaspad/domain/consensus/processes/blockprocessor/validate_and_insert_block.go:83
github.com/kaspanet/kaspad/domain/consensus/processes/blockprocessor.(*blockProcessor).ValidateAndInsertBlock
    /daglabs/go/src/github.com/cbytensky/kaspad/domain/consensus/processes/blockprocessor/blockprocessor.go:148
github.com/kaspanet/kaspad/domain/consensus.(*consensus).ValidateAndInsertBlock
    /daglabs/go/src/github.com/cbytensky/kaspad/domain/consensus/consensus.go:171
github.com/kaspanet/kaspad/app/protocol/flows/blockrelay.(*handleRelayInvsFlow).processBlock
    /daglabs/go/src/github.com/cbytensky/kaspad/app/protocol/flows/blockrelay/handle_relay_invs.go:249
github.com/kaspanet/kaspad/app/protocol/flows/blockrelay.(*handleRelayInvsFlow).start