vegaprotocol / vega

A Go implementation of the Vega Protocol, a protocol for creating and trading derivatives on a fully decentralised network.
https://vega.xyz
GNU Affero General Public License v3.0

[Bug] Datanode fails partway through the full system-test run #9011

Closed VanithaVega closed 11 months ago

VanithaVega commented 1 year ago

Build - https://jenkins.ops.vega.xyz/job/common/job/system-tests-wrapper/90083/

While running the full system-test suite, tests started to fail partway through because the datanode dropped and a panic was thrown.

AssertionError: Did not reach epoch CHECK FOR CONSENSUS PANIC!!!!!!!!!!!!!!!!!!!!! Expected epoch: 247.

All the logs show the block below:

modules/govMod/governMod.py:592: AssertionError

Panic Logs - (https://jenkins.ops.vega.xyz/job/common/job/system-tests-wrapper/90083/artifact/testnet/logs/testnet-nodeset-validators-3-validator/vega-validator-3.stderr-2023-08-08T00%3A14%3A48Z.log)

panic: leveldb: closed

goroutine 1294 [running]:
github.com/tendermint/tendermint/store.(*BlockStore).LoadSeenCommit(0xc001720b00, 0x19d4af5?)
    /jenkins/GOPATH/pkg/mod/github.com/vegaprotocol/cometbft@v0.34.28-0.20230322133204-3d8588de736e/store/store.go:230 +0x18a
github.com/tendermint/tendermint/consensus.(*State).LoadCommit(0xc0019a7180, 0x2c9)
    /jenkins/GOPATH/pkg/mod/github.com/vegaprotocol/cometbft@v0.34.28-0.20230322133204-3d8588de736e/consensus/state.go:291 +0xc5
github.com/tendermint/tendermint/consensus.(*Reactor).queryMaj23Routine(0xc0019d34d0, {0x58386f0, 0xc00412e000}, 0xc00412e0d0)
    /jenkins/GOPATH/pkg/mod/github.com/vegaprotocol/cometbft@v0.34.28-0.20230322133204-3d8588de736e/consensus/reactor.go:930 +0x8ff
created by github.com/tendermint/tendermint/consensus.(*Reactor).AddPeer
    /jenkins/GOPATH/pkg/mod/github.com/vegaprotocol/cometbft@v0.34.28-0.20230322133204-3d8588de736e/consensus/reactor.go:201 +0x205

guoguojin commented 1 year ago

This doesn't look like a datanode (DN) failure: the datanode was not receiving any blocks from core, so it shut down. The problem appears to be that something failed in core that stopped it from producing blocks. The panic mentioned above comes from a core node log.
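For illustration, the pattern the stack trace suggests (a background routine reading from a store that has already been closed, with the read error turned into a panic) can be sketched as below. This is a minimal stand-in, not CometBFT or Vega code: `blockStore`, `errClosed`, `loadSeenCommit` and `readAfterClose` are hypothetical names, and a mutex-guarded map replaces LevelDB.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical stand-in for the "leveldb: closed" error.
var errClosed = errors.New("store: closed")

// blockStore is a hypothetical in-memory replacement for a LevelDB-backed store.
type blockStore struct {
	mu     sync.RWMutex
	closed bool
	data   map[int64]string
}

func newBlockStore() *blockStore {
	return &blockStore{data: map[int64]string{}}
}

func (s *blockStore) put(height int64, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[height] = v
}

// get returns errClosed once the store has been closed.
func (s *blockStore) get(height int64) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.closed {
		return "", errClosed
	}
	return s.data[height], nil
}

func (s *blockStore) close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.closed = true
}

// loadSeenCommit panics on any read error (an assumption made for this
// sketch, chosen to match the shape of the trace above).
func (s *blockStore) loadSeenCommit(height int64) string {
	v, err := s.get(height)
	if err != nil {
		panic(err)
	}
	return v
}

// readAfterClose closes the store, then reads from it in a goroutine and
// returns the recovered panic message ("" if no panic occurred).
func readAfterClose() string {
	s := newBlockStore()
	s.put(713, "commit") // arbitrary height and value
	s.close()

	result := make(chan string, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				result <- fmt.Sprint(r)
				return
			}
			result <- ""
		}()
		s.loadSeenCommit(713)
	}()
	return <-result
}

func main() {
	fmt.Println("recovered panic:", readAfterClose())
}
```

If this is what happened here, the panic would be a symptom of the node shutting down while peer routines were still running, rather than the original cause of core stopping block production.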

Sohill-Patel commented 1 year ago

A recent example of this issue:

https://jenkins.ops.vega.xyz/job/common/job/system-tests-nightly/1051/testReport/junit/tests.datanode/datanode_test/Call_tests___full_a_f___test_datanode_start_stop_restart_from_network_history/