vegaprotocol / data-node

A rich API server for Vega Protocol
https://vega.xyz
Other
3 stars 1 forks source link

Panic in system tests run #682

Closed MuthuVega closed 2 years ago

MuthuVega commented 2 years ago

Problem encountered

Found panic in a couple of instances in the system tests smoke run

Observed behaviour

A small number of system tests smoke runs have failed in the past couple of days and upon investigation found a panic in the network logs.

Example run:

https://jenkins.ops.vega.xyz/job/common/job/system-tests/2614/

Expected behaviour

System test smoke runs should pass.

System response

Found panic in the logs:

Job: testnet-nodeset-full-2, Task: tendermint-full-2: I[2022-06-14|10:05:56.285] executed block                               module=consensus height=274 num_valid_txs=0 num_invalid_txs=0
Job: testnet-nodeset-full-2, Task: tendermint-full-2: I[2022-06-14|10:05:56.286] 
Job: testnet-nodeset-full-2, Task: tendermint-full-2: panic: Failed to process committed block (937:54EF0CAC16F652B46EE1EFA95C3BDA92ACF2EB872FAFC2BC998487408D941905): commit failed for application: read message: EOF
Job: testnet-nodeset-full-2, Task: tendermint-full-2: 
Job: testnet-nodeset-full-2, Task: tendermint-full-2: goroutine 9432345 [running]:
Job: testnet-nodeset-full-2, Task: tendermint-full-2: github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc001068700, 0x0)
Job: testnet-nodeset-full-2, Task: tendermint-full-2:   /jenkins/GOPATH/pkg/mod/github.com/vegaprotocol/tendermint@v0.34.20-0.20220413172356-3b2247787580/blockchain/v0/reactor.go:401 +0x123a
Job: testnet-nodeset-full-2, Task: tendermint-full-2: created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).OnStart
Job: testnet-nodeset-full-2, Task: tendermint-full-2:   /jenkins/GOPATH/pkg/mod/github.com/vegaprotocol/tendermint@v0.34.20-0.20220413172356-3b2247787580/blockchain/v0/reactor.go:110 +0x7a

Also found these errors in the network logs.

Job: testnet-nodeset-full-2, Task: data-node-2: 2022-06-14T10:05:57.737Z    ERROR   broker/sqlstore_broker.go:258   failed to process event id:"266-7"  block:"9F10BBE1EF9FF775F010F0738F504BFDFE9ABF981EB07F816EBDB26AB0B0130C"  type:BUS_EVENT_TYPE_CHECKPOINT  checkpoint:{hash:"71aa7c7bddc05ad7870ee0da235c20c0c292af028ee54fd9975b24c6f7e75299"  block_hash:"9F10BBE1EF9FF775F010F0738F504BFDFE9ABF981EB07F816EBDB26AB0B0130C"  block_height:266}  version:1  chain_id:"test-chain-h1dUGe" error:ERROR: duplicate key value violates unique constraint "checkpoints_pkey" (SQLSTATE 23505)
Job: testnet-nodeset-full-2, Task: data-node-2: error adding checkpoint
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/sqlsubscribers.(*Checkpoint).consume
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/sqlsubscribers/checkpoint.go:56
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/sqlsubscribers.(*Checkpoint).Push
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/sqlsubscribers/checkpoint.go:44
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).push
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/broker/sqlstore_broker.go:253
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).handleEvent
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/broker/sqlstore_broker.go:241
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).processBlock
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/broker/sqlstore_broker.go:200
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).Receive
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/broker/sqlstore_broker.go:93
Job: testnet-nodeset-full-2, Task: data-node-2: code.vegaprotocol.io/data-node/cmd/data-node/node.(*NodeCommand).runNode.func8
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/workspace/common/system-tests/data-node/cmd/data-node/node/node.go:356
Job: testnet-nodeset-full-2, Task: data-node-2: golang.org/x/sync/errgroup.(*Group).Go.func1
Job: testnet-nodeset-full-2, Task: data-node-2:     /jenkins/GOPATH/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
Job: testnet-nodeset-full-2, Task: data-node-2: runtime.goexit
Job: testnet-nodeset-full-2, Task: data-node-2:     /usr/local/go/src/runtime/asm_amd64.s:1571
Job: testnet-nodeset-full-2, Task: data-node-2: 2022-06-14T10:05:57.737Z    INFO    broker/socket_server.go:172 Closing socket server

Steps to reproduce

Unable to reproduce reliably , but below are some of the runs.

https://jenkins.ops.vega.xyz/job/common/job/system-tests/2614/ https://jenkins.ops.vega.xyz/job/common/job/system-tests/2561/

Evidence

Logs

Network logs can be downloaded from the below link:

https://jenkins.ops.vega.xyz/job/common/job/system-tests/2614/artifact/testnet/

MuthuVega commented 2 years ago

Another latest run fail with panic https://jenkins.ops.vega.xyz/job/common/job/system-tests/2618/

Network Logs: https://jenkins.ops.vega.xyz/job/common/job/system-tests/2618/artifact/testnet/