Panic on data-node n01.stagnet2.vega.xyz on Stagnet 2 network.
Observed behaviour
The Stagnet 2 network has been up and running for around 7 days: 5 validators (vega + tendermint) and 5 non-validators (vega + tendermint + data-node).
The panic occurred on only one data-node. All other services (vega, tendermint, data-node) on all servers are up and running, with no panic.
This node has been queried by https://stats.vega.trading/, and probably by some frontend services.
Expected behaviour
No panic
System response
The data-node process went down (an unrecovered panic in one gRPC handler crashes the whole process).
Steps to reproduce
Manual
Steps to reproduce the behaviour manually:
No manual reproduction steps are known; the panic was triggered by normal API traffic (a GetTradesByMarket gRPC call, per the stack trace below).
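While no reproduction steps were captured on the network, the nil-receiver pattern visible in the stack trace can be reproduced in isolation. The sketch below uses hypothetical type and field names (it is not the real data-node code) to show how calling a method that reads a field through a nil pointer produces exactly this class of SIGSEGV panic:

```go
package main

import "fmt"

// Trades is a minimal stand-in for the sqlstore.Trades receiver in the
// trace; the type and field names here are hypothetical.
type Trades struct {
	pool string // pretend connection handle
}

func (t *Trades) query() string {
	// Reading a field through a nil receiver dereferences address 0x0,
	// matching queryTradesWithCursorPagination(0x0, ...) in the trace.
	return t.pool
}

// safeQuery calls query and reports whether the call panicked.
func safeQuery(t *Trades) (result string, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return t.query(), false
}

func main() {
	var t *Trades // nil, mirroring the 0x0 receiver in the stack trace
	_, panicked := safeQuery(t)
	fmt.Println("nil receiver panicked:", panicked) // prints "nil receiver panicked: true"
}
```

Note that the method call itself succeeds on a nil pointer in Go; the panic only happens when the method body dereferences the receiver, which is why the crash surfaces deep inside queryTradesWithCursorPagination rather than at the call site.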
Automation
-
Evidence
Logs
2022-05-26T15:45:44.582Z INFO cfgwatcher config/watcher.go:78 config watcher started successfully {"config": "/home/vega/.config/vega/data-node/config.toml"}
2022-05-26T15:45:44.582Z INFO node/node_pre.go:64 vega is starting with pprof profile, this is not a recommended setting for production
2022-05-26T15:45:44.582Z INFO node/node_pre.go:74 Starting Vega {"version": "v0.51.1", "version-hash": "8db12615"}
2022-05-26T15:45:44.582Z DEBUG node/node_pre.go:86 Set ulimits {"nofile": 8192}
2022-05-26T15:45:44.646Z INFO node/node.go:341 Vega data node startup complete
2022-05-26T15:45:44.646Z INFO api.grpc api/server.go:338 Starting gRPC based API {"v1 API using sql stores": false, "addr": "0.0.0.0", "port": "3007"}
2022-05-26T15:45:44.646Z INFO broker/socket_server.go:69 Starting broker socket server {"addr": "0.0.0.0", "port": 3005}
2022-05-26T15:45:44.646Z INFO gateway.gql graphql/server.go:115 Starting GraphQL based API {"addr": "0.0.0.0", "port": 3008}
2022-05-26T15:45:44.646Z WARN gateway.gql graphql/server.go:140 graphql playground enabled, this is not a recommended setting for production
2022-05-26T15:45:44.646Z WARN gateway.gql graphql/server.go:188 GraphQL server is not configured to use HTTPS, which is required for subscriptions to work. Please see README.md for help configuring
2022-05-26T15:45:44.646Z INFO gateway.restproxy rest/server.go:65 Starting REST<>GRPC based API {"addr": "0.0.0.0", "port": 3009}
2022-05-26T15:45:50.521Z INFO broker/socket_server.go:72 New broker connection event {"eventType": "Attaching", "id": 450014804, "address": "tcp://[::]:3005"}
2022-05-26T15:45:50.521Z INFO broker/socket_server.go:72 New broker connection event {"eventType": "Attached", "id": 450014804, "address": "tcp://[::]:3005"}
2022-05-26T15:45:58.604Z ERROR storage storage/nodes.go:147 Received node ranking for non existing node {"node_id": "d869d7de6fbaf01a46586ccaa412445129de481d076a0f3f15d94c6b95e3d8e4"}
2022-05-26T15:45:58.604Z INFO subscribers/nodes.go:156 ranking event received before node was added -- try again later {"nodeID": "d869d7de6fbaf01a46586ccaa412445129de481d076a0f3f15d94c6b95e3d8e4"}
2022-05-26T15:45:58.604Z ERROR storage storage/nodes.go:147 Received node ranking for non existing node {"node_id": "76528643a8442cd7dcf738a1ae56b47e7d50ddda921bd6a0d1dd2a97a64b389b"}
2022-05-26T15:45:58.604Z INFO subscribers/nodes.go:156 ranking event received before node was added -- try again later {"nodeID": "76528643a8442cd7dcf738a1ae56b47e7d50ddda921bd6a0d1dd2a97a64b389b"}
2022-05-26T15:45:58.604Z ERROR storage storage/nodes.go:147 Received node ranking for non existing node {"node_id": "60980b3c2c584ae45b1b244f2f7d262a1d814cf4ac308c4bb07e6df93da4bb9c"}
2022-05-26T15:45:58.604Z INFO subscribers/nodes.go:156 ranking event received before node was added -- try again later {"nodeID": "60980b3c2c584ae45b1b244f2f7d262a1d814cf4ac308c4bb07e6df93da4bb9c"}
2022-05-26T15:45:58.604Z ERROR storage storage/nodes.go:147 Received node ranking for non existing node {"node_id": "87bd077d8ec957c69dfaa09a61b6b2fd429416ef596f075b1b71e4596bf0d4b3"}
2022-05-26T15:45:58.604Z INFO subscribers/nodes.go:156 ranking event received before node was added -- try again later {"nodeID": "87bd077d8ec957c69dfaa09a61b6b2fd429416ef596f075b1b71e4596bf0d4b3"}
2022-05-26T15:45:58.604Z ERROR storage storage/nodes.go:147 Received node ranking for non existing node {"node_id": "0476af0effc7fc892b19d869683212c710750495bc061fdc7bd2d36f9e5f5748"}
2022-05-26T15:45:58.604Z INFO subscribers/nodes.go:156 ranking event received before node was added -- try again later {"nodeID": "0476af0effc7fc892b19d869683212c710750495bc061fdc7bd2d36f9e5f5748"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xfd8b2b]
goroutine 96641113 [running]:
code.vegaprotocol.io/data-node/sqlstore.(*Trades).queryTradesWithCursorPagination(0x0, {0x1b9ad50, 0xc0649f4f90}, {0x183d732, 0x27}, {0xc1518dc5f0, 0x1, 0x1}, {0x0, 0x0})
/jenkins/workspace/vegaprotocol_data-node_v0.51.1/sqlstore/trades.go:196 +0x26b
code.vegaprotocol.io/data-node/sqlstore.(*Trades).GetByMarketWithCursor(0x203019?, {0x1b9ad50, 0xc0649f4f90}, {0xc162222b40, 0x40}, {0x0?, 0x0?})
/jenkins/workspace/vegaprotocol_data-node_v0.51.1/sqlstore/trades.go:102 +0x12e
code.vegaprotocol.io/data-node/api.(*tradingDataServiceV2).GetTradesByMarket(0xc000680230, {0x1b9ad50, 0xc0649f4f90}, 0x151c8c0?)
/jenkins/workspace/vegaprotocol_data-node_v0.51.1/api/trading_data_v2.go:505 +0xbd
code.vegaprotocol.io/protos/data-node/api/v2._TradingDataService_GetTradesByMarket_Handler.func1({0x1b9ad50, 0xc0649f4f90}, {0x16c09c0?, 0xc0c58dea00})
/jenkins/GOPATH/pkg/mod/code.vegaprotocol.io/protos@v0.51.1/data-node/api/v2/trading_data_grpc.pb.go:722 +0x7b
code.vegaprotocol.io/data-node/api.remoteAddrInterceptor.func1({0x1b9ad50, 0xc0649f4f30}, {0x16c09c0, 0xc0c58dea00}, 0xc049a56060, 0xc0275e2948)
/jenkins/workspace/vegaprotocol_data-node_v0.51.1/api/server.go:323 +0x637
code.vegaprotocol.io/protos/data-node/api/v2._TradingDataService_GetTradesByMarket_Handler({0x17c3e60?, 0xc000680230}, {0x1b9ad50, 0xc0649f4f30}, 0xc251dc7380, 0xc00006ef30)
/jenkins/GOPATH/pkg/mod/code.vegaprotocol.io/protos@v0.51.1/data-node/api/v2/trading_data_grpc.pb.go:724 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001b0540, {0x1ba2470, 0xc000336d00}, 0xc280fbb440, 0xc000597b00, 0x27dd840, 0x0)
/jenkins/GOPATH/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1282 +0xccf
google.golang.org/grpc.(*Server).handleStream(0xc0001b0540, {0x1ba2470, 0xc000336d00}, 0xc280fbb440, 0x0)
/jenkins/GOPATH/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:1619 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/jenkins/GOPATH/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:921 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/jenkins/GOPATH/pkg/mod/google.golang.org/grpc@v1.45.0/server.go:919 +0x28a
Additional context
Version: v0.51.1 (hash 8db12615). Component affected: data-node (sqlstore trades store / TradingDataService v2 API). The startup log reports {"v1 API using sql stores": false}, yet the v2 GetTradesByMarket handler still reached the SQL-backed trades store, whose receiver was nil (0x0) in the trace.
Definition of Done
ℹ️ Not every issue will need every item checked; however, every item on this list should be properly considered and actioned to meet the DoD.
Before Merging
[ ] Code refactored to meet SOLID and other code design principles
[ ] Code is compilation error, warning, and hint free
[ ] Carry out a basic happy path end-to-end check of the new code
[ ] All APIs are documented so auto-generated documentation is created
[ ] All bug recreation steps can be followed without presenting the original error/bug
[ ] All Unit, Integration and BVT tests are passing