vegaprotocol / data-node

A rich API server for Vega Protocol
https://vega.xyz
Other
3 stars 1 forks source link

Data node reports lots of "inserting node to SQL store failed" errors #783

Closed jgsbennett closed 2 years ago

jgsbennett commented 2 years ago

Problem encountered

If you check the data node logs for the mainnet checkpoint pipeline, after the checkpoint is restored, you can see a bunch of errors like this:

[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375801193Z 2022-07-08T10:42:02.375Z  ERROR   broker/sqlstore_broker.go:290   failed to process event id:"0-235" block:"1A727D7263691086370929E2A58B93FA3EC0888D4A8116DF722ADACCC051B922" type:BUS_EVENT_TYPE_VALIDATOR_UPDATE validator_update:{node_id:"91fd6bd37f12a89a8ced1bf3681f3f9e6d05930c964458042828a6d8d610a1cf" vega_pub_key:"a6e6f7daf8610f9242ab6ab46b394f6fb79cf9533d48051ca7a2f142b8b700a8" ethereum_address:"0x14174f3c9443EdC12685578FE4d165be5f57fBd3" tm_pub_key:"0ShTSZ9Ss8AFHuDz1nIGMrGucjXhFdJyUTT7Eqibjq8=" info_url:"https://www.greenfield.one" country:"Germany" name:"Greenfield One" avatar_url:"https://www.greenfield.one/avatar.png" vega_pub_key_index:1 added:true} version:1 chain_id:"vega-mainnet-0004" error:ERROR: duplicate key value violates unique constraint "nodes_announced_pkey" (SQLSTATE 23505)
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375858853Z inserting node to SQL store failed
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375864198Z code.vegaprotocol.io/data-node/sqlsubscribers.(*Node).consumeUpdate
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375868182Z   /jenkins/workspace/data-node_develop/sqlsubscribers/node.go:90
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375871874Z code.vegaprotocol.io/data-node/sqlsubscribers.(*Node).Push
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375875591Z   /jenkins/workspace/data-node_develop/sqlsubscribers/node.go:69
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375879145Z code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).push
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375882725Z   /jenkins/workspace/data-node_develop/broker/sqlstore_broker.go:285
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375886300Z code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).handleEvent
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375889926Z   /jenkins/workspace/data-node_develop/broker/sqlstore_broker.go:273
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375893517Z code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).processBlock
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375897245Z   /jenkins/workspace/data-node_develop/broker/sqlstore_broker.go:230
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375900826Z code.vegaprotocol.io/data-node/broker.(*sqlStoreBroker).Receive
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375904348Z   /jenkins/workspace/data-node_develop/broker/sqlstore_broker.go:105
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375907859Z code.vegaprotocol.io/data-node/cmd/data-node/node.(*NodeCommand).runNode.func5
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375911505Z   /jenkins/workspace/data-node_develop/cmd/data-node/node/node.go:224
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375915088Z golang.org/x/sync/errgroup.(*Group).Go.func1
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375918757Z   /jenkins/GOPATH/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375922478Z runtime.goexit
[2022-07-08T10:42:49.386Z] 2022-07-08T10:42:02.375926345Z   /usr/local/go/src/runtime/asm_amd64.s:1571

Observed behaviour

Errors relating to inserting nodes into the database

Expected behaviour

No errors. All data from checkpoint is present and correct in data node.

Steps to reproduce

Rerun: https://jenkins.ops.vega.xyz/blue/organizations/jenkins/private%2FMainnet-Checkpoint-Test-Run/detail/Mainnet-Checkpoint-Test-Run/190/pipeline/439/

Additional context

Note also https://github.com/vegaprotocol/data-node/issues/782 which is visible in same pipeline.

Definition of Done

ℹ️ Not every issue will need every item checked, however, every item on this list should be properly considered and actioned to meet the DoD.

Before Merging

After Merging

wwestgarth commented 2 years ago

I think this will fix it, I spotted this the other day too in the lnl-checkpoint tests: https://github.com/vegaprotocol/data-node/pull/766 (although I'm on the fence about whether core should just not send out two updates, and hold off sending the genesis ones until we know what the checkpoint is saying, but maybe both fixes make sense)

pscott31 commented 2 years ago

ah super I'll take a look at that PR now will

gordsport commented 2 years ago

Hey @pscott31 Did the PR linked above fix this? If so can we close - if not I will prioritise.

pscott31 commented 2 years ago

Yep yep, closing