vegaprotocol / vega

A Go implementation of the Vega Protocol, a protocol for creating and trading derivatives on a fully decentralised network.
https://vega.xyz
GNU Affero General Public License v3.0

Asset deposits give Tendermint panic after core buffer full on core #4835

Closed vega-paul closed 2 years ago

vega-paul commented 2 years ago

When submitting multiple asset deposit transactions to core via the wallet, we eventually see a "buffer is full" message in the core logs:

2022-02-21T14:36:16.452Z ERROR commander nodewallets/commander.go:95 could not send transaction to tendermint {"error": "RPC error -32603 - Internal error: can't queue req: buffer is full", "tx": "nonce:7891233871210223698 builtin:{deposit:{vega_asset_id:\"bf22ce3b96af0913b4610c1f8c06f5d244cb5c725d8ca040684863bb16b68104\" party_id:\"36ea75a9310f1490b28875ba2dc9515a9adbae1a18fb06b30c656c2de9c638ba\" amount:\"None\"}}"}

If we persist with the deposits, we then see:

2022-02-21T14:36:29.291Z ERROR commander nodewallets/commander.go:95 could not send transaction to tendermint {"error": "Post \"http://st-local-tendermint-node0:26657\": context deadline exceeded", "tx": "nonce:1988528496525177310 builtin:{deposit:{vega_asset_id:\"bf22ce3b96af0913b4610c1f8c06f5d244cb5c725d8ca040684863bb16b68104\" party_id:\"96a29284fae4d41c74ffc6af61091e9072525cbc1a6a6c85800b14dfc5ccceb0\" amount:\"None\"}}"}

Eventually the core loses its connection to Tendermint:

2022-02-21T14:36:49.939Z ERROR forwarder evtforward/forwarder.go:280 could not send command {"tx-id": "", "error": "Post \"http://st-local-tendermint-node0:26657\": context deadline exceeded"}
2022-02-21T14:36:54.805Z INFO monitoring monitoring/status.go:222 Chain is still disconnected, shutting down now {"retries-count": 5}

The Tendermint logs show a wrong Block.Header.NextValidatorsHash error, followed by a nil pointer dereference panic during shutdown:

2022-02-21T14:36:48Z INFO received complete proposal block hash=7C705976C69D62B66DE21BDD95A59A11212A092A2BABDC87B96C3CB522BE9130 height=12141 module=consensus
2022-02-21T14:36:48Z ERROR prevote step: ProposalBlock is invalid err="wrong Block.Header.NextValidatorsHash. Expected C2BBC7DBACBA7632FCBBA3EE7776EC2FBF0B8E1DB071DBF7BC1B0A7CFDF11BBE, got 1022F84CF04F77ECCD4F0C653E745E354EFE0F31E95DB5C70CD4ED6BF6BBBB83" height=12141 module=consensus round=1
2022-02-21T14:36:50Z INFO Timed out dur=1500 height=12141 module=consensus round=1 step=5
2022-02-21T14:36:51Z INFO Timed out dur=1500 height=12141 module=consensus round=1 step=7
2022-02-21T14:36:51Z INFO received proposal module=consensus proposal={"Type":32,"block_id":{"hash":"23F2846544A63A61864F4686D4C9EADA25C555ABD1BD2D40C679D32A83855635","parts":{"hash":"3F3BDDCEACEAAF796FC6BE201ED788D6A37FA31654B663B5F66B73309267E313","total":1}},"height":12141,"pol_round":-1,"round":2,"signature":"qOlmJGt+AV71MSMoKJvRvgnMXWDn1yElLdqrnvceIITuzAc6eLLw1ms/ySM8KZnumdmCOi83+DwNT/vRj/lwCQ==","timestamp":"2022-02-21T14:36:51.869238697Z"}
2022-02-21T14:36:51Z INFO received complete proposal block hash=23F2846544A63A61864F4686D4C9EADA25C555ABD1BD2D40C679D32A83855635 height=12141 module=consensus
2022-02-21T14:36:54Z INFO Timed out dur=2000 height=12141 module=consensus round=2 step=5
2022-02-21T14:36:54Z INFO Stopping abci.socketClient connection=query module=abci-client reason="read message: EOF"
2022-02-21T14:36:54Z INFO stopping service connection=query impl=socketClient module=abci-client service=socketClient
2022-02-21T14:36:54Z ERROR query connection terminated. Did the application crash? Please restart tendermint err="read message: EOF" module=proxy
2022-02-21T14:36:54Z INFO Stopping abci.socketClient connection=snapshot module=abci-client reason="read message: EOF"
2022-02-21T14:36:54Z INFO stopping service connection=snapshot impl=socketClient module=abci-client service=socketClient
2022-02-21T14:36:54Z INFO captured terminated, exiting... module=main
2022-02-21T14:36:54Z INFO Stopping abci.socketClient connection=consensus module=abci-client reason="read message: EOF"
2022-02-21T14:36:54Z INFO stopping service connection=consensus impl=socketClient module=abci-client service=socketClient
2022-02-21T14:36:54Z INFO Stopping abci.socketClient connection=mempool module=abci-client reason="read message: EOF"
2022-02-21T14:36:54Z INFO stopping service connection=mempool impl=socketClient module=abci-client service=socketClient
2022-02-21T14:36:54Z INFO stopping service impl=Node module=main service=Node
2022-02-21T14:36:54Z INFO Stopping Node module=main
2022-02-21T14:36:54Z INFO stopping service impl=EventBus module=events service=EventBus
2022-02-21T14:36:54Z INFO stopping service impl=PubSub module=pubsub service=PubSub
2022-02-21T14:36:54Z INFO stopping service impl=IndexerService module=txindex service=IndexerService
2022-02-21T14:36:54Z INFO stopping service impl=BlockSync module=blockchain service=BlockSync
2022-02-21T14:36:54Z INFO stopping service impl=ConsensusReactor module=consensus service=Consensus
2022-02-21T14:36:54Z INFO stopping service impl=ConsensusState module=consensus service=State
2022-02-21T14:36:54Z INFO stopping service impl=TimeoutTicker module=consensus service=TimeoutTicker
2022-02-21T14:36:54Z INFO stopping service impl=baseWAL module=consensus service=baseWAL wal=/tendermint/data/cs.wal/wal
2022-02-21T14:36:54Z INFO stopping service impl=Group module=consensus service=Group wal=/tendermint/data/cs.wal/wal
2022-02-21T14:36:55Z INFO stopping service impl=StateSync module=statesync service=StateSync
2022-02-21T14:36:55Z INFO stopping service impl=Mempool module=mempool service=Mempool version=v1
2022-02-21T14:36:55Z INFO stopping service impl=Evidence module=evidence service=Evidence
panic: runtime error: invalid memory address or nil pointer dereference

vega-paul commented 2 years ago

tm.log node.log

gordsport commented 2 years ago

@vega-paul - can you please retest now that we have rolled back to Tendermint 0.34, and close the issue if it is no longer present?

jeremyletang commented 2 years ago

This is not a core issue: the Tendermint node could not accept any more transactions for some reason. The vega node log shows that it was unable to send transactions or connect to Tendermint, so it shut itself down.

Not a core issue.