iotaledger / inx-tendercoo

INX-Tendercoo enables a committee of validators to operate as a distributed Coordinator using Tendermint Core BFT consensus.
Apache License 2.0

Crash inx-tendercoo (rpc error/timeout) #42

Closed. shufps closed this issue 1 year ago.

shufps commented 1 year ago

At 18:50:51, inx-tendercoo crashed on coo-3.h.alphanet.

It seems that this happened because of an RPC error/timeout.

Suspicious log entries start to appear at 2022-11-30T17:50:11 (server time is CET-1) in the log files:

coo.log

inx-tendercoo.log

The inx-tendercoo could be restarted without problems; it synced the Tendermint blockchain and continued normal operation.

Grafana showed a HORNET Prometheus "scraping delay" of 10s during that timespan, which normally indicates some locking problem inside the node (I think gossip is then affected and the node doesn't make progress syncing).

[screenshot]

shufps commented 1 year ago

coo-0.h.log

shufps commented 1 year ago

Update:

4 of 10 tendercoos crashed after about 17:05 (CET), not all at the same time.

[screenshot]

Here are the logs:

https://files.alphanet.iotaledger.net/logs.2022-12-01.tgz

shufps commented 1 year ago

Update: new error messages on coo-0.s2.aws:

CONSENSUS FAILURE!!!    {"module": "consensus", "err": "+2/3 committed an invalid block: wrong Block.Header.AppHash.  Expected E3CEBED6BE7B4AD1A2D7FE95DA3112D4563C0BAB3D0C448D617F33C80D8A5CB4, got 3E2FEB4203D675F5115053780CCD544571BADE692915C9624D4D7CAEDFFEF033"

inx-tendercoo.coo-0.s2.aws.log

With this error message, inx-tendercoo doesn't shut down and never recovers.

shufps commented 1 year ago

[screenshot]

shufps commented 1 year ago

After restarting, crashing, restarting, crashing, ... everything seems to recover again.

Tendermint blockchain seems to be fine :man_shrugging:

shufps commented 1 year ago

Update:

[screenshot]

It could be that the problems were caused by pruning (a lot of it :innocent:). The bottom graph shows the times of the consensus failure occurrences; the top graph shows pruning progress.

shufps commented 1 year ago

The only one that didn't recover was the tendercoo with the invalid block error message (coo-0.s2.aws).

When trying to start it:

inx-tendercoo    | 
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    INX     Connecting to node and reading node configuration ...
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    INX     Reading node status ...
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Providing Coordinator ...
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Providing Coordinator ... done
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Providing Tendermint ...
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Found private validator {"keyFile": "tendermint/config/priv_validator_key.json", "stateFile": "tendermint/data/priv_validator_state.json"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Found node key  {"path": "tendermint/config/node_key.json"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Coordinator resumed     {"state": {"MilestoneHeight":124611,"MilestoneIndex":6978045,"LastMilestoneID":"b2840e2e5ed7a0c7d909a1bbd86b3d27a8daae00672692675a79d4c2904c058b","LastMilestoneBlockID":"c0f0f47d30b03188b2d402e58a2f1b9443e1b1845dce639e9ad38d46f2759f59"}}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "proxy", "msg": "Starting multiAppConn service", "impl": "multiAppConn"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "proxy", "module": "abci-client", "connection": "query", "msg": "Starting localClient service", "impl": "localClient"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "proxy", "module": "abci-client", "connection": "snapshot", "msg": "Starting localClient service", "impl": "localClient"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "proxy", "module": "abci-client", "connection": "mempool", "msg": "Starting localClient service", "impl": "localClient"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "proxy", "module": "abci-client", "connection": "consensus", "msg": "Starting localClient service", "impl": "localClient"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "events", "msg": "Starting EventBus service", "impl": "EventBus"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "events", "module": "pubsub", "msg": "Starting PubSub service", "impl": "PubSub"}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      service start   {"module": "txindex", "msg": "Starting IndexerService service", "impl": "IndexerService"}
inx-tendercoo    | 2022-12-01T19:14:07Z DEBUG   Coordinator     ABCI Info       {"req": {"version":"0.34.23","block_version":11,"p2p_version":8}}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      ABCI Handshake App Info {"module": "consensus", "height": 124611, "hash": "EDSI4r/vyDpw6Pj3Lh5znvbQHJ6VB04dgQLi2JpnauI=", "software-version": "", "protocol-version": 1}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Tendermint      ABCI Replay Blocks      {"module": "consensus", "appHeight": 124611, "storeHeight": 124610, "stateHeight": 124610}
inx-tendercoo    | 2022-12-01T19:14:07Z INFO    Coordinator     Providing Tendermint ... done
inx-tendercoo    | 2022-12-01T19:14:07Z PANIC   App     invoke core component (Coordinator) failed: could not build arguments for function "github.com/iotaledger/inx-tendercoo/core/decoo".init.0.func1 (/scratch/core/decoo/component.go:74): failed to build *node.Node: received non-nil error from function "github.com/iotaledger/inx-tendercoo/core/decoo".provide.func2 (/scratch/core/decoo/component.go:157): failed to provide Tendermint: error during handshake: error on replay: app block height (124611) is higher than core (124610)
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).LogPanicf
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:856
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).invoke.func1
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:496
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.forEachCoreComponent
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:744
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).ForEachCoreComponent
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:752
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).invoke
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:493
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).initializeApp
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:632
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).Run
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:646
inx-tendercoo    | main.main
inx-tendercoo    |      /scratch/main.go:8
inx-tendercoo    | runtime.main
inx-tendercoo    |      /usr/local/go/src/runtime/proc.go:250
inx-tendercoo    | panic: invoke core component (Coordinator) failed: could not build arguments for function "github.com/iotaledger/inx-tendercoo/core/decoo".init.0.func1 (/scratch/core/decoo/component.go:74): failed to build *node.Node: received non-nil error from function "github.com/iotaledger/inx-tendercoo/core/decoo".provide.func2 (/scratch/core/decoo/component.go:157): failed to provide Tendermint: error during handshake: error on replay: app block height (124611) is higher than core (124610)
inx-tendercoo    | 
inx-tendercoo    | goroutine 1 [running]:
inx-tendercoo    | go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x0?, 0x0?, {0x0?, 0x0?, 0xc0011465a0?})
inx-tendercoo    |      /go/pkg/mod/go.uber.org/zap@v1.23.0/zapcore/entry.go:198 +0x65
inx-tendercoo    | go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00111edd0, {0x0, 0x0, 0x0})
inx-tendercoo    |      /go/pkg/mod/go.uber.org/zap@v1.23.0/zapcore/entry.go:264 +0x3ec
inx-tendercoo    | go.uber.org/zap.(*SugaredLogger).log(0xc0001162b8, 0x4, {0x120fecc?, 0x7b00000000000001?}, {0xc000137de0?, 0x1?, 0x0?}, {0x0, 0x0, 0x0})
inx-tendercoo    |      /go/pkg/mod/go.uber.org/zap@v1.23.0/sugar.go:287 +0xee
inx-tendercoo    | go.uber.org/zap.(*SugaredLogger).Panicf(...)
inx-tendercoo    |      /go/pkg/mod/go.uber.org/zap@v1.23.0/sugar.go:181
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).LogPanicf(...)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:856
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).invoke.func1(0xc0001160c0)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:496 +0x11b
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.forEachCoreComponent(...)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:744
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).ForEachCoreComponent(...)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:752
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).invoke(0xc0003bc8f0?)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:493 +0x142
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).initializeApp(0xc0004e5ed8?)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:632 +0x5a
inx-tendercoo    | github.com/iotaledger/hive.go/core/app.(*App).Run(0xc0003bc8f0)
inx-tendercoo    |      /go/pkg/mod/github.com/iotaledger/hive.go/core@v1.0.0-rc.1/app/app.go:646 +0x25
inx-tendercoo    | main.main()
inx-tendercoo    |      /scratch/main.go:8 +0x1e
inx-tendercoo exited with code 2
Wollac commented 1 year ago

The issue on coo-0.s2.aws is different.

It seems that Tendermint uses a two-stage approach:

  1. it first writes the signature to a dedicated state file
  2. it then stores the signed message in the DB and broadcasts it

This approach is supposed to be more robust against DB crashes. And this is what happened here: we used the height from the state file, which is one too high. This is fixed in PR #43.
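For illustration, a minimal sketch of the height reconciliation described above; the function name, the exact recovery rule, and everything else here are assumptions, not the actual code of PR #43:

package main

import "fmt"

// reconcileReplayHeight picks the height to resume replay from. In the
// two-stage flow described above, Tendermint first bumps the height in
// priv_validator_state.json and only afterwards persists the signed block,
// so after a crash between the two stages the state file is exactly one
// ahead of the block store.
func reconcileReplayHeight(stateFileHeight, storeHeight int64) int64 {
	if stateFileHeight == storeHeight+1 {
		// crash between stage 1 and stage 2: trust the block store
		return storeHeight
	}
	return stateFileHeight
}

func main() {
	// heights from the crash log above: app height 124611, store height 124610
	fmt.Println(reconcileReplayHeight(124611, 124610)) // prints 124610
}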

shufps commented 1 year ago

Another inx-tendercoo just crashed on:

inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 476308, "hash": "B5BAD5AAC9BB24CDF885236C1EB75C6A019D9ED8F0A727F104D13C4D7C511EB0"}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 476308, "hash": "B5BAD5AAC9BB24CDF885236C1EB75C6A019D9ED8F0A727F104D13C4D7C511EB0", "root": "D352299F3DFDD917B11660CCB22DC31FB299306BF50ACEBD777083D43F68A1D4", "num_txs": 9}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "Kml39uVHUY1XLXumP2tQtbuvPegCPcVCdRcj85Epe4s="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "r+rCxSoK09pTe6h9AXJyD9VfpP2zxEsP0gvIEtGKiFY="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "+dllamAEkIPu9hSHYyGHs1EpTB+iPRGAYNgT220D6LY="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "GVltVw6VNfEcw0CR25jsRKUm44ZX+Ca11rYVLlM7jAY="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "2ZIoGaOelN3zkH9LnI35PznwJiRPy2CSBbmoeQIlmfI="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "ZLx8094lTMOipuCq7luEHQpnGyK0h3iPWU+/HvMpk1w="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "2F5bFZDYmNHgzeuy47UzfIt2JwFCZj14gRaDukfBfJg="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"BlockID":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "jiIq5+Ktz7h6KYShmq1SsZee0UcsPLFyOac+8dNEw1o="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "jiIq5+Ktz7h6KYShmq1SsZee0UcsPLFyOac+8dNEw1o="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     awaiting parent {"blockID": "c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      executed block  {"module": "state", "height": 476308, "num_valid_txs": 9, "num_invalid_txs": 0}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      committed state {"module": "state", "height": 476308, "num_txs": 9, "app_hash": "55A9F5C02F30997EFF30FA1193CCA132F74E910F346E250A17728628B3F5364A"}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 476308}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      commit is for a block we do not know about; set ProposalBlock=nil       {"module": "consensus", "height": 476309, "commit_round": 0, "proposal": "", "commit": "3570885A8D587953E5C1B0F89AE3708D51F169B08355FECF9F32A1B22781AF6A"}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 476309, "hash": "3570885A8D587953E5C1B0F89AE3708D51F169B08355FECF9F32A1B22781AF6A"}
inx-tendercoo    | 2022-12-06T13:34:06Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 476309, "hash": "3570885A8D587953E5C1B0F89AE3708D51F169B08355FECF9F32A1B22781AF6A", "root": "55A9F5C02F30997EFF30FA1193CCA132F74E910F346E250A17728628B3F5364A", "num_txs": 7}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "+dllamAEkIPu9hSHYyGHs1EpTB+iPRGAYNgT220D6LY="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "PfgJZMyDkh4IwfoKT1/AWBCmNNpFRhsrMV/L/WL3yrc="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "GVltVw6VNfEcw0CR25jsRKUm44ZX+Ca11rYVLlM7jAY="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "ZLx8094lTMOipuCq7luEHQpnGyK0h3iPWU+/HvMpk1w="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "2F5bFZDYmNHgzeuy47UzfIt2JwFCZj14gRaDukfBfJg="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "Kml39uVHUY1XLXumP2tQtbuvPegCPcVCdRcj85Epe4s="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7045956,"Parent":"c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"}, "issuer": "2ZIoGaOelN3zkH9LnI35PznwJiRPy2CSBbmoeQIlmfI="}
inx-tendercoo    | 2022-12-06T13:34:06Z    DEBUG   Coordinator     create milestone        {"state": {"MilestoneHeight":476306,"MilestoneIndex":7045956,"LastMilestoneID":"9b1a494f7a9d50661be693c9e17100d8c306334a6bebdd77781241582aaa86c2","LastMilestoneBlockID":"b551cd5fb3d1bf1ee61a48b43d86ef2b693b2d80f15875b9f4ac24464c02cc9d"}, "parents": ["b551cd5fb3d1bf1ee61a48b43d86ef2b693b2d80f15875b9f4ac24464c02cc9d","c69d0a7742c9babf9904a1b1d4254ddad998533ecb60e4b9cd5daa4d35ad33c0"]}
inx-tendercoo    | 2022-12-06T13:34:08Z    WARN    Coordinator     stopping after panic
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"msg": "Stopping Node service", "impl": "Node"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      Stopping Node
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "events", "msg": "Stopping EventBus service", "impl": "EventBus"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "events", "module": "pubsub", "msg": "Stopping PubSub service", "impl": "PubSub"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "txindex", "msg": "Stopping IndexerService service", "impl": "IndexerService"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "p2p", "msg": "Stopping P2P Switch service", "impl": "P2P Switch"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "p2p", "peer": "0223eb172703e02c2b73d9710f535b8c01480235@65.21.1.131:42898", "msg": "Stopping MConnection service", "impl": "MConn{65.21.1.131:42898}"}
inx-tendercoo    | 2022-12-06T13:34:08Z    ERROR   Tendermint      Stopping peer for error {"module": "p2p", "peer": "Peer{MConn{65.21.1.131:42898} 0223eb172703e02c2b73d9710f535b8c01480235 in}", "err": "read tcp 172.19.0.3:26656->65.21.1.131:42898: use of closed network connection"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "p2p", "peer": "0223eb172703e02c2b73d9710f535b8c01480235@65.21.1.131:42898", "msg": "Stopping Peer service", "impl": "Peer{MConn{65.21.1.131:42898} 0223eb172703e02c2b73d9710f535b8c01480235 in}"}
inx-tendercoo    | 2022-12-06T13:34:08Z    ERROR   Tendermint      error while stopping peer       {"module": "p2p", "error": "already stopped"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "p2p", "peer": "008cb3e446c123cd0be3d1c8bbdfa555cfb600e7@167.235.137.109:44686", "msg": "Stopping MConnection service", "impl": "MConn{167.235.137.109:44686}"}
inx-tendercoo    | 2022-12-06T13:34:08Z    ERROR   Tendermint      Stopping peer for error {"module": "p2p", "peer": "Peer{MConn{167.235.137.109:44686} 008cb3e446c123cd0be3d1c8bbdfa555cfb600e7 in}", "err": "read tcp 172.19.0.3:26656->167.235.137.109:44686: use of closed network connection"}
inx-tendercoo    | 2022-12-06T13:34:08Z    INFO    Tendermint      service stop    {"module": "p2p", "peer": "008cb3e446c123cd0be3d1c8bbdfa555cfb600e7@167.235.137.109:44686", "msg": "Stopping Peer service", "impl": "Peer{MConn{167.235.137.109:44686} 008cb3e446c123cd0be3d1c8bbdfa555cfb600e7 in}"}
inx-tendercoo    | 2022-12-06T13:34:08Z    ERROR   Tendermint      error while stopping peer       {"module": "p2p", "error": "already stopped"}
inx-tendercoo    | 2022-12-06T13:34:08Z    ERROR   Tendermint      CONSENSUS FAILURE!!!    {"module": "consensus", "err": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "stack": "goroutine 277 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x65\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2()\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:724 +0x4c\npanic({0x1085080, 0xc004a9a1d8})\n\t/usr/local/go/src/runtime/panic.go:884 +0x212\ngithub.com/iotaledger/inx-tendercoo/pkg/decoo.(*Coordinator).stopOnPanic(0xc00037c000)\n\t/scratch/pkg/decoo/app.go:262 +0xb9\npanic({0x1085080, 0xc004a9a1d8})\n\t/usr/local/go/src/runtime/panic.go:884 +0x212\ngithub.com/iotaledger/inx-tendercoo/pkg/decoo.(*Coordinator).createMilestoneEssence(0xc00037c000, {0xc001d53bc0, 0x2, 0x2})\n\t/scratch/pkg/decoo/app.go:342 +0x65c\ngithub.com/iotaledger/inx-tendercoo/pkg/decoo.(*Coordinator).EndBlock(0xc00037c000, {0x0})\n\t/scratch/pkg/decoo/app.go:144 +0x3e8\ngithub.com/tendermint/tendermint/abci/client.(*localClient).EndBlockSync(0xc0002d91a0, {0xc0002d91a0?})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/abci/client/local_client.go:288 +0xdf\ngithub.com/tendermint/tendermint/proxy.(*appConnConsensus).EndBlockSync(0xc0052df1c0?, {0xc0038a4a20?})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/proxy/app_conn.go:89 +0x24\ngithub.com/tendermint/tendermint/state.execBlockOnProxyApp({0x14cc530?, 0xc00051f440}, {0x14d1940, 0xc00051fa60}, 0xc0004f0000, {0x14d55f0, 0xc0002488a0}, 0x74494?)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/state/execution.go:327 +0x55f\ngithub.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x1}, {0xc0004ba6e0, 0x7}}, {0xc0004ba6f0, 0xb}, 0x1, 0x74494, {{0xc005056820, ...}, ...}, ...}, ...)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/state/execution.go:140 +0x171\ngithub.com/tendermint/tendermint/consensus.(*State).finalizeCommit(0xc000443c00, 0x74495)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:1654 +0xafd\ngithub.com/tendermint/tendermint/consensus.(*State).tryFinalizeCommit(0xc000443c00, 0x74495)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:1563 +0x2ff\ngithub.com/tendermint/tendermint/consensus.(*State).handleCompleteProposal(0xc000443c00, 0xa2b?)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:1942 +0x399\ngithub.com/tendermint/tendermint/consensus.(*State).handleMsg(0xc000443c00, {{0x14bea20?, 0xc0037e0df8?}, {0xc00292af60?, 0x0?}})\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:834 +0x290\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine(0xc000443c00, 0x0)\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:760 +0x3f9\ncreated by github.com/tendermint/tendermint/consensus.(*State).OnStart\n\t/go/pkg/mod/github.com/tendermint/tendermint@v0.34.23/consensus/state.go:379 +0x12d\n"}

The HORNET node itself seems to be out of sync and/or having connection issues:

2022-12-06T13:33:19Z    INFO    Tangle  Milestone confirmed (7045954): txsReferenced: 10, txsValue: 0, txsZeroValue: 10, txsConflicting: 0, collect: 0s, total: 2ms
2022-12-06T13:33:19Z    INFO    Tangle  New confirmed milestone: 7045954, 1.00 BPS, 1.25 RBPS, 125.00% ref.rate
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00005/00001/00003, Tips (non-/semi-lazy): 6/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00010/00002/00012, Tips (non-/semi-lazy): 6/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00005/00001/00003, Tips (non-/semi-lazy): 5/0
2022-12-06T13:33:23Z    INFO    Tangle  Solidifier finished: blocks: 0, collect: 0s, solidity 0s, propagation: 0s, total: 0s
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 5/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
2022-12-06T13:33:37Z    INFO    P2P-Manager     scheduled reconnect in 30.427836115s to <peer.ID 12*2ve2DM>
2022-12-06T13:33:37Z    INFO    P2P-Manager     disconnected <peer.ID 12*2ve2DM> connection closed by libp2p network event
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
2022-12-06T13:34:00Z    WARN    Gossip  closing connection to peer <peer.ID 12*7fYS88> because of a protocol error: stream reset
2022-12-06T13:34:00Z    INFO    P2P-Manager     scheduled reconnect in 30.575030776s to <peer.ID 12*7fYS88>
2022-12-06T13:34:00Z    INFO    P2P-Manager     disconnected <peer.ID 12*7fYS88> connection closed by libp2p network event
2022-12-06T13:34:00Z    INFO    GossipService   terminated protocol with <peer.ID 12*7fYS88>
2022-12-06T13:34:00Z    INFO    P2P-Manager     scheduled reconnect in 30.044853625s to <peer.ID 12*RHm7yk>
2022-12-06T13:34:00Z    INFO    P2P-Manager     disconnected <peer.ID 12*RHm7yk> connection closed by libp2p network event
2022-12-06T13:34:00Z    WARN    Gossip  closing connection to peer <peer.ID 12*RHm7yk> because of a protocol error: stream reset
2022-12-06T13:34:00Z    INFO    GossipService   terminated protocol with <peer.ID 12*RHm7yk>
2022-12-06T13:34:00Z    INFO    P2P-Manager     scheduled reconnect in 30.696113023s to <peer.ID 12*PTKNqN>
2022-12-06T13:34:00Z    INFO    P2P-Manager     disconnected <peer.ID 12*PTKNqN> connection closed by libp2p network event
2022-12-06T13:34:00Z    INFO    GossipService   terminated protocol with <peer.ID 12*PTKNqN>
2022-12-06T13:34:00Z    INFO    P2P-Manager     scheduled reconnect in 30.594113662s to <peer.ID 12*K4exri>
2022-12-06T13:34:00Z    INFO    P2P-Manager     disconnected <peer.ID 12*K4exri> connection closed by libp2p network event
2022-12-06T13:34:00Z    INFO    GossipService   terminated protocol with <peer.ID 12*K4exri>
2022-12-06T13:34:00Z    WARN    Gossip  closing connection to peer <peer.ID 12*K4exri> because of a protocol error: stream reset
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045954/7045954, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 1/0
2022-12-06T13:34:05Z    INFO    Tangle  Valid milestone detected! Index: 7045955
2022-12-06T13:34:05Z    INFO    Tangle  Run solidity check for Milestone (7045955) ...
2022-12-06T13:34:05Z    INFO    Tangle  Solidifier finished: blocks: 0, collect: 0s, solidity 0s, propagation: 0s, total: 0s
2022-12-06T13:34:05Z    INFO    Tangle  Milestone confirmed (7045955): txsReferenced: 7, txsValue: 0, txsZeroValue: 7, txsConflicting: 0, collect: 4ms, total: 6ms
2022-12-06T13:34:05Z    INFO    Tangle  New confirmed milestone: 7045955, 0.43 BPS, 1.00 RBPS, 233.33% ref.rate
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045955/7045955, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 2/0
2022-12-06T13:34:06Z    WARN    Tangle  Stopped solidifier due to missing block -> Requested missing blocks (1/1), collect: 0s
req(qu/pe/proc/lat): 00000/00001/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045955/7045955, BPS (in/new/out): 00000/00001/00000, Tips (non-/semi-lazy): 2/0
req(qu/pe/proc/lat): 00000/00001/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045955/7045955, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 2/0
2022-12-06T13:34:08Z    INFO    P2P-Manager     reconnecting <peer.ID 12*2ve2DM>
2022-12-06T13:34:08Z    INFO    P2P-Manager     connected <peer.ID 12*2ve2DM> (Outbound)
2022-12-06T13:34:09Z    INFO    GossipService   started protocol with <peer.ID 12*2ve2DM>
2022-12-06T13:34:09Z    INFO    WarpSync        Synchronizing to milestone 7045960
2022-12-06T13:34:09Z    INFO    Tangle  Valid milestone detected! Index: 7045960
2022-12-06T13:34:09Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:09Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:09Z    INFO    Tangle  Valid milestone detected! Index: 7045959
2022-12-06T13:34:09Z    INFO    Tangle  Valid milestone detected! Index: 7045958
2022-12-06T13:34:09Z    INFO    Tangle  Valid milestone detected! Index: 7045956
2022-12-06T13:34:09Z    INFO    Tangle  Run solidity check for Milestone (7045956) ...
2022-12-06T13:34:09Z    WARN    Tangle  Stopped solidifier due to missing block -> Requested missing blocks (0/1), collect: 0s
2022-12-06T13:34:09Z    INFO    Tangle  Milestone couldn't be solidified! 7045956
2022-12-06T13:34:09Z    INFO    Tangle  Valid milestone detected! Index: 7045957
req(qu/pe/proc/lat): 00000/00007/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045955/7045960, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 2/0
req(qu/pe/proc/lat): 00000/00001/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045955/7045960, BPS (in/new/out): 00006/00005/00001, Tips (non-/semi-lazy): 2/0
req(qu/pe/proc/lat): 00000/00001/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045955/7045960, BPS (in/new/out): 00049/00047/00000, Tips (non-/semi-lazy): 2/0
2022-12-06T13:34:11Z    INFO    Tangle  Run solidity check for Milestone (7045956) ...
2022-12-06T13:34:11Z    INFO    Tangle  Solidifier finished: blocks: 4, collect: 0s, solidity 0s, propagation: 0s, total: 0s
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    ERROR   INX     send error: rpc error: code = Unavailable desc = transport is closing
2022-12-06T13:34:11Z    INFO    Tangle  Milestone confirmed (7045956): txsReferenced: 8, txsValue: 0, txsZeroValue: 8, txsConflicting: 0, collect: 0s, total: 0s
2022-12-06T13:34:11Z    INFO    Tangle  New confirmed milestone: 7045956, 1.33 RBPS
2022-12-06T13:34:11Z    INFO    Tangle  Run solidity check for Milestone (7045957) ...
2022-12-06T13:34:11Z    INFO    Tangle  Solidifier finished: blocks: 6, collect: 0s, solidity 0s, propagation: 0s, total: 0s
2022-12-06T13:34:11Z    INFO    Tangle  Milestone confirmed (7045957): txsReferenced: 10, txsValue: 0, txsZeroValue: 10, txsConflicting: 0, collect: 0s, total: 0s
2022-12-06T13:34:11Z    INFO    Tangle  New confirmed milestone: 7045957, 2.00 RBPS
2022-12-06T13:34:11Z    INFO    Tangle  Run solidity check for Milestone (7045958) ...
2022-12-06T13:34:11Z    INFO    Tangle  Solidifier finished: blocks: 6, collect: 0s, solidity 0s, propagation: 0s, total: 0s
2022-12-06T13:34:11Z    INFO    Tangle  Milestone confirmed (7045958): txsReferenced: 9, txsValue: 0, txsZeroValue: 9, txsConflicting: 0, collect: 0s, total: 0s
2022-12-06T13:34:11Z    INFO    Tangle  New confirmed milestone: 7045958, 0.90 RBPS
2022-12-06T13:34:11Z    INFO    Tangle  Run solidity check for Milestone (7045959) ...
2022-12-06T13:34:11Z    INFO    Tangle  Solidifier finished: blocks: 6, collect: 0s, solidity 0s, propagation: 2ms, total: 3ms
2022-12-06T13:34:11Z    INFO    Tangle  Milestone confirmed (7045959): txsReferenced: 9, txsValue: 0, txsZeroValue: 9, txsConflicting: 0, collect: 0s, total: 2ms
2022-12-06T13:34:11Z    INFO    Tangle  New confirmed milestone: 7045959, 1.50 RBPS
2022-12-06T13:34:11Z    INFO    Tangle  Run solidity check for Milestone (7045960) ...
2022-12-06T13:34:11Z    INFO    Tangle  Solidifier finished: blocks: 0, collect: 0s, solidity 0s, propagation: 0s, total: 0s
2022-12-06T13:34:11Z    INFO    WarpSync        Synchronized 5 milestones in 2.249s (240684.37 BPS)
2022-12-06T13:34:11Z    INFO    Tangle  Milestone confirmed (7045960): txsReferenced: 20, txsValue: 0, txsZeroValue: 20, txsConflicting: 0, collect: 0s, total: 2ms
2022-12-06T13:34:11Z    INFO    Tangle  New confirmed milestone: 7045960, 1.67 RBPS
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045960/7045960, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 28/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045960/7045960, BPS (in/new/out): 00001/00001/00002, Tips (non-/semi-lazy): 28/0
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 7045960/7045960, BPS (in/new/out): 00000/00000/00000, Tips (non-/semi-lazy): 28/0
Wollac commented 1 year ago

On the DeCoo side everything is as expected, and it is always the same issue: the INX call ComputeWhiteFlag fails to return within the given time limit of two seconds. Without the white-flag information the plugin cannot construct the desired milestone, and since the milestone hash is part of the app state included in the Tendermint block, there is no way for that validator to progress the chain; it must "panic" and stop.
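For illustration, a minimal sketch of this failure mode with the two-second limit mentioned above; the stub below only mimics the INX ComputeWhiteFlag call, it is not the actual plugin code:

package main

import (
	"context"
	"fmt"
	"time"
)

// computeWhiteFlag stands in for the INX ComputeWhiteFlag call; here it
// simulates a stalled node that only answers after three seconds.
func computeWhiteFlag(ctx context.Context) ([]byte, error) {
	select {
	case <-time.After(3 * time.Second):
		return []byte("white-flag merkle root"), nil
	case <-ctx.Done():
		return nil, ctx.Err() // context deadline exceeded
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	if _, err := computeWhiteFlag(ctx); err != nil {
		// Without the white-flag root the milestone (and with it the app
		// hash) cannot be built, so the validator can only panic and stop;
		// Tendermint then reports this as CONSENSUS FAILURE.
		panic(fmt.Errorf("failed to compute white flag: %w", err))
	}
}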

We should really take a closer look at why that node call seems to be so unreliable with a supposedly excellent network connection as well as on good hardware, and think about hardening it somehow. Since at least 2/3 of the validators have seen the parents as solid and even broadcast that particular milestone already, there might be some kind of bug for it to fail so often.

On the Tendercoo side I can only think of two things:

  1. increase the INX timeout to e.g. 5s (see the sketch after this list)
  2. manually wait (long) for the parents/MS to be solid before calling ComputeWhiteFlag (should have the exact same effect, just more complicated to implement)

Both of which just treat the symptoms, not the cause!
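Still, a rough sketch of option 1; the helper and its names are made up for illustration, it just retries a context-aware INX call with escalating deadlines (e.g. 2s, then 5s) instead of giving up after the first miss:

package main

import (
	"context"
	"fmt"
	"time"
)

// inxCall is any context-aware INX request, e.g. ComputeWhiteFlag.
type inxCall func(ctx context.Context) ([]byte, error)

// withEscalatingTimeouts retries fn with successively longer deadlines,
// so that a single slow answer does not immediately abort the milestone.
func withEscalatingTimeouts(ctx context.Context, fn inxCall, timeouts ...time.Duration) ([]byte, error) {
	var lastErr error
	for _, d := range timeouts {
		callCtx, cancel := context.WithTimeout(ctx, d)
		res, err := fn(callCtx)
		cancel()
		if err == nil {
			return res, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	// answers within 5s, but not within 2s
	slow := func(ctx context.Context) ([]byte, error) {
		select {
		case <-time.After(3 * time.Second):
			return []byte("white-flag merkle root"), nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	res, err := withEscalatingTimeouts(context.Background(), slow, 2*time.Second, 5*time.Second)
	fmt.Println(string(res), err) // succeeds on the second attempt
}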

shufps commented 1 year ago

To figure out the cause, I will build some detection that triggers pprof generation when the symptoms appear (multiple consecutive status lines with the same CMI/LMI milestone index). Perhaps it shows us some locks or so :man_shrugging:
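A possible sketch of such a detector; everything here is an assumption: it tails the HORNET log on stdin (e.g. via docker logs -f) and dumps a goroutine profile, assuming the HORNET profiling plugin is enabled on localhost:6060.

package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"regexp"
	"time"
)

// matches the CMI value in HORNET's status line, e.g. "CMI/LMI: 7045954/7045954"
var cmiRe = regexp.MustCompile(`CMI/LMI: (\d+)/\d+`)

// dumpPprof writes a full goroutine dump to a timestamped file.
func dumpPprof() error {
	resp, err := http.Get("http://localhost:6060/debug/pprof/goroutine?debug=2")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	f, err := os.Create(fmt.Sprintf("goroutines-%d.txt", time.Now().Unix()))
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	const threshold = 10 // consecutive status lines with an unchanged CMI
	var last string
	var repeats int
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		m := cmiRe.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		if m[1] == last {
			repeats++
		} else {
			last, repeats = m[1], 0
		}
		if repeats == threshold { // dump once per stall
			if err := dumpPprof(); err != nil {
				fmt.Fprintln(os.Stderr, "pprof dump failed:", err)
			}
		}
	}
}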

muXxer commented 1 year ago

Nice idea!

You could also trigger on BPS (in/new/out): 00000/00000/00000

muXxer commented 1 year ago

After you have successfully triggered the pprof the next time, we will increase the white-flag timeouts in tendercoo and hornet. You should also enable Docker restart again (after you have triggered the bug the next time).

shufps commented 1 year ago

https://github.com/iotaledger/inx-tendercoo/pull/48 seems to fix it
