iotaledger / inx-tendercoo

INX-Tendercoo enables a committee of validators to operate as a distributed Coordinator using Tendermint Core BFT consensus.
Apache License 2.0
1 stars 0 forks source link

No crash but (almost) nothing happens anymore #45

Closed shufps closed 1 year ago

shufps commented 1 year ago

Alphanet fell out of sync and is not generating new milestones since today about 1:45AM

The weird thing is that neither the inx-tendercoos nor the hornet nodes crashed.

Logs of hornets look normal - they just didn't get milestones anymore.

Logs of inx-tendercoo look weird.

Example:

2022-12-25T00:27:07Z    INFO    Coordinator     new confirmed milestone: 7296250
2022-12-25T00:27:07Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.6839246, "height": 1758286, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T00:27:07Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758286/0 (85CD737A2CE12F901523B8CFF39727F8426F0C187855078C7922440454AE9ECF:1:76D892699DE1, -1) F8E00F9AE0DB @ 2022-12-25T00:27:07.85879395Z}"}
2022-12-25T00:27:07Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758286, "hash": "85CD737A2CE12F901523B8CFF39727F8426F0C187855078C7922440454AE9ECF"}
2022-12-25T00:27:08Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758286, "hash": "85CD737A2CE12F901523B8CFF39727F8426F0C187855078C7922440454AE9ECF", "root": "E34EAF0B3A6F03E2B59EA5F6A9B3923335AFDA594D95F7A9B318302A33DA9DCB", "num_txs": 0}
2022-12-25T00:27:08Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758286, "num_valid_txs": 0, "num_invalid_txs": 0}
2022-12-25T00:27:08Z    INFO    Tendermint      committed state {"module": "state", "height": 1758286, "num_txs": 0, "app_hash": "E34EAF0B3A6F03E2B59EA5F6A9B3923335AFDA594D95F7A9B318302A33DA9DCB"}
2022-12-25T00:27:08Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758286}
2022-12-25T00:27:09Z    INFO    Coordinator     proposing parent for milestone 7296251
2022-12-25T00:27:09Z    DEBUG   Coordinator     broadcast tx    {"parent": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}}
2022-12-25T00:27:09Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.016293711, "height": 1758287, "round": 0, "step": "RoundStepNewRound"}
2022-12-25T00:27:09Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758287/0 (FD36C18FAE0D8B7E333283667ED3B81C1923F3225DBB7923DFA4615F35D764A1:1:F156BA0EBB9F, -1) 7D8FD69C4E54 @ 2022-12-25T00:27:09.032542002Z}"}
2022-12-25T00:27:09Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758287, "hash": "FD36C18FAE0D8B7E333283667ED3B81C1923F3225DBB7923DFA4615F35D764A1"}
2022-12-25T00:27:09Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758287, "hash": "FD36C18FAE0D8B7E333283667ED3B81C1923F3225DBB7923DFA4615F35D764A1", "root": "E34EAF0B3A6F03E2B59EA5F6A9B3923335AFDA594D95F7A9B318302A33DA9DCB", "num_txs": 1}
2022-12-25T00:27:09Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "PfgJZMyDkh4IwfoKT1/AWBCmNNpFRhsrMV/L/WL3yrc="}
2022-12-25T00:27:09Z    DEBUG   Coordinator     awaiting parent {"blockID": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:09Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758287, "num_valid_txs": 1, "num_invalid_txs": 0}
2022-12-25T00:27:09Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:09Z    INFO    Tendermint      committed state {"module": "state", "height": 1758287, "num_txs": 1, "app_hash": "358432A9ABC070DC8928FDD7A6A35E463E5E9E87015F3CC121BDFDA13C5F860E"}
2022-12-25T00:27:09Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758287}
2022-12-25T00:27:10Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.682942085, "height": 1758288, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T00:27:10Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758288/0 (8C7801FEC24D29F328E9592AFE9B21790E39F5A287A43380A1B049BD1F9B0EB9:1:4132F2A0EED9, -1) F1F21243BCBE @ 2022-12-25T00:27:10.299955396Z}"}
2022-12-25T00:27:10Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758288, "hash": "8C7801FEC24D29F328E9592AFE9B21790E39F5A287A43380A1B049BD1F9B0EB9"}
2022-12-25T00:27:10Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758288, "hash": "8C7801FEC24D29F328E9592AFE9B21790E39F5A287A43380A1B049BD1F9B0EB9", "root": "358432A9ABC070DC8928FDD7A6A35E463E5E9E87015F3CC121BDFDA13C5F860E", "num_txs": 8}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "Kml39uVHUY1XLXumP2tQtbuvPegCPcVCdRcj85Epe4s="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "r+rCxSoK09pTe6h9AXJyD9VfpP2zxEsP0gvIEtGKiFY="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "o74D/yOQIQNUqa9r60GKlXdmsFD1p8YmYoA8wmepRPY="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "jiIq5+Ktz7h6KYShmq1SsZee0UcsPLFyOac+8dNEw1o="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "2ZIoGaOelN3zkH9LnI35PznwJiRPy2CSBbmoeQIlmfI="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "GVltVw6VNfEcw0CR25jsRKUm44ZX+Ca11rYVLlM7jAY="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "ZLx8094lTMOipuCq7luEHQpnGyK0h3iPWU+/HvMpk1w="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     DeliverTx       {"tx": {"Index":7296251,"BlockID":"46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}, "issuer": "2F5bFZDYmNHgzeuy47UzfIt2JwFCZj14gRaDukfBfJg="}
2022-12-25T00:27:10Z    DEBUG   Coordinator     awaiting parent {"blockID": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:10Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758288, "num_valid_txs": 8, "num_invalid_txs": 0}
2022-12-25T00:27:10Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:10Z    INFO    Tendermint      committed state {"module": "state", "height": 1758288, "num_txs": 8, "app_hash": "4451565D214CE75A4C65FB94BBF427F40902DA92D6C2322228F350FC3E5B2599"}
2022-12-25T00:27:10Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758288}
2022-12-25T00:27:11Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.681604159, "height": 1758289, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T00:27:14Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 3, "height": 1758289, "round": 0, "step": "RoundStepPropose"}
2022-12-25T00:27:15Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 1, "height": 1758289, "round": 0, "step": "RoundStepPrecommitWait"}
2022-12-25T00:27:16Z    INFO    Tendermint      received proposal       {"module": "consensus", "proposal": "Proposal{1758289/1 (522D4C8CE0F5BA59F0E7B36C105A0666C9D5F535DE3E396D00737954BC9D243C:1:96945E28DF84, -1) A4148DCF911D @ 2022-12-25T00:27:15.931789376Z}"}
2022-12-25T00:27:16Z    INFO    Tendermint      received complete proposal block        {"module": "consensus", "height": 1758289, "hash": "522D4C8CE0F5BA59F0E7B36C105A0666C9D5F535DE3E396D00737954BC9D243C"}
2022-12-25T00:27:16Z    INFO    Tendermint      finalizing commit of block      {"module": "consensus", "height": 1758289, "hash": "522D4C8CE0F5BA59F0E7B36C105A0666C9D5F535DE3E396D00737954BC9D243C", "root": "4451565D214CE75A4C65FB94BBF427F40902DA92D6C2322228F350FC3E5B2599", "num_txs": 0}
2022-12-25T00:27:16Z    DEBUG   Coordinator     awaiting parent {"blockID": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:16Z    INFO    Tendermint      executed block  {"module": "state", "height": 1758289, "num_valid_txs": 0, "num_invalid_txs": 0}
2022-12-25T00:27:16Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
2022-12-25T00:27:16Z    INFO    Tendermint      committed state {"module": "state", "height": 1758289, "num_txs": 0, "app_hash": "4451565D214CE75A4C65FB94BBF427F40902DA92D6C2322228F350FC3E5B2599"}
2022-12-25T00:27:16Z    INFO    Tendermint      indexed block exents    {"module": "txindex", "height": 1758289}
2022-12-25T00:27:17Z    INFO    Tendermint      Timed out       {"module": "consensus", "dur": 0.681646438, "height": 1758290, "round": 0, "step": "RoundStepNewHeight"}
2022-12-25T03:01:53Z    INFO    Coordinator     trigger next milestone preemptively
2022-12-25T03:01:53Z    INFO    Coordinator     proposing parent for milestone 7296251
2022-12-25T03:01:53Z    DEBUG   Coordinator     broadcast tx    {"parent": {"Index":7296251,"BlockID":"d5f49c5365db761c153654b08816b34d625a4547722af125990030343d991eb4"}}
2022-12-25T03:01:53Z    WARN    Coordinator     failed to propose parent: invalid proposal: already applied
2022-12-25T05:43:19Z    INFO    Coordinator     trigger next milestone preemptively
2022-12-25T05:43:19Z    INFO    Coordinator     proposing parent for milestone 7296251
2022-12-25T05:43:19Z    DEBUG   Coordinator     broadcast tx    {"parent": {"Index":7296251,"BlockID":"d9cd72873d7b4e5f0b6bf480d3d6203b6775667621b0d1a0e9098c421feb219f"}}
2022-12-25T05:43:19Z    WARN    Coordinator     failed to propose parent: invalid proposal: already applied

Important to notice are the timestamps of the logging. There are 2,5h+ where nothing at all happened.

Then the message trigger next milestone preemptively appears and inx-tendercoo is doing a little bit.

There were no errors but there is one DEBUG message that seems suspicious:

2022-12-25T00:27:09Z    DEBUG   Coordinator     invalid tip     {"parent": "46d774bc57d908751862a0f4f55452e3a2bd4aa0a90b7256142c29e6292af41a"}
shufps commented 1 year ago

We have the logs of all COOs and Tendercoos here (retention time 7 days):

https://elk.iota.org/app/discover#/view/efa803b0-8427-11ed-8e5d-3d731f215980?_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3A'2022-12-24T22%3A40%3A56.603Z'%2Cto%3A'2022-12-25T02%3A05%3A08.490Z'))

(Please ask me for credentials to access the logging server)

shufps commented 1 year ago

invalid proposal: already applied also happened here: https://github.com/iotaledger/inx-tendercoo/issues/47

shufps commented 1 year ago

I also have pprof profiles for each COO and inx-tendercoo if needed (except the inx-tendercoo on coo-2 that hung in the node is not synced loop).

Wollac commented 1 year ago

OK, the issue is the following:

One important improvement on the tendercoo side must be to also check !meta.Solid || meta.ReferencedByMilestoneIndex > 0 || meta.ShouldReattach when proposing so that no invalid parents can get proposed. In that case at least a restart or the re-triggering should fix the state. I will integrate this.

However, this only fixes the symptoms. We would need to understand why !meta.Solid || meta.ReferencedByMilestoneIndex > 0 || meta.ShouldReattach was true for a selected tip! I was strongly assumed that this can never happen. Maybe @alexsporn or @muXxer can help out here, with what exactly those flags mean and how this could have happened for the Hornets.

muXxer commented 1 year ago

See: https://github.com/iotaledger/inx-tendercoo/pull/48#discussion_r1061335629

shufps commented 1 year ago

seems to be fixed.