iotaledger / inx-tendercoo

INX-Tendercoo enables a committee of validators to operate as a distributed Coordinator using Tendermint Core BFT consensus.
Apache License 2.0

tendercoo crash during syncing of hornet #60

Closed shufps closed 1 year ago

shufps commented 1 year ago

I tested stopping hornet for a couple of minutes (about 3) and then restarted both again.

It seems the detection of a synced hornet doesn't work.

After tendercoo synced the tendermint blockchain, it crashed with:

{"module": "consensus", "err": "failed to query ComputeWhiteFlag: rpc error: code = Unavailable desc = failed to compute white flag mutations: parents not solid",

Restarting tendercoo on a synced hornet works without issues.

These are the logs: coo.log tendercoo.log

Wollac commented 1 year ago

In the logs we see that the actual syncing works as expected in both Hornet and TenderCoo. Only a few milestones later do we run into the issue: another validator has proposed a milestone parent which is not solid for us. The parent was issued while we were offline, so it is only requested when the milestone is created/received. In the Hornet log we see that this request took about 4s, significantly longer than the configured request timeout of 2s. Even in local tests with only one missing parent I always saw request times longer than 2s (about 3s - 4s). If this is not a bug but the expected time it takes, I strongly recommend increasing the whiteFlagParentsSolidTimeout default to 5s or 10s in TenderCoo as well as Hornet.

shufps commented 1 year ago

retried:

inx-tendercoo    | 2023-02-22T13:56:44Z  INFO    INX     Connecting to node and reading node configuration ...
inx-tendercoo    | 2023-02-22T13:56:44Z  INFO    INX     > retrying INX connection to node ...
inx-tendercoo    | 2023-02-22T13:56:45Z  INFO    INX     > retrying INX connection to node ...
inx-tendercoo    | 2023-02-22T13:56:46Z  INFO    INX     > retrying INX connection to node ...
inx-tendercoo    | 2023-02-22T13:56:47Z  INFO    INX     Reading node status ...
inx-tendercoo    | 2023-02-22T13:56:47Z  INFO    Coordinator     Providing Coordinator ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Providing Coordinator ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Loading core components ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Loading core components: INX ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Loading core components: Coordinator ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Loading core components: Shutdown ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Loading plugins ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Loading plugin: Profiling ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Executing core components ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Starting core component: INX ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Starting core component: Coordinator ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Starting core component: Shutdown ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Executing plugins ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Starting plugin: Profiling ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    App     Starting background workers ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Starting Decentralized Coordinator ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    INX     Starting NodeBridge ...
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Profiling       You can now access the profiling server using: http://0.0.0.0:6060/debug/pprof/
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Starting TangleListener ... done
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Found private validator {"keyFile": "tendermint/config/priv_validator_key.json", "stateFile": "tendermint/data/priv_validator_state.json"}
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Found node key  {"path": "tendermint/config/node_key.json"}
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Found genesis file      {"path": "tendermint/config/genesis.json"}
inx-tendercoo    | 2023-02-22T13:56:48Z  INFO    Coordinator     Node appears to be connected
inx-tendercoo    | 2023-02-22T13:56:48Z  WARN    Coordinator     node is not synced; retrying in 2s
inx-tendercoo    | 2023-02-22T13:56:50Z  INFO    Coordinator     Node appears to be synced; latest=8221062 confirmed=8221062
inx-tendercoo    | 2023-02-22T13:56:50Z  INFO    Coordinator     Coordinator resumed     {"state": {"MilestoneHeight":3943404,"MilestoneIndex":8221045,"LastMilestoneID":"7414cf9d454b1a6085d6f17dfe982df95e796b364d7b68abf6acfbb4e3779a04","LastMilestoneBlockID":"8e95bb9d3bbad64b8b7923d10e545ae2236032214b121417ab689d8f5f8b2857"}}

worked perfectly :ok_hand:

shufps commented 1 year ago

retested with 10min downtime. Resuming worked perfectly.

Closing this issue.