Closed: shufps closed this issue 1 year ago
Hmm, the `invalid proposal: already applied` error also happened here: https://github.com/iotaledger/inx-tendercoo/issues/45
Unfortunately, this was to be expected. If the situation described in https://github.com/iotaledger/inx-tendercoo/issues/45#issuecomment-1365812617 occurs, we have already written an invalid state to the Tendermint blockchain (all validators proposed invalid parents), and without resetting the blockchain there is no way to fix this.
An alternative would be to allow validators to propose more than one parent and then always select the latest parent proposed. In a happy world a validator should never need to propose more than one, but we could consider this to make the design more robust (although it would probably allow a malicious validator to spam proposals)...
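To make the "latest proposal wins" idea a bit more concrete, here is a minimal sketch. It is written in Python for brevity and is not the inx-tendercoo code: `ParentProposals`, `propose`, `parents` and the `max_per_validator` cap are all hypothetical names, and the cap is only an assumed way to limit the spam concern mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ParentProposals:
    """Tracks parent proposals per validator for one milestone (hypothetical model)."""

    max_per_validator: int = 3  # assumed cap to limit proposal spam
    _latest: dict = field(default_factory=dict)  # validator -> latest proposed parent
    _count: dict = field(default_factory=dict)   # validator -> number of proposals seen

    def propose(self, validator: str, parent: str) -> bool:
        """Record a proposal; return False if the validator exceeded its cap."""
        seen = self._count.get(validator, 0)
        if seen >= self.max_per_validator:
            return False  # ignore further proposals, keep the last accepted one
        self._count[validator] = seen + 1
        self._latest[validator] = parent  # a later proposal replaces the earlier one
        return True

    def parents(self) -> list:
        """Return the latest accepted parent of each validator, sorted by validator."""
        return [self._latest[v] for v in sorted(self._latest)]
```

With something like this, a validator that proposed an invalid parent could replace it with a second proposal instead of leaving the chain stuck, while the cap bounds how many proposals a single validator can push through.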
The recovery works but this is weird:
Yesterday the "invalid state" messages started on a single Coo, then on a second, then they spread to other nodes, and now they seem to be getting less frequent.
Before the first "invalid state" there are regular "submit failed but block is already present" messages, but those are okay and not an error.
That is actually an interesting one and maybe/probably not related. The DeCoo tries to re-broadcast failed Txs after some time. There is just an overflow protection, but when only a single tx fails it will retry more or less indefinitely, and an "Invalid State" will always stay invalid. This retry policy probably needs to be changed, but it would still be interesting to know what initially triggered this... Do we still have the logs from back then?
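As a rough illustration of what a changed retry policy could look like, here is a sketch under stated assumptions. It is Python rather than the actual DeCoo code, and `PermanentError`, `PendingTx`, `rebroadcast` and `max_attempts` are hypothetical names. The idea is to drop a tx immediately when the rejection can never succeed on retry, and to bound the attempts for everything else.

```python
from collections import deque
from dataclasses import dataclass


class PermanentError(Exception):
    """A rejection that cannot succeed on retry (e.g. an invalid state)."""


@dataclass
class PendingTx:
    payload: str
    attempts: int = 0


def rebroadcast(queue: deque, broadcast, max_attempts: int = 5) -> list:
    """Run one retry pass over the queue and return the txs that were dropped."""
    dropped = []
    for _ in range(len(queue)):  # visit each tx currently queued exactly once
        tx = queue.popleft()
        tx.attempts += 1
        try:
            broadcast(tx.payload)
        except PermanentError:
            dropped.append(tx)  # retrying cannot help, give up at once
        except Exception:
            if tx.attempts >= max_attempts:
                dropped.append(tx)  # possibly transient, but out of attempts
            else:
                queue.append(tx)  # keep it for the next pass
    return dropped
```

Dropped txs are returned so the caller can log them, which would also make it easier to see what triggered the first failure.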
seems to be fixed
When restarting all tendercoos after the latest problems, there are logs like:
After the last line there are no further lines containing `Coordinator`. Not even after a couple of minutes ... It seems something is crashing silently in the Coordinator and never recovering.
It's not clear whether this error normally shouldn't happen at all or whether there should be some way to cleanly recover from it :man_shrugging: