hyperledger / indy-node

The server portion of a distributed ledger purpose-built for decentralized identity.
https://wiki.hyperledger.org/display/indy
Apache License 2.0

Node loses consensus, but does not regain it #1700

Open lynnbendixsen opened 3 years ago

lynnbendixsen commented 3 years ago

Note: The summary might need to be changed to add "for a very long time". I just saw a case that might be similar, where it took 3 hours to regain consensus. (I will include more logs from Uphold if/when they arrive.)

Environment: Indicio Indy Networks running indy-node version 1.12.4

Steps to replicate: (the following general behavior was seen in log files for 2 nodes that lost consensus on 2 different networks, a week apart.)

  1. Node loses connectivity to f+1 nodes (4 or 5 nodes, on Indicio networks) for 30 seconds or more.
  2. Nodes reconnect within 5 minutes.
  3. Node reports "Out of Consensus" and sends request for VIEW change every 5 minutes.
  4. Node does not return to consensus within 45 minutes (in the Anonyome node example), so the node was restarted, at which point it returned to consensus and resumed normal operation.
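
For context on step 1, here is a minimal sketch of the usual BFT sizing arithmetic. It assumes a pool of n validators tolerates f = (n - 1) // 3 faulty nodes, and that a node must reach n - f nodes (itself included) to take part in consensus. The pool sizes 10 and 13 are hypothetical, chosen because they give f + 1 = 4 and 5; the function names are illustrative and not taken from indy-node.

```python
def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (standard BFT sizing, assumed here)."""
    return (n - 1) // 3


def can_reach_quorum(n: int, lost_peers: int) -> bool:
    """True if a node that lost `lost_peers` connections can still reach n - f nodes.

    The node counts itself, so it can reach n - lost_peers nodes in total.
    """
    f = max_faults(n)
    reachable = n - lost_peers
    return reachable >= n - f


# Hypothetical pool sizes; the real Indicio network sizes are not stated in this report.
for n in (10, 13):
    f = max_faults(n)
    print(
        f"n={n} f={f} "
        f"lose f={f} peers -> quorum: {can_reach_quorum(n, f)}; "
        f"lose f+1={f + 1} peers -> quorum: {can_reach_quorum(n, f + 1)}"
    )
```

Under these assumptions, losing f peers still leaves a quorum, while losing f + 1 peers does not, which would match a node dropping out of consensus after losing 4 or 5 connections.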

Expected Behavior: Node returns to consensus quickly after conditions which caused it to go out are restored.

Actual Behavior: Node did not return to consensus for a long time (at least 45 minutes in one case).

Notes: In the Anonyome log, start at about 2021-09-17 15:32:53 to see the behaviour described above. OOC (out of consensus) reporting started at about 2021-09-17 15:41:00, plus or minus 1 minute. In the Opsnode-dn log, start at about 2021-09-10 01:59:15; OOC occurred at about 2021-09-10 02:23:00. The Uphold log does not show a similar pattern (it only disconnected from one node, the primary, at 15:24:15, and regained that connection at 2021-09-28 15:24:54), but it went out of consensus at about 2021-09-28 15:45:00 and returned to consensus about 3 hours later, at about 18:43:00. This case is possibly unrelated, but it also took a very long time to return to consensus.
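
To make the timeline easier to compare across the three logs, the following sketch computes the gaps between the timestamps quoted in the notes. The timestamps are the approximate ones given above ("at about"), so the results are only indicative; the labels and the `elapsed` helper are illustrative names, not anything from the log files.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Time between two timestamps written as YYYY-MM-DD HH:MM:SS."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# (start, end) pairs taken from the approximate times in the notes.
events = {
    "Anonyome: start of pattern -> OOC reporting": (
        "2021-09-17 15:32:53", "2021-09-17 15:41:00"),
    "Opsnode-dn: start of pattern -> OOC": (
        "2021-09-10 01:59:15", "2021-09-10 02:23:00"),
    "Uphold: primary disconnected -> OOC": (
        "2021-09-28 15:24:15", "2021-09-28 15:45:00"),
    "Uphold: OOC -> back in consensus": (
        "2021-09-28 15:45:00", "2021-09-28 18:43:00"),
}

for label, (start, end) in events.items():
    print(f"{label}: {elapsed(start, end)}")
```

With these inputs, the delay before going out of consensus ranges from roughly 8 to 24 minutes, and the Uphold node stayed out of consensus for just under 3 hours.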

lynnbendixsen commented 3 years ago

20210917-anonyome-validator-log.zip

lynnbendixsen commented 3 years ago

opsnode-dn-log.2.zip uphold.log.zip

WadeBarnes commented 1 year ago

@lynnbendixsen, is this still an issue? Any further insight?

lynnbendixsen commented 1 year ago

I have not seen this specific issue for a while, but I also don't allow nodes to stay out of consensus if I can help it. I don't know of any changes to indy-node that would have repaired the issue. The "rapid view change requests" part of this report still happens pretty regularly for me.