synechron-finlabs / quorum-maker

Utility to create and monitor Quorum nodes
Apache License 2.0

Nodes data not synced after restart? #55

Status: Closed (Eithcowich closed 5 years ago)

Eithcowich commented 5 years ago

I'm running a 5-node setup, each node on a different Ubuntu machine on Azure. They worked with no problems for a couple of weeks. Then today I noticed that 3 nodes were down.

I reactivated them by calling ./start.sh and they came back with no issue.

But I'm not sure the data on one of my contracts was fully restored. Is this possible?

Here are some related questions:

  1. Why would a node go down? Should I take care to restart it periodically?

  2. After a restart, all the data on all the contracts should be restored, right?

  3. If a node goes down and comes back, how does the sync process work?

  4. In my case 3 out of 5 nodes were down. The manager node was still up. How would the sync process work in such a case, to restore consensus?

  5. Where is the actual transaction data in the logs?

  6. Is there a script that monitors the nodes and sends email in case a node goes down, or even better takes care of restoring it?

Thanks!

dhyansraj commented 5 years ago

Please see the answers inline.

  1. Why would a node go down? Should I take care to restart it periodically?
     You haven't mentioned whether the Docker containers were still running. We have observed Geth or Constellation going down sporadically, which could be a stability issue there. This is something we have to get the Quorum team to fix. Have you tried the same with the 7-node example for a few days?

  2. After a restart, all the data on all the contracts should be restored, right?
     Public state should sync on restart. Currently Constellation cannot send private transactions to offline nodes. So if any of the nodes in privateFor were down, the whole transaction would have failed. So syncing shouldn't have been required.

  3. If a node goes down and comes back, how does the sync process work?

  4. In my case 3 out of 5 nodes were down. The manager node was still up. How would the sync process work in such a case, to restore consensus?
     Consensus is managed by Raft. Only the nodes that are up participate in each election round. Also, there are no managers in Quorum. If you mean the first node in the network, that role exists only for Quorum Maker.

  5. Where is the actual transaction data in the logs?
     Check the gethLog directory in your node. Depending on the verbosity level, it may or may not be logged. You can see the transaction data from the web3 console or the QM UI.

  6. Is there a script that monitors the nodes and sends email in case a node goes down, or even better takes care of restoring it?
     QM can send email notifications on node failures. Click on the email tab and configure your SMTP server.
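On the Raft point in question 4: Raft, as generally specified, needs a strict majority of the cluster members to elect a leader and commit new entries. The following is a minimal sketch of that arithmetic only. It assumes the cluster membership stayed at five while three nodes were down, and the function names are hypothetical. Whether the Raft layer in this particular deployment behaved exactly this way is not confirmed in this thread.

```python
def raft_majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def can_make_progress(cluster_size: int, nodes_up: int) -> bool:
    """True if enough members are up to elect a leader and commit entries.

    Assumption: cluster_size counts every configured member, including
    the ones that are currently down.
    """
    return nodes_up >= raft_majority(cluster_size)


if __name__ == "__main__":
    # Scenario from this issue: 5 configured nodes, 3 down, 2 still up.
    print(raft_majority(5))          # 3
    print(can_make_progress(5, 2))   # False
    print(can_make_progress(5, 3))   # True
```

Under these assumptions, two surviving nodes out of five could not commit new blocks on their own, and block creation would resume once at least three members are up again.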

Eithcowich commented 5 years ago

Thank you for the answers.

This: "Currently Constellation cannot send private transactions to offline nodes. So if any of the nodes in privateFor were down, the whole transaction would have failed. So syncing shouldn't have been required" is alarming for our needs.

We have private data on contracts on all our nodes, and some nodes have data that is shared only with the manager node. How do I get this data restored?

Eithcowich commented 5 years ago

Answering my own question: I can confirm that taking a node down, and bringing it back up, doesn't cause any loss of data. As expected it's all there. This is true for any of the nodes regardless of privacy.
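One way to double-check that a restarted node has caught up on the public chain is to compare block heights across nodes. The sketch below uses the standard Ethereum JSON-RPC method `eth_blockNumber`. The node names, the URLs, and the helper names are hypothetical, and it assumes each node exposes an HTTP RPC endpoint. Equal heights only indicate that the public chain is in sync; they say nothing about private state.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional


def fetch_block_number(url: str, timeout: float = 5.0) -> int:
    """Ask one node for its latest block number via JSON-RPC."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The result is a hex string such as "0x1b4".
    return int(body["result"], 16)


def check_heights(
    urls: Dict[str, str],
    fetch: Callable[[str], int] = fetch_block_number,
) -> Dict[str, object]:
    """Collect block heights and list nodes that are down or behind."""
    heights: Dict[str, Optional[int]] = {}
    for name, url in urls.items():
        try:
            heights[name] = fetch(url)
        except (OSError, ValueError, KeyError):
            # Unreachable node or malformed reply: record it as unknown.
            heights[name] = None

    reachable = [h for h in heights.values() if h is not None]
    top = max(reachable) if reachable else None
    lagging = sorted(
        name for name, h in heights.items() if h is None or h < top
    )
    return {"heights": heights, "top": top, "lagging": lagging}
```

A call would look like `check_heights({"node1": "http://10.0.0.4:22000", "node2": "http://10.0.0.5:22000"})`, where the addresses and ports are placeholders for your own RPC endpoints. A node that was just restarted may lag briefly, so it is worth repeating the check after a short wait.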

dhyansraj commented 5 years ago

For security reasons, transactions are never sent to non-parties in Quorum. Also, the data is encrypted using the public key of each peer, so even when multiple parties are part of the same transaction, the data received is different on each node, and it is not possible to sync it when another node comes online. A transaction is mined to the public state only if the participant nodes are online and confirm receipt. A node going down after a successful private transaction is OK. All public transactions can be made while any node is offline and will be synced when it comes back up, as in Ethereum.
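The behaviour described above can be pictured with a toy model. This is only an illustration of the rule "a private transaction goes through only if every participant is online", and it does not use any Quorum or Constellation API. The `Node` class, the `send_private` function, and the string standing in for the per-recipient encrypted payload are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    online: bool = True
    # tx id -> payload as stored on this node (stand-in for encrypted data)
    private_payloads: Dict[str, str] = field(default_factory=dict)


def send_private(tx_id: str, payload: str, sender: Node,
                 recipients: List[Node]) -> bool:
    """Deliver a private payload only if all parties are online.

    Returns False and stores nothing anywhere if any party is offline,
    so there is never a partial delivery that would need syncing later.
    """
    parties = [sender, *recipients]
    if not all(node.online for node in parties):
        return False
    for node in parties:
        # Each party gets its own copy; non-parties never receive anything.
        node.private_payloads[tx_id] = f"{payload} (encrypted for {node.name})"
    return True


if __name__ == "__main__":
    a, b, c = Node("A"), Node("B", online=False), Node("C")
    print(send_private("tx1", "secret", a, [b]))  # False: B is offline
    b.online = True
    print(send_private("tx1", "secret", a, [b]))  # True
    b.online = False                              # B goes down afterwards
    print("tx1" in b.private_payloads)            # True: data stays on B
    print("tx1" in c.private_payloads)            # False: C is not a party
```

In this model a node that goes down after a successful private transaction still holds its copy when it comes back, which matches the observation earlier in the thread that restarting a node did not lose data.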

dhyansraj commented 5 years ago

These scenarios are specific to Quorum, not QM, so I recommend you post in the Quorum Slack so you can get opinions from others as well.