hyperledger-archives / fabric

THIS IS A READ-ONLY historic repository. Current development is at https://gerrit.hyperledger.org/r/#/admin/projects/fabric . pull requests not accepted

ledger data become inconsistent across nodes after restarting a peer (using PBFT batch) #1331

Open sachikoy opened 8 years ago

sachikoy commented 8 years ago

I am testing in an environment with 4 VPs (vp0, ..., vp3) with PBFT batch and security enabled. When I stop one peer and restart it, the peer's state becomes inconsistent with the other peers, yet all peers keep reaching consensus.

Here are the precise steps I took:

  1. invoke several transactions on all of the peers (vp0, ..., vp3)
  2. stop vp3, and make sure the other nodes can still process transactions
  3. start vp3, and make sure that vp3 (and all other peers) can process transactions
  4. vp3's block height (from the /chain REST API endpoint) is then different from the other peers'
  5. if I query the state, vp3's result is different from the other peers'
tuand27613 commented 8 years ago

It looks like vp3 has not done state transfer yet. State transfer is triggered by a checkpoint event from the other peers, and checkpointing is in turn controlled by CORE_PBFT_GENERAL_K in the ./consensus/obcpbft/config.yaml file.
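For reference, a rough sketch of how one might lower the checkpoint period so that state transfer is triggered sooner. The config path and the K=10 value come from this thread; the environment-override style mirrors the docker-compose snippets quoted further down, and the value 2 is purely illustrative:

# Checkpoint period K lives in ./consensus/obcpbft/config.yaml (K=10 is reported later in this thread).
# Smaller K => checkpoint messages, and hence state transfer, after fewer batches.
# One way to override it per peer, in the same style as the other CORE_PBFT_* settings
# passed through docker-compose environment sections later in this thread:
export CORE_PBFT_GENERAL_K=2   # illustrative value only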

An example of the flow is in the behave test in the file ./bddtests/peer_basic.feature (search for the string "#680"). We run this test through all 3 PBFT variations (batch/classic/sieve), and it does a similar flow to what you are doing.

corecode commented 8 years ago

This should be addressed by #1000 and related PRs.

corecode commented 8 years ago

This should be fixed in latest master. Please confirm and close if you can no longer reproduce this bug.

sachikoy commented 8 years ago

@corecode I confirmed that the issue is reproduced on the latest code, at commit 80337e286d9c3a910346d3858e84e0729dc71e51 (this should be the version after the merge of PR #1325).

corecode commented 8 years ago

Oh I see. The crashed peer will eventually catch up. You are not guaranteed to always read the most recent state on every peer. To verify that, keep submitting transactions (say, 100).
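For illustration, a rough sketch of such a loop against vp0's REST API. The /chaincode endpoint shape, chaincode name, function, and args below are assumptions based on the REST logs later in this thread; adapt them to your deployment:

# Keep submitting invoke transactions so the restarted peer eventually sees
# enough checkpoints to trigger state transfer (payload fields are illustrative).
for i in $(seq 1 100); do
  curl -s -X POST http://172.17.0.2:5000/chaincode -d '{
    "jsonrpc": "2.0",
    "method": "invoke",
    "params": {
      "type": 1,
      "chaincodeID": { "name": "<your-deployed-chaincode-name>" },
      "ctorMsg": { "function": "invoke", "args": ["a", "b", "1"] },
      "secureContext": "test_user0"
    },
    "id": '"$i"'
  }' > /dev/null
done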

sachikoy commented 8 years ago

@corecode I tried 200 transactions but still the peers do not sync. Checkpoint period K=10 in hyperledger/fabric/consensus/obcpbft/config.yaml. BTW, security & privacy are enabled on our environment, if that would make a difference.

tuand27613 commented 8 years ago

@sachikoy could you run this behave test? In the directory ./fabric/bddtests, run: behave -n "#680" -D logs-y

sachikoy commented 8 years ago

@tuand27613 behave failed with an error. I think it is because there is no file "../peer/peer".

behave -n "#680" -D logs-y

Exception OSError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/usr/local/bin/behave", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/behave/__main__.py", line 109, in main
    failed = runner.run()
  File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 672, in run
    return self.run_with_paths()
  File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 693, in run_with_paths
    return self.run_model()
  File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 471, in run_model
    self.run_hook('before_all', context)
  File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 405, in run_hook
    self.hooks[name](context, *args)
  File "environment.py", line 47, in before_all
    cli_call(context, ["../peer/peer", "stop"], expect_success=False)
  File "/go/src/github.com/hyperledger/fabric/bddtests/steps/bdd_test_util.py", line 19, in cli_call
    p = subprocess.Popen(arg_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

tuand27613 commented 8 years ago

Do you have the docker images for peer and membersrvc ? You might have to do

before behave can run.

tuand27613 commented 8 years ago

I am seeing a case when running behave #680 where the stopped and restarted peer is not catching up.

Looking at logs now. It's actually running sieve but I think the logic is pretty similar to the batch case.

Could you still try to see if you can run behave #680 in your environment ?

tuand27613 commented 8 years ago

It turns out that the error I ran into with behave #680 was because state transfer had not completed when I issued the query. I added a small wait period between the last invoke operation and the query, and I'm now able to run the test multiple times with no issue.

@sachikoy , I would still like to see if you're able to run the behave test.

sachikoy commented 8 years ago

@tuand27613 I could not run the command "go test github.com/hyperledger/fabric/core/container -run=BuildImage_Peer"; I get the error below. I think it is because the files were not cloned from GitHub but bulk copied.

--- FAIL: TestVM_BuildImage_Peer (8.85s)
    vm_test.go:79: Error building Peer container: Tag latest not found in repository docker.io/openblockchain/baseimage

@mrshah-at-ibm can you help run the behave test on our cloud environment?

ghaskins commented 8 years ago

@sachikoy the reason you get the above error is that you need to run the script that provisions the docker image first. Try "./scripts/provision/docker 0.0.9" prior to running the "go test ..." command you mention, and I think you will see that it works. FWIW, we are working on a way to automate all of this for you and will hopefully have that available shortly. For now, those are the manual steps.
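Putting those two steps together (commands exactly as quoted in this thread; the repository path is the usual GOPATH location and may differ in your setup):

cd /go/src/github.com/hyperledger/fabric        # or wherever your clone lives
./scripts/provision/docker 0.0.9                # provision the base docker image first
go test github.com/hyperledger/fabric/core/container -run=BuildImage_Peer   # then build the peer image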

ibmamnt commented 8 years ago

Hi, I have been working with @sachikoy recently and have double-checked this problem. If I test a code base from one or two weeks ago, the bddtest fails with the message:

Assertion Failed: For attribute OK, expected (20), instead found (30)

I have pulled the latest code (as of May 10th), and I no longer see the out-of-sync status once I push some transactions through the peer network. However, if I just use the "docker-compose-4.yml" file (no security / noop consensus), I still see a gap in the chain height, as follows:

$ sh check-height.sh
curl -s http://172.17.0.2:5000/chain | jq '.height'
27
curl -s http://172.17.0.3:5000/chain | jq '.height'
27
curl -s http://172.17.0.4:5000/chain | jq '.height'
21
curl -s http://172.17.0.5:5000/chain | jq '.height'
27
# execute another 6 transactions, and check the height
curl -s http://172.17.0.2:5000/chain | jq '.height'
33
curl -s http://172.17.0.3:5000/chain | jq '.height'
32
curl -s http://172.17.0.4:5000/chain | jq '.height'
26
curl -s http://172.17.0.5:5000/chain | jq '.height'
32

I am not sure whether ledger synchronization is supported in the noop/no-security case, but I think we should support this configuration too (for the purpose of testing chaincode).
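For reference, a minimal sketch of a check-height.sh like the one used above (peer addresses are taken from that output; adjust for your network):

#!/bin/sh
# Print the chain height reported by each peer's /chain REST endpoint.
for ip in 172.17.0.2 172.17.0.3 172.17.0.4 172.17.0.5; do
  echo "curl -s http://$ip:5000/chain | jq '.height'"
  curl -s "http://$ip:5000/chain" | jq '.height'
done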

tuand27613 commented 8 years ago

@ibmamnt noops was meant to be a very basic component to allow a new developer to start up quickly. If you want to test chaincode only, I would run with a setup as described in SandboxSetup.md.

Also, just to confirm, are you and @sachikoy no longer seeing the problem with batch/security on with the latest commit ?

sachikoy commented 8 years ago

@tuand27613 I still observe the issue. Has anybody else tried with batch/security enabled?

tuand27613 commented 8 years ago

we've been running the behave tests including the #680 test case with every pull request. @jyellick is looking at a possible state transfer bug now. I will link to it and try to reproduce your scenario.

jyellick commented 8 years ago

@tuand27613 The state transfer bug should be fixed by PR #1445. The #680 test case runs reliably and successfully for me, and appears to in Travis as well.

bmos299 commented 8 years ago

@sachikoy can you verify if this bug is resolved? Also, can you run this test with 5 peers?

sachikoy commented 8 years ago

@bmos299 , my colleague @ibmamnt tried with PBFT-batch and confirmed that the problem still occurs.

ibmamnt commented 8 years ago

@bmos299 I confirmed that the problem still exists. I have test code so that you can verify the defect. Please contact me so that we can proceed with the fix.

corecode commented 8 years ago

Please supply full debug logs for all peers.

bmos299 commented 8 years ago

@ibmamnt please send me any artifacts you have and we will recreate the issue. Thanks.

bmos299 commented 8 years ago

@ibmamnt I have the files and we will work locally on this as well. Thank you.

binhn commented 8 years ago

@tuand27613 @jyellick should check our HELLO processing, since it sends along the blockchain info so that a peer may determine whether it is behind or not:

message BlockchainInfo {
    uint64 height = 1;
    bytes currentBlockHash = 2;
    bytes previousBlockHash = 3;
}
jyellick commented 8 years ago

@binhn We could make some ad-hoc inferences using the HELLO message, but the problem is that we are not guaranteed to get f+1 matching replies from HELLO (the blockchain is constantly changing height, so we could get heights of 999, 1000, 1001 from vps 0, 1, 2 for instance, and this would be perfectly normal). Rather than hoping for matching replies in the HELLO, I think it makes more sense to focus on more bulletproof solutions.

There are two scenarios where, barring additional failure, a replica can get out of sync, and never catch up. These are issues #1454 and #1120. These require some slight PBFT protocol modification, and are on our radar. With the combination of the two, we should get 'eventual consistency' across all replicas.

ratnakar-asara commented 8 years ago

With the python scripts we received, we are unable to reproduce the issue in PBFT batch mode; however, the issue (inconsistent data across nodes) surfaced in PBFT sieve mode. Alternatively, we ran this scenario with our go scripts and observed the same issue when PBFT sieve mode is used, but not with PBFT batch mode.

ibmamnt commented 8 years ago

With the code base from May 27th, I saw the same behaviour. That is, I did not see the issue in PBFT/batch, but saw the inconsistency in PBFT/sieve mode.

The code base used for this test:

$ git rev-parse HEAD
a11ce403cbad1f5cd90bd807de0c65eaf226f9a3

By the way, I noticed several changes in the pbft/batch code when looking at the git log between May 19 and May 27; these may be related (or may have resolved it indirectly).

jyellick commented 8 years ago

Are you testing with the PR referenced above? It is hopefully close to being merged, and once it is included, queries will fail rather than return known-stale data.

Please also be aware, testing with Sieve is of limited value right now. We are attempting to harden the core PBFT (in particular, the PBFT batch mode), so any bugs found in Sieve are not a priority.

sachikoy commented 8 years ago

@tuand27613 I have finally had time to set up a separate set of nodes on vagrant/docker, and ran the behave test: behave -n "#680" -D logs-y

Here is the result.

Decomposing with yaml 'docker-compose-4-consensus-sieve.yml' after scenario chaincode example02 with 4 peers and 1 membersrvc, issue #680 (State transfer) -- @1.3 Consensus Options, 

Feature: utxo # utxo.feature:11
  As an open chain developer
  I want to be able to launch a 3 peers
1 feature passed, 0 failed, 1 skipped
3 scenarios passed, 0 failed, 28 skipped
81 steps passed, 0 failed, 541 skipped, 0 undefined
Took 6m58.671s

Does this mean there is no problem? However, I still have the same problem when invoking transactions via the REST API (security=true, privacy=true, pbft-single).

Here are the steps:

  1. start membersrvc docker container
  2. start vp0 to vp3 docker containers
  3. login to vp0 REST API as test_user0
  4. deploy chaincode_example2 to vp0
  5. invoke transactions a few times on vp0 REST API
  6. make sure that the chain height is the same for all peers, and the query returns the same result.
  7. stop vp1 (docker stop vp1)
  8. invoke more transactions on vp0
  9. start vp1 (docker start vp1)
  10. invoke transactions on vp0 REST API, and check the chain height of vp0 and vp1. Then vp1's height is not increased at all.
  11. invoke transactions on vp1 REST API, and then check the chain height of vp0 and vp1. Then only vp0's chain height is increased.

Here are the logs of vp0 and vp1 at step 9. vp0:

12:32:47.933 [rest] ProcessChaincode -> INFO 10f REST processing chaincode request...
12:32:47.933 [rest] processChaincodeInvokeOrQuery -> INFO 110 REST invoke chain code..
12:32:47.934 [rest] processChaincodeInvokeOrQuery -> INFO 111 Local user 'test_user0' is already logged in. Retrieving login token.
12:32:47.934 [crypto] invokeOrQuery -> INFO 112 Initializing client [test_user0]...
12:32:48.066 [crypto] invokeOrQuery -> INFO 113 Initializing client [test_user0]...done!
12:32:48.068 [consensus/obcpbft] RecvMsg -> INFO 114 New consensus request received
12:32:48.069 [crypto] CloseClient -> INFO 115 Closing client [test_user0]...
12:32:48.072 [consensus/obcpbft] executeOne -> INFO 116 Replica 0 executing/committing request for view=0/seqNo=27 and digest U3LGASgbEPomgflzJ64q9dsgrUNGhQmqPQvrhNA20H3N1UoOywxev8aV9obMrCHtGNkz1sXqNk5mcqYdl4dnJw==
12:32:48.091 [consensus/obcpbft] loop -> WARN 117 Attempting to stop an unfired idle timer
12:32:48.106 [rest] processChaincodeInvokeOrQuery -> INFO 118 Successfully submitted invoke transaction with txuuid (10cdfb3b-f97d-4bd6-9f71-0517211f5a68)
12:32:48.114 [rest] ProcessChaincode -> INFO 119 REST successfully submitted invoke transaction: {"jsonrpc":"2.0","result":{"status":"OK","message":"10cdfb3b-f97d-4bd6-9f71-0517211f5a68"},"id":3}
12:32:48.144 [consensus/obcpbft] execDoneSync -> INFO 11a Replica 0 finished execution 27, trying next

vp1:

12:32:48.082 [consensus/obcpbft] loop -> WARN 02d Attempting to stop an unfired idle timer

Here are the logs of vp0 and vp1 at step 10. vp0:

12:33:07.841 [consensus/obcpbft] executeOne -> INFO 11b Replica 0 executing/committing request for view=0/seqNo=28 and digest BLV9PRG6O2ADh17yVlKkTzkorJC1MPUiVgckzb7s996OfR/M727qKfg4pJHI/jVVeRrAozzZTv36gyGK6A/BsQ==
12:33:07.848 [consensus/obcpbft] loop -> WARN 11c Attempting to stop an unfired idle timer
12:33:07.890 [consensus/obcpbft] execDoneSync -> INFO 11d Replica 0 finished execution 28, trying next

vp1:

12:33:07.736 [rest] ProcessChaincode -> INFO 02e REST processing chaincode request...
12:33:07.737 [rest] processChaincodeInvokeOrQuery -> INFO 02f REST invoke chain code..
12:33:07.737 [rest] processChaincodeInvokeOrQuery -> INFO 030 Local user 'test_user1' is already logged in. Retrieving login token.
12:33:07.737 [crypto] invokeOrQuery -> INFO 031 Initializing client [test_user1]...
12:33:07.834 [crypto] invokeOrQuery -> INFO 032 Initializing client [test_user1]...done!
12:33:07.836 [consensus/obcpbft] RecvMsg -> INFO 033 New consensus request received
12:33:07.849 [crypto] CloseClient -> INFO 034 Closing client [test_user1]...
12:33:07.865 [rest] processChaincodeInvokeOrQuery -> INFO 035 Successfully submitted invoke transaction with txuuid (de6772fb-0ce5-4676-b73a-ad9f4be09bd5)
12:33:07.865 [rest] ProcessChaincode -> INFO 036 REST successfully submitted invoke transaction: {"jsonrpc":"2.0","result":{"status":"OK","message":"de6772fb-0ce5-4676-b73a-ad9f4be09bd5"},"id":3}
12:33:07.867 [consensus/obcpbft] loop -> WARN 037 Attempting to stop an unfired idle timer
corecode commented 8 years ago

You cannot expect all peers to be in the same state. For now, any N-f (3 if N=4) peers should have the same ledger level. To test, stop vp2 after your step (9); vp1 should catch up as you submit invoke transactions.
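A rough sketch of that check. The container names follow the docker stop/start commands in the steps above, and the address-to-peer mapping is an assumption; the invoke step is whatever REST invoke you are already using:

docker stop vp2                                        # take a different peer down after step 9
# ...submit a batch of invoke transactions against vp0 here...
curl -s http://172.17.0.2:5000/chain | jq '.height'    # vp0 (address mapping is an assumption)
curl -s http://172.17.0.3:5000/chain | jq '.height'    # vp1 -- should now have caught up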

tuand27613 commented 8 years ago

@sachikoy , behave #680 ran successfully, so it looks like state transfer is working normally. Note, though, that the test waits for the restarted peer to complete state transfer (https://github.com/hyperledger/fabric/blob/master/bddtests/peer_basic.feature#L506) before querying the block height.
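A minimal sketch of that wait, assuming the same /chain REST endpoint and peer addresses used elsewhere in this thread (a healthy peer on 172.17.0.2, the restarted peer on 172.17.0.3):

# Poll the restarted peer until its height reaches the healthy peer's height,
# then it is safe to query it.
target=$(curl -s http://172.17.0.2:5000/chain | jq '.height')
until [ "$(curl -s http://172.17.0.3:5000/chain | jq '.height')" -ge "$target" ]; do
  sleep 2
done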

As @corecode mentioned, in an environment where there is a continuous stream of transactions, some number of peers may lag whether due to network latency/processor speed/state transfer, etc ...

Also, please take a look at @jyellick 's PR 1557

tuand27613 commented 8 years ago

From talking with @mrshah-at-ibm, the failing scenario is:

In the screenshare session, the following scenario was played:

  2.1. Start a 4-peer network.

  2.2. Run a few transactions and reach the same stateHash on all peers. Example block heights as follows:

12-12-12-12

  2.3. Bring down P4 and execute some transactions. Example block heights as follows:

18-18-18-NA

  2.4. Bring P4 back up. It will not catch up (F=1).

18-18-18-12

  2.5. Run a transaction on P4.

18-18-18-13

  2.6. Now P4's state is different (maybe we consider this a forked chain).

  2.7. Bring P3 down, and the network is unusable.

18-18-NA-13
sachikoy commented 8 years ago

@tuand27613 that is exactly what is happening to us. Were you able to reproduce the problem?

tuand27613 commented 8 years ago

@sachikoy , @mrshah-at-ibm is almost done replicating your environment and we'll re-run both #1331 and #1545.

@ratnakar-asara has been running both the behave 680 test case and your script but we're not seeing the issue on his machine yet.

ratnakar-asara commented 8 years ago

@tuand27613 , as discussed, when we check the ledger data it is the same across nodes; however, I see that the height is inconsistent across nodes. Attached are the go toolkit logs and container logs for 3 iterations: #1331_Jun6.zip

tuand27613 commented 8 years ago

@sachikoy, we ( @ratnakar-asara and I ) have tried to run the script that you provided and we cannot make it fail.

1331.txt

As you can see from the output log, the chain height is consistent across all peers even after peer2 is stopped and restarted. I have run the script multiple times with the same result.

Can you check if the script is running the scenario that you expect? If so, can you re-run and attach the debug logs for all 4 peers?

In the meantime, I will modify your script and run it with Mihir's duplicate environment.

sachikoy commented 8 years ago

@tuand27613 The log looks correct. Can you give me the script you used? (I can't recall giving you a script... I think it was probably made by @ibmamnt.)

ibmamnt commented 8 years ago

@tuand27613 I have shared my script with @sachikoy . My environment setup was basically the same as the bddtest, and I realized this is different from the originally reported environment. I have modified these two parameters to match Sachiko-san's env.

vpBatch:
  extends:
    service: vpBase
  environment:
    - CORE_PEER_VALIDATOR_CONSENSUS_PLUGIN=pbft
    - CORE_PBFT_GENERAL_TIMEOUT_REQUEST=2s
    - CORE_PBFT_GENERAL_MODE=batch
    # TODO: This is used for testing as to assure deployment goes through to block
    - CORE_PBFT_GENERAL_BATCHSIZE=10

Note the change in the TIMEOUT_REQUEST and BATCHSIZE values (2s and 10). With these, I was able to reproduce the problem. Please see the log file.

ibmamnt-test-0608.txt

And here is my environment commit level.

$ git rev-parse HEAD 
7ba04d1c8418b0dbc5e6842d0b42b1fa0b6e136f

Just in case, I waited 30 minutes to see if it makes any difference. The height is still different.

$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
447
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
447
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
447
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
443
corecode commented 8 years ago

N-f replicas have the same height. This is expected and is not a bug.

tuand27613 commented 8 years ago

@ibmamnt @sachikoy I re-ran your script with batchsize=10. I kept timeout_request=2s because the deploy on my machine is slow enough that the timeout timer was firing and pbft thought something was wrong and all the peers started to do view-changes.

In any case, I'm seeing consistent results 1331-2.txt

I think the differences between your runs and mine are due to environment variations in processors/network and whatnot. As @corecode mentioned, some subset will lag, and that is expected for PBFT.

ibmamnt commented 8 years ago

@tuand27613 If possible, I can send you my environment so that you can take a look. @corecode, if this is correct behavior, how do we fix the problem? I mean, when a machine has crashed, what is the correct process to recover so that the crashed peer can join again? It looks like just restarting the peer is not enough.

corecode commented 8 years ago

The peer is connected and is updating. You can test this by crashing a different peer and continuing to submit transactions. The first crashed peer will have the same block height as the rest. It is absolutely natural that F out of N peers are slightly behind. The network cannot wait for all N peers, but only for N-F peers.

You could enable null requests in the consensus config.yaml, which will periodically produce some no-op traffic in the consensus network. This will make a slightly lagging peer eventually catch up. However, this is not necessary in a (constantly) busy network, where requests are being processed all the time.
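A sketch of what enabling that might look like, assuming the null-request interval is exposed the same way as the request timeout shown earlier in this thread; the exact key name below is a guess, so verify it against your consensus/obcpbft/config.yaml before relying on it:

# Assumed env-override name, patterned after CORE_PBFT_GENERAL_TIMEOUT_REQUEST;
# check config.yaml for the real key.
export CORE_PBFT_GENERAL_TIMEOUT_NULLREQUEST=5s   # some non-zero interval; 0s would leave null requests off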

tuand27613 commented 8 years ago

@ibmamnt , you can verify this by updating your script to do as @corecode describes, as this is the fastest way to get the test case going.

Another possible way would be to decrease CORE_PBFT_GENERAL_K. This controls how often checkpoint messages are sent. In this case, K=10 and batchsize=10, so checkpoint messages are sent about every 100 invokes. An out-of-sync peer determines that it is out of sync when it receives checkpoint messages from the other peers and compares them against its own "state". Using a smaller K will wake up lagging peers sooner, but will also affect throughput, as all the peers will now spend some extra time processing checkpoint messages. Note that in the behave 680 test case, we have K=10 and batchsize=1, so we push in 10 invokes after restarting the peer to force state transfer.
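The arithmetic in the paragraph above, spelled out (the values are just the examples discussed here, not recommendations):

# A checkpoint is sent roughly every K * batchsize invokes.
K=10; BATCHSIZE=1    # the behave #680 settings; with BATCHSIZE=10 this becomes ~100
echo "expect a checkpoint (and possible state transfer) roughly every $((K * BATCHSIZE)) invokes"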

ibmamnt commented 8 years ago

Hi, thanks. I confirmed that with K*batchsize transactions, where K=2 and batchsize=10 in my test, the restarted peer catches up (the first 3 blocks are startup, chaincode deploy, and init, I guess).

$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
503
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
503
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
503
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503

However, after catching up, the restarted peer stops keeping up again, while the others keep updating:

# Invoke  1 transaction
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503

# Invoke yet another 1 transaction
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
505
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
505
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
505
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503

So this means restarted peers are always behind by up to K*batchsize. I'll check whether, if I do something on the restarted peer, it then starts to sync just like a non-restarted peer.

ibmamnt commented 8 years ago

Hi, when I run a transaction against the restarted peer (in this case 172.17.0.6) and then run the test cases (against 172.17.0.3), it starts to sync. Thanks for the explanation; I fully understand now.

$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
540
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
540
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
540
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
540
mrshah-at-ibm commented 8 years ago

Consider the following scenario:

curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503

What should be the expected behavior when a transaction is executed against the last peer (172.17.0.6)?

corecode commented 8 years ago

A query transaction will return a slightly stale reply. Invoke transactions should be processed as normal by the network.

sachikoy commented 8 years ago

I think I now understand how the current Hyperledger works. Can you verify whether my understanding is correct?

  1. When one peer (say peer 2) is stopped and then started again, it tries to synchronise with the latest ledger after x transactions, where x = batch size * K, and K is defined by CORE_PBFT_GENERAL_K. And I assume PBFT-classic (single) is identical to batch size=0.
  2. Even after catching up, peer 2 may be slightly lagging behind, because Hyperledger allows at most (N-1)/3 peers to be in a failed state.
  3. While peer 2 is lagging behind, any query transaction to peer 2 will return outdated information. An invoke transaction is executed on the other peers (0, 1, and 3) but not on peer 2, because peer 2 cannot reach consensus with the other peers.
  4. Then, when we stop another peer (say peer 3), peer 2 will catch up with peer 0 and peer 1 completely, because Hyperledger cannot allow more than (N-1)/3 nodes to be in failure.

Questions

  1. How can we predict when the peer will catch up? I used PBFT single + K=10, but it did not start catching up even after 10 transactions.
  2. Why can't we make a restarted peer catch up with the others as soon as possible, rather than letting it lag behind for a while? It seems the peer keeps lagging behind until another peer dies. This means that an application accessing the lagging peer always sees an out-of-date state.
  3. Related to question 2, is there a possible problem caused by this design? The current design means that at some point in time (i.e., when step 4 above happens), fewer than a majority of peers have an identical ledger, because one peer is dead and the other is barely catching up.
cca88 commented 8 years ago

@sachikoy Q1: If the network is given only a little time to settle down (that is, when it is not under full workload), the other peers will catch up. Executing dummy transactions at low frequency will help. (I do not think this is currently done, but it may be useful.)

Q2: That behavior occurs because the others want to make progress as fast as possible. With less load on the system, the peer may catch up faster. But, generally, this points to the need for adding a notion of flow control to the BFT-based broadcast. Something like that should be done, but AFAIK it has not been explored by anyone in research.

Q3: No, peer 2 will catch up. It infers from peers 0 and 1 what the correct ledger is (by necessity, peers 0 and 1 are correct, since peer 3 failed, as the system assumption implies). This is the guarantee that the BFT model (with < N/3 faults) gives. We understand the need to synchronize all nodes after more time passes, but this is not yet implemented.