sachikoy opened this issue 8 years ago
it looks like vp3 has not done state transfer yet. State transfer is triggered by a checkpoint event from the other peers, and the checkpoint period is in turn controlled by CORE_PBFT_GENERAL_K in the ./consensus/obcpbft/config.yaml file.
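For reference, the checkpoint period can be overridden per peer through an environment variable in the docker-compose service definition, in the same style as the vpBatch service shown later in this thread. This is only a sketch: the value 10 is just an example, and CORE_PBFT_GENERAL_K is assumed to map to the general.K key in config.yaml.

```yaml
vpBatch:
  extends:
    service: vpBase
  environment:
    # Checkpoint every K batches; a smaller K triggers state transfer
    # sooner on lagging peers, at some throughput cost.
    - CORE_PBFT_GENERAL_K=10
```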
An example of the flow is in the behave test in file ./bddtests/peer_basic.feature (search for the string "#680"). We run this test through all 3 PBFT variations (batch/classic/sieve) and do a similar flow to what you are doing:
This should be addressed by #1000 and related PRs.
This should be fixed in latest master. Please confirm and close if you can no longer reproduce this bug.
@corecode I confirmed that the issue still reproduces on the latest code, at commit 80337e286d9c3a910346d3858e84e0729dc71e51 (this should be the version after the merge of PR #1325)
Oh I see. The crashed peer will eventually catch up. You are not guaranteed to always read the most recent state on every peer. To verify that, keep submitting transactions (say, 100).
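A quick way to generate that traffic is a small loop against one peer's REST endpoint. This is only a sketch: the peer address, chaincode name "mycc", function, args, and secureContext below are all placeholder assumptions modeled on an example02-style setup; substitute your own values.

```shell
# Sketch: submit N invoke transactions through one peer's REST API.
# The address, chaincode name, and user below are placeholders.
submit_invokes() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s -X POST http://172.17.0.2:5000/chaincode -d '{
            "jsonrpc": "2.0",
            "method": "invoke",
            "params": {
                "type": 1,
                "chaincodeID": { "name": "mycc" },
                "ctorMsg": { "function": "invoke", "args": ["a", "b", "1"] },
                "secureContext": "test_user0"
            },
            "id": 1
        }' > /dev/null
        i=$((i + 1))
    done
}

# Usage: submit_invokes 100
```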
@corecode I tried 200 transactions but the peers still do not sync. The checkpoint period K=10 in hyperledger/fabric/consensus/obcpbft/config.yaml. BTW, security & privacy are enabled in our environment, if that makes a difference.
@sachikoy could you run this behave test?
in directory ./fabric/bddtests, run behave -n "#680" -D logs-y
@tuand27613 behave failed with an error. I think it is because there is no file "../peer/peer".
Exception OSError: [Errno 2] No such file or directory
Traceback (most recent call last):
File "/usr/local/bin/behave", line 11, in
Do you have the docker images for peer and membersrvc? You might have to do
before behave can run.
I am seeing a failure when running behave #680 where the stopped and restarted peer does not catch up.
Looking at logs now. It's actually running sieve but I think the logic is pretty similar to the batch case.
Could you still try to see if you can run behave #680 in your environment?
It turns out that the error I ran into with behave #680 was because state transfer had not completed when I issued the query. I added a small wait period between the last invoke operation and the query, and I can now run the test multiple times with no issue.
@sachikoy , I would still like to see if you're able to run the behave test.
@tuand27613 I could not run the command "go test github.com/hyperledger/fabric/core/container -run=BuildImage_Peer"; it fails with the error below. I think it is because the files were not cloned from GitHub but bulk-copied.
--- FAIL: TestVM_BuildImage_Peer (8.85s)
vm_test.go:79: Error building Peer container: Tag latest not found in repository docker.io/openblockchain/baseimage
@mrshah-at-ibm can you help run the behave test on our cloud environment?
@sachikoy the reason you get the above error is you need to run the script that provisions the docker image first. Try "./scripts/provision/docker 0.0.9" prior to running the "go test .." command you mention and I think you will see that it works. FWIW, we are working on a way that will automate all of this for you and will hopefully have that available shortly. For now, those are the manual steps.
Hi, I have been working with @sachikoy recently, and I have double-checked this problem. If I test a code base from 1-2 weeks ago, the bddtest fails with the message:
Assertion Failed: For attribute OK, expected (20), instead found (30)
I have pulled the latest code (as of May/10th), and I see no out-of-sync status once I put some transactions through the peer network. However, if I just use the "docker-compose-4.yml" file (no security / noop consensus), I still see a gap in the chain height, as follows:
$ sh check-height.sh
curl -s http://172.17.0.2:5000/chain | jq '.height'
27
curl -s http://172.17.0.3:5000/chain | jq '.height'
27
curl -s http://172.17.0.4:5000/chain | jq '.height'
21
curl -s http://172.17.0.5:5000/chain | jq '.height'
27
# execute 6 more transactions, and check the height again
curl -s http://172.17.0.2:5000/chain | jq '.height'
33
curl -s http://172.17.0.3:5000/chain | jq '.height'
32
curl -s http://172.17.0.4:5000/chain | jq '.height'
26
curl -s http://172.17.0.5:5000/chain | jq '.height'
32
I am not sure whether ledger synchronization is supported with noop consensus and security disabled, but I think we should support this configuration too (for the purpose of testing chaincode).
@ibmamnt noops was meant to be a very basic component to allow a new developer to start up quickly. If you want to test chaincode only, I would run in a setup as described in SandboxSetup.md
Also, just to confirm, are you and @sachikoy no longer seeing the problem with batch/security on with the latest commit?
@tuand27613 I still observe the issue. Has anybody else tried with batch/security enabled?
we've been running the behave tests including the #680 test case with every pull request. @jyellick is looking at a possible state transfer bug now. I will link to it and try to reproduce your scenario.
@tuand27613 The state transfer bug should be fixed via PR #1445. The #680 test case runs reliably and successfully for me, and appears to in Travis as well.
@sachikoy can you verify whether this bug is resolved? Also, can you run this test with 5 peers?
@bmos299 , my colleague @ibmamnt tried with PBFT-batch and confirmed that the problem still occurs.
@bmos299 I confirmed that the problem still exists. I have test code so that you can verify the defect. Please contact me so that we can proceed with the fix.
Please supply full debug logs for all peers.
@ibmamnt please send me any artifacts you have and we will recreate the issue. Thanks.
@ibmamnt I have the files and we will work locally on this as well. Thank you.
@tuand27613 @jyellick should check our HELLO processing since it sends along the blockchain info so that a peer may determine whether it is behind or not
message BlockchainInfo {
    uint64 height = 1;
    bytes currentBlockHash = 2;
    bytes previousBlockHash = 3;
}
@binhn We could make some ad-hoc inferences using the HELLO message, but the problem is that we are not guaranteed to get f+1 matching replies from HELLO (the blockchain is constantly changing height, so we could get heights of 999, 1000, 1001 from vps 0, 1, 2, for instance, and this would be perfectly normal). Rather than hoping for matching replies in the HELLO, I think it makes more sense to focus on more bulletproof solutions.
There are two scenarios where, barring additional failure, a replica can get out of sync, and never catch up. These are issues #1454 and #1120. These require some slight PBFT protocol modification, and are on our radar. With the combination of the two, we should get 'eventual consistency' across all replicas.
With the python scripts received, we are unable to reproduce the issue in PBFT batch mode; however, the issue (inconsistent data across nodes) surfaced in PBFT sieve mode.
Alternatively, we ran this scenario with our go scripts and observed the same issue when PBFT sieve mode was used, but not with PBFT batch mode.
With the code base from May/27th, I saw the same behaviour: I did not see the issue in PBFT/batch mode, but saw the inconsistency in PBFT/sieve mode.
The code base used for this test:
$ git rev-parse HEAD
a11ce403cbad1f5cd90bd807de0c65eaf226f9a3
By the way, I noticed several changes in the pbft/batch code when looking at the git log between May/19 and May/27; this may be related (or may have resolved it indirectly).
Are you testing with the PR referenced above? It is hopefully close to being merged, and queries will fail, rather than return known stale data once it is included.
Please also be aware, testing with Sieve is of limited value right now. We are attempting to harden the core PBFT (in particular, the PBFT batch mode), so any bugs found in Sieve are not a priority.
@tuand27613 I have finally had time to set up a separate set of nodes on vagrant/docker, and ran the behave test: behave -n "#680" -D logs-y
Here is the result.
Decomposing with yaml 'docker-compose-4-consensus-sieve.yml' after scenario chaincode example02 with 4 peers and 1 membersrvc, issue #680 (State transfer) -- @1.3 Consensus Options,
Feature: utxo # utxo.feature:11
As an open chain developer
I want to be able to launch a 3 peers
1 feature passed, 0 failed, 1 skipped
3 scenarios passed, 0 failed, 28 skipped
81 steps passed, 0 failed, 541 skipped, 0 undefined
Took 6m58.671s
Does this mean there is no problem? However, I still have the same problem when invoking transactions via REST API. (security=true, privacy=true, pbft-single)
Here are the steps:
Here is the log of vp0 and vp1 at step 9. vp0:
12:32:47.933 [rest] ProcessChaincode -> INFO 10f REST processing chaincode request...
12:32:47.933 [rest] processChaincodeInvokeOrQuery -> INFO 110 REST invoke chain code..
12:32:47.934 [rest] processChaincodeInvokeOrQuery -> INFO 111 Local user 'test_user0' is already logged in. Retrieving login token.
12:32:47.934 [crypto] invokeOrQuery -> INFO 112 Initializing client [test_user0]...
12:32:48.066 [crypto] invokeOrQuery -> INFO 113 Initializing client [test_user0]...done!
12:32:48.068 [consensus/obcpbft] RecvMsg -> INFO 114 New consensus request received
12:32:48.069 [crypto] CloseClient -> INFO 115 Closing client [test_user0]...
12:32:48.072 [consensus/obcpbft] executeOne -> INFO 116 Replica 0 executing/committing request for view=0/seqNo=27 and digest U3LGASgbEPomgflzJ64q9dsgrUNGhQmqPQvrhNA20H3N1UoOywxev8aV9obMrCHtGNkz1sXqNk5mcqYdl4dnJw==
12:32:48.091 [consensus/obcpbft] loop -> WARN 117 Attempting to stop an unfired idle timer
12:32:48.106 [rest] processChaincodeInvokeOrQuery -> INFO 118 Successfully submitted invoke transaction with txuuid (10cdfb3b-f97d-4bd6-9f71-0517211f5a68)
12:32:48.114 [rest] ProcessChaincode -> INFO 119 REST successfully submitted invoke transaction: {"jsonrpc":"2.0","result":{"status":"OK","message":"10cdfb3b-f97d-4bd6-9f71-0517211f5a68"},"id":3}
12:32:48.144 [consensus/obcpbft] execDoneSync -> INFO 11a Replica 0 finished execution 27, trying next
vp1:
12:32:48.082 [consensus/obcpbft] loop -> WARN 02d Attempting to stop an unfired idle timer
Here is the log of vp0 and vp1 at step 10. vp0:
12:33:07.841 [consensus/obcpbft] executeOne -> INFO 11b Replica 0 executing/committing request for view=0/seqNo=28 and digest BLV9PRG6O2ADh17yVlKkTzkorJC1MPUiVgckzb7s996OfR/M727qKfg4pJHI/jVVeRrAozzZTv36gyGK6A/BsQ==
12:33:07.848 [consensus/obcpbft] loop -> WARN 11c Attempting to stop an unfired idle timer
12:33:07.890 [consensus/obcpbft] execDoneSync -> INFO 11d Replica 0 finished execution 28, trying next
vp1:
12:33:07.736 [rest] ProcessChaincode -> INFO 02e REST processing chaincode request...
12:33:07.737 [rest] processChaincodeInvokeOrQuery -> INFO 02f REST invoke chain code..
12:33:07.737 [rest] processChaincodeInvokeOrQuery -> INFO 030 Local user 'test_user1' is already logged in. Retrieving login token.
12:33:07.737 [crypto] invokeOrQuery -> INFO 031 Initializing client [test_user1]...
12:33:07.834 [crypto] invokeOrQuery -> INFO 032 Initializing client [test_user1]...done!
12:33:07.836 [consensus/obcpbft] RecvMsg -> INFO 033 New consensus request received
12:33:07.849 [crypto] CloseClient -> INFO 034 Closing client [test_user1]...
12:33:07.865 [rest] processChaincodeInvokeOrQuery -> INFO 035 Successfully submitted invoke transaction with txuuid (de6772fb-0ce5-4676-b73a-ad9f4be09bd5)
12:33:07.865 [rest] ProcessChaincode -> INFO 036 REST successfully submitted invoke transaction: {"jsonrpc":"2.0","result":{"status":"OK","message":"de6772fb-0ce5-4676-b73a-ad9f4be09bd5"},"id":3}
12:33:07.867 [consensus/obcpbft] loop -> WARN 037 Attempting to stop an unfired idle timer
You cannot expect all peers to be in the same state. For now, any N-F (3 if N=4) peers should have the same ledger level. To test, stop vp2 after your step (9). vp1 should catch up as you submit invoke transactions.
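The N-F guarantee above comes straight from the PBFT fault assumption N >= 3f+1; a tiny sketch of the arithmetic:

```shell
# Maximum tolerated faults for a PBFT network of N validating peers:
# f = floor((N - 1) / 3); only N - f peers are guaranteed to be in sync.
pbft_f() {
    echo $(( ($1 - 1) / 3 ))
}

pbft_f 4   # 1 fault tolerated, so only 3 of 4 peers are guaranteed in sync
pbft_f 7   # 2 faults tolerated
```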
@sachikoy , behave#680 ran successfully so it looks like state transfer is working normally. Note though that the test waits for the restarted peer to complete state transfer (https://github.com/hyperledger/fabric/blob/master/bddtests/peer_basic.feature#L506) before querying the block height.
As @corecode mentioned, in an environment where there is a continuous stream of transactions, some number of peers may lag whether due to network latency/processor speed/state transfer, etc ...
Also, please take a look at @jyellick 's PR 1557
From talking with @mrshah-at-ibm, the failing scenario is:
12-12-12-12
2.3. Bring down P4 and execute some transactions. Example block height as follows:
18-18-18-NA
2.4. Bring P4 back up. It will not catch up (F=1).
18-18-18-12
2.5. Run a transaction on P4.
18-18-18-13
2.6. P4's state is now different (maybe we should consider this a forked chain).
2.7. Bring P3 down, and the network is unusable.
18-18-NA-13
@tuand27613 that is exactly what is happening to us. Were you able to reproduce the problem?
@sachikoy , @mrshah-at-ibm is almost done replicating your environment and we'll re-run both #1331 and #1545.
@ratnakar-asara has been running both the behave 680 test case and your script but we're not seeing the issue on his machine yet.
@tuand27613 As discussed, when we check the ledger data it is the same across nodes; however, I see that the height is inconsistent across nodes. Attached are the go toolkit logs and container logs for 3 iterations: #1331_Jun6.zip
@sachikoy, we ( @ratnakar-asara and I ) have tried to run the script that you provided and we cannot make it fail.
As you can see from the output log, the chain height is consistent across all peers even after peer2 is stopped and restarted. I have run the script multiple times with the same result.
Can you check if the script is running the scenario that you expect ? If so, can you re-run and attach the debug logs for all 4 peers ?
In the meantime, I will modify your script and run it with Mihir's duplicate environment.
@tuand27613 The log looks correct. Can you give me the script you used? (I can't recall giving you a script... I think it was probably made by @ibmamnt.)
@tuand27613 I have shared my script with @sachikoy. My environment setup was basically the same as the bddtest, and I realized this is different from the originally reported environment. I have modified these two parameters to match Sachiko-san's env.
vpBatch:
extends:
service: vpBase
environment:
- CORE_PEER_VALIDATOR_CONSENSUS_PLUGIN=pbft
- CORE_PBFT_GENERAL_TIMEOUT_REQUEST=2s
- CORE_PBFT_GENERAL_MODE=batch
# TODO: This is used for testing as to assure deployment goes through to block
- CORE_PBFT_GENERAL_BATCHSIZE=10
Note the changed TIMEOUT_REQUEST and BATCHSIZE values (2s and 10). With these, I was able to reproduce the problem. Please see the log file.
And here is my environment commit level.
$ git rev-parse HEAD
7ba04d1c8418b0dbc5e6842d0b42b1fa0b6e136f
Just in case, I waited 30 minutes to see if it would make any difference. The height is still different.
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
447
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
447
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
447
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
443
N-f replicas have the same height. This is expected and is not a bug.
@ibmamnt @sachikoy I re-ran your script with batchsize=10. I kept timeout_request=2s because the deploy on my machine is slow enough that the timeout timer was firing and pbft thought something was wrong and all the peers started to do view-changes.
In any case, I'm seeing consistent results: 1331-2.txt
I think the differences between your runs and mine are due to environment variations in processors/network and whatnot. As @corecode mentioned, some subset will lag, and that is expected for PBFT.
@tuand27613 If possible, I can send you my environment so that you can take a look. @corecode, if this is correct behavior, how do we fix the problem? I mean, when a machine crashes, what is the correct recovery process so that the crashed peer can join again? It looks like just restarting the peer is not enough.
The peer is connected and is updating. You can test this by crashing a different peer and continuing to submit transactions. The first crashed peer will then have the same block height as the rest. It is absolutely natural that F out of N peers are slightly behind; the network cannot wait for all N peers, only for N-F peers.
You could enable null requests in the consensus config.yaml, which will periodically produce some noop traffic in the consensus network. This will make a slightly lagging peer eventually catch up. However, this is not necessary in a (constantly) busy network, where requests are being processed all the time.
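If you want to try that, null requests are controlled by a timeout in the consensus config. Below is a sketch in docker-compose environment style; the exact variable name is an assumption, so check your version's consensus config.yaml for the real key (a value of 0s typically means disabled):

```yaml
environment:
  # Assumed key: emit a null request if no real request arrives within
  # this period, so lagging peers see checkpoint traffic sooner.
  - CORE_PBFT_GENERAL_TIMEOUT_NULLREQUEST=1s
```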
@ibmamnt , you can verify by updating your script to do as @corecode describes as this is the fastest way to get the test case going.
Another possible way would be to decrease CORE_PBFT_GENERAL_K. This controls how often checkpoint messages are being sent. In this case, K=10 and batchsize=10 so checkpoint messages are sent about every 100 invokes. An out-of-sync peer determines that it is out-of-sync when it receives checkpoint messages from the other peers and compares against its own "state". Using a smaller K will wake up lagging peers sooner but will also affect throughput as all the peers will now spend some extra amount of time processing checkpoint messages. Note that in the behave 680 test case, we have K=10 and batchsize=1 so we push in 10 invokes after restarting the peer to force state transfer.
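The arithmetic in that explanation can be sketched in a few lines: with checkpoint period K and batch size B, a lagging peer only learns it is behind after roughly K*B invokes.

```shell
# Invokes between checkpoints: a checkpoint fires every K batches,
# and each batch carries up to BATCHSIZE requests.
checkpoint_interval() {
    K=$1
    BATCHSIZE=$2
    echo $(( K * BATCHSIZE ))
}

checkpoint_interval 10 10   # the setup above: ~100 invokes per checkpoint
checkpoint_interval 10 1    # the behave 680 case: ~10 invokes per checkpoint
```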
Hi, thanks. I confirmed that after K*batchsize transactions (K=2 and batchsize=10 in my test), the restarted peer catches up (the first 3 blocks are startup, chaincode deploy, and init, I guess).
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
503
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
503
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
503
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503
However, after catching up, the restarted peer stops keeping pace, while the others keep updating:
# Invoke 1 transaction
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503
# Invoke one more transaction
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
505
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
505
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
505
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503
So this means restarted peers are always up to K*batchsize blocks behind. I'll check whether, if I do something on the restarted peer, it starts to sync just like a non-restarted peer.
Hi, when I ran a transaction against the restarted peer (in this case 172.17.0.6) and then ran the test cases (against 172.17.0.3), it started to sync. Thanks for the explanation. I fully understand now.
$ sh check-height.sh
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
540
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
540
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
540
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
540
Considering the following scenario:
curl -s --connect-timeout 1 http://172.17.0.3:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.4:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.5:5000/chain
504
curl -s --connect-timeout 1 http://172.17.0.6:5000/chain
503
What should be the expected behavior when a transaction is executed against the last peer (172.17.0.6)?
A query transaction will return a slightly stale reply. Invoke transactions should be processed as normal by the network.
I think now I understand how the current Hyperledger works. Can you verify if my understanding is correct?
@sachikoy
Q1: If the network is given only a little time to settle down (that is, when it is not under full workload), the other peers will catch up. Executing dummy transactions at low frequency will help. (I do not think this is currently done, but it may be useful.)
Q2: That behavior occurs because the others want to make progress as fast as possible. With less load on the system, the peer may catch up faster. But, generally, this points to the need for adding a notion of flow control to the BFT-based broadcast. Something like that should be done, but AFAIK it has not been explored by anyone in research.
Q3: No, peer 2 will catch up. It infers from peers 0 and 1 what the correct ledger is (by necessity, peers 0 and 1 are correct, since peer 3 failed, as the system assumption implies). This is the guarantee that the BFT model (with < N/3 faults) gives. We understand the need to synchronize all nodes after more time passes, but this is not yet implemented.
I am testing on an environment with 4 VPs (vp0, .. vp3) with PBFT batch and security enabled. When I stop one peer and restart it, the peer's state becomes inconsistent with other peers, yet all peers keep making consensus.
Here are the precise steps I did: