ynamiki opened this issue 5 years ago

There is an adapter for BFT-SMaRt: https://github.com/bft-smart/fabric-orderingservice

It would be nice if we had a similar adapter for our project.
That would be good. It need not necessarily follow the architecture of the BFT-SMaRt adapter, because we use a language compatible with the Fabric code base. We should compare it with how Raft is integrated in Fabric.
Additionally, we might be interested in integration with Hyperledger Sawtooth.
I wrote some adapter code and a gist document to address this issue, so let me share what I did.
Unfortunately, I don't have any appealing numbers yet (see the figure in the gist), so my next step is to investigate the bottlenecks further and look for performance-optimization ideas.
I'd be glad to receive any feedback or advice.
The "query" throughput of MinBFT is about half that of the other consensus implementations, which looks odd to me; I might be missing something important. I'll dig into this more...
@Naoya-Horiguchi Great work! I think your performance test was done in simulation mode. I will try it on an SGX-equipped machine.
Unfortunately, I gave up... I found that it is such a complicated task in my environment. The SGX-equipped machine is behind a corporate proxy with SSL inspection, and I would have to update all the Dockerfiles to install a special certificate for the SSL inspection :anguished:
> The SGX-equipped machine is behind a corporate proxy
It might be easier to develop the integration building the enclave in simulation mode. Then there's no need for hardware SGX machines. Conceptually, the integration should not depend on whether you use simulated or hardware SGX.
Sorry, I haven't found time to look at the code yet, but it's on my to-do list :wink:
@ynamiki, @sergefdrv, thank you for the comments. I admit that the code I'm developing is immature and needs to be improved before being reviewed in detail. Namiki-san's work on #115 for delivery improvement would help me very much.
> The SGX-equipped machine is behind a corporate proxy with SSL inspection, and I would have to update all the Dockerfiles to install a special certificate for the SSL inspection :anguished:
Yes, a proxy often prevents easy testing, so if available, you can use SGX-equipped VMs on Azure. But in any case, I think it would be nice if I could provide a procedure that works properly in any environment and SGX mode.
BTW, I'm facing a fundamental issue when applying MinBFT to a blockchain platform, so let me share it. In short: how do we guarantee Byzantine fault tolerance from the whole system's perspective? Even if the consensus component our project provides is proven to be Byzantine fault tolerant, that does not directly mean the whole system is.
For example, in Hyperledger Fabric, an orderer creates a block candidate from submitted transactions and tries to reach agreement on it with the other orderers. The orderer distributes the block candidate to all other orderers (as discussed in #119), then the consensus process starts. However, what our consensus code actually does is just check and agree on what the primary orderer suggests. So what if the primary orderer is Byzantine faulty and controls the contents of the block candidate on purpose? For example, a Byzantine faulty primary orderer could intentionally exclude transactions from a specific user from all block candidates, so that the user can't properly use the system.
I have some ideas to handle this issue:
I currently feel that idea 2 is promising, but any comments or ideas for alternative approaches would be welcome.
@Naoya-Horiguchi I still couldn't find time to look at the code, but let me give some feedback on your concern.
> BTW, I'm facing a fundamental issue when applying MinBFT to a blockchain platform, so let me share it. In short: how do we guarantee Byzantine fault tolerance from the whole system's perspective? Even if the consensus component our project provides is proven to be Byzantine fault tolerant, that does not directly mean the whole system is.
In the case of Hyperledger Fabric, the consensus mechanism of the ordering service is in general complemented by the chaincode endorsement policy to achieve the desired level of resilience.
> For example, in Hyperledger Fabric, an orderer creates a block candidate from submitted transactions and tries to reach agreement on it with the other orderers. The orderer distributes the block candidate to all other orderers (as discussed in #119), then the consensus process starts. However, what our consensus code actually does is just check and agree on what the primary orderer suggests. So what if the primary orderer is Byzantine faulty and controls the contents of the block candidate on purpose? For example, a Byzantine faulty primary orderer could intentionally exclude transactions from a specific user from all block candidates, so that the user can't properly use the system.
I think the consensus layer should be aware of individual transactions so that it can detect potential unfair treatment (censorship) by the primary. The consensus layer in the primary replica can take advantage of Fabric's functionality to form transaction batches (the `BlockCutter` interface). We will need to extend our MinBFT implementation and its interfaces to support this.

Furthermore, I think we will need to extend MinBFT so that it can provide a kind of consensus proof to the `RequestConsumer` interface. We can attach this proof as metadata to the delivered blocks. This consensus proof should then be checked by Fabric peers.
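To sketch the idea: the `ConsensusProof` structure, its fields, and the simplified `Block` type below are hypothetical, just to show where such a proof could live; the real shape would be defined when extending the MinBFT interfaces.

```go
package sketch

import "encoding/json"

// ConsensusProof is a hypothetical structure holding evidence that a
// quorum of MinBFT replicas committed a given batch.
type ConsensusProof struct {
	View       uint64   // view in which the batch was committed
	Sequence   uint64   // sequence number assigned by ordering
	Signatures [][]byte // commitment signatures from a quorum of replicas
}

// Block is a simplified stand-in for a Fabric block: data plus metadata.
type Block struct {
	Data     [][]byte
	Metadata map[string][]byte
}

// attachProof serializes the proof and stores it as block metadata, so
// Fabric peers receiving the block via deliver can verify that consensus
// was actually reached on its contents.
func attachProof(b *Block, p ConsensusProof) error {
	raw, err := json.Marshal(p)
	if err != nil {
		return err
	}
	if b.Metadata == nil {
		b.Metadata = make(map[string][]byte)
	}
	b.Metadata["consensus_proof"] = raw
	return nil
}
```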
From the Fabric SDK side, the client, after broadcasting a request through an orderer node, should make sure it actually appears in the ledger. After some timeout, it can suspect that the orderer node is faulty and re-broadcast the transaction via another node.
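As a rough sketch of this client-side behavior, with all names illustrative (a real implementation would live in the Fabric SDK, and `appearsInLedger` would be backed by the deliver interface):

```go
package sketch

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"time"
)

// Orderer abstracts an OSN's broadcast endpoint (illustrative only).
type Orderer interface {
	Broadcast(tx []byte) error
}

// computeTxID stands in for Fabric's transaction ID derivation.
func computeTxID(tx []byte) string {
	sum := sha256.Sum256(tx)
	return hex.EncodeToString(sum[:])
}

// broadcastWithRetry submits a transaction through one orderer and, if it
// does not appear in the ledger before the timeout, suspects that orderer
// and re-broadcasts via the next one.
func broadcastWithRetry(tx []byte, orderers []Orderer, timeout time.Duration,
	appearsInLedger func(txID string) bool) error {
	txID := computeTxID(tx)
	for _, o := range orderers {
		if err := o.Broadcast(tx); err != nil {
			continue // this orderer is unreachable; try the next one
		}
		deadline := time.Now().Add(timeout)
		for time.Now().Before(deadline) {
			if appearsInLedger(txID) {
				return nil // transaction committed
			}
			time.Sleep(500 * time.Millisecond)
		}
		// Timed out: suspect this orderer is faulty and try another.
	}
	return errors.New("transaction not committed via any orderer")
}
```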
Thanks for the feedback, @sergefdrv.
> In the case of Hyperledger Fabric, the consensus mechanism of the ordering service is in general complemented by the chaincode endorsement policy to achieve the desired level of resilience.
Yes, the endorsement part focuses on collecting approvals that a proposed transaction is considered valid, by checking the results of its execution on the right set of endorsing peers. This part handles each transaction independently. The ordering service then focuses on ordering the endorsed transactions.
The point is that each orderer receives a different set of transactions, so if we want the consensus layer to handle individual transactions (or to check that other orderers handle transactions in a fair manner), we need to share the information about all submitted transactions among all orderers. This slows down consensus, but it could be a solution if there is no other approach.
> From the Fabric SDK side, the client, after broadcasting a request through an orderer node, should make sure it actually appears in the ledger. After some timeout, it can suspect that the orderer node is faulty and re-broadcast the transaction via another node.
Retrying with a different orderer sounds fine to me. Maybe we should take care of the potential race between retry and commit, to avoid committing a transaction twice in different blocks.
> The point is that each orderer receives a different set of transactions, so if we want the consensus layer to handle individual transactions (or to check that other orderers handle transactions in a fair manner), we need to share the information about all submitted transactions among all orderers. This slows down consensus, but it could be a solution if there is no other approach.
I was thinking of each orderer node (OSN) acting as a proxy between the Fabric client (SDK) and the consensus layer. This means the OSN would combine both the replica and client functionality of MinBFT. The Fabric client would pick one OSN and try to broadcast a transaction through it. Acting as a MinBFT client, the OSN would execute the client part of the MinBFT protocol normally, which includes preparing and broadcasting a MinBFT request.
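A minimal sketch of this proxy role; the `MinBFTClient` interface below is an assumption for illustration, not our actual client API:

```go
package sketch

// MinBFTClient abstracts the client side of the MinBFT protocol; this
// method shape is an assumption, not the project's actual API.
type MinBFTClient interface {
	// Request submits an operation for ordering and blocks until enough
	// matching replies are collected from the replicas.
	Request(operation []byte) ([]byte, error)
}

// OSN combines Fabric ordering-service-node duties with the MinBFT
// client (and, not shown here, replica) functionality.
type OSN struct {
	client MinBFTClient
}

// Broadcast is the handler behind Fabric's broadcast interface: it wraps
// the incoming transaction envelope into a MinBFT request and submits it
// through the embedded MinBFT client.
func (o *OSN) Broadcast(envelope []byte) error {
	_, err := o.client.Request(envelope)
	return err
}
```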
> Maybe we should take care of the potential race between retry and commit, to avoid committing a transaction twice in different blocks.
In the case of Fabric, it should not be a big issue to deliver the same transaction twice. Each Fabric transaction has a unique ID, and Fabric peers simply ignore duplicate transactions. Even if a peer did not check the transaction ID, a duplicate transaction's read-write set would simply no longer apply to the current ledger state and would thus be ignored.
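A small sketch of why duplicate delivery is harmless under this scheme (the types and commit loop are illustrative):

```go
package sketch

// Tx is a minimal stand-in for a Fabric transaction with its unique ID.
type Tx struct {
	ID      string
	Payload []byte
}

// applyBlock illustrates peer-side deduplication: a transaction whose ID
// was already committed is skipped, so a client re-broadcasting after a
// timeout cannot get the same transaction applied twice.
func applyBlock(txs []Tx, committed map[string]bool, apply func(Tx)) {
	for _, tx := range txs {
		if committed[tx.ID] {
			continue // duplicate transaction: ignore
		}
		committed[tx.ID] = true
		apply(tx) // apply the transaction's read-write set
	}
}
```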
> The "query" throughput of MinBFT is about half that of the other consensus implementations, which looks odd to me; I might be missing something important. I'll dig into this more...
This is surprising, since queries should not involve ordering at all. I suppose you ran the benchmark on a single machine. Maybe the MinBFT ordering service consumes more resources in the background?
@Naoya-Horiguchi Sorry for the huge delay, but I finally managed to look into this. Let me summarize my feedback:

- We need to create dedicated implementations for `api.Configer`, `api.Authenticator`, and `api.RequestConsumer`.
- We should keep common consensus parameters (provided by `api.Configer`) on the ledger, in the configuration block. Per-node configuration, such as the enclave path etc., should be passed via the orderer node's configuration (i.e. `orderer.yaml`).
- We should use the Fabric orderer node's key to make a normal client/replica signature.
- Maybe we can determine replica/client IDs by comparing a node's public key against a list of all OSNs' public keys.
- Maybe batches should be cut by the primary, and the MinBFT client part should submit individual transactions.
- To produce blocks consistently among OSNs, `support.WriteBlock` should be invoked by `api.RequestConsumer` and not by the MinBFT client.
- We will need to think about improving our build process.
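To make the first items above a bit more concrete, here is a rough sketch of what Fabric-backed implementations could look like. All type and method names below are illustrative assumptions, not the actual signatures from our `api` package:

```go
package sketch

// fabricConfiger serves common consensus parameters read from the
// channel configuration block on the ledger.
type fabricConfiger struct {
	n uint32 // total number of replicas, from the config block
	f uint32 // number of tolerated faulty replicas, from the config block
}

func (c *fabricConfiger) N() uint32 { return c.n }
func (c *fabricConfiger) F() uint32 { return c.f }

// fabricAuthenticator signs and verifies protocol messages using the
// Fabric orderer node's own signing identity.
type fabricAuthenticator struct {
	sign   func(msg []byte) ([]byte, error)
	verify func(msg, sig []byte, replicaID uint32) error
}

// fabricRequestConsumer receives ordered request batches and turns them
// into blocks, eventually invoking support.WriteBlock.
type fabricRequestConsumer struct {
	writeBlock func(batch [][]byte) // wraps support.WriteBlock
}

func (r *fabricRequestConsumer) Deliver(batch [][]byte) {
	r.writeBlock(batch)
}
```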
@sergefdrv, thank you for the valuable comments.
> - We need to create dedicated implementations for `api.Configer`, `api.Authenticator`, and `api.RequestConsumer`.
> - We should keep common consensus parameters (provided by `api.Configer`) on the ledger, in the configuration block. Per-node configuration, such as the enclave path etc., should be passed via the orderer node's configuration (i.e. `orderer.yaml`).
> - We should use the Fabric orderer node's key to make a normal client/replica signature.
> - Maybe we can determine replica/client IDs by comparing a node's public key against a list of all OSNs' public keys.
> - Maybe batches should be cut by the primary,
I'm not sure about this. In my understanding, reply messages from the MinBFT cluster are not sent back to the primary but to the submitting orderers, so in order to get every block cut by the primary, we would need to collect all replies at the primary, which sounds inefficient to me. Is there any good reason to do that (for example, a requirement of the commit phase)?
> and the MinBFT client part should submit individual transactions.
This also catches my attention: I thought that a MinBFT cluster in Hyperledger Fabric only does ordering, so it wouldn't handle individual transactions. Could you elaborate on this a little more, please?
> - To produce blocks consistently among OSNs, `support.WriteBlock` should be invoked by `api.RequestConsumer` and not by the MinBFT client.
> - We will need to think about improving our build process.
Yes, setting up the build part was actually painful when I tried it previously. We made progress in this area with #115, so I hope the situation is better now. I'll research what we could improve next.
Let me elaborate a bit on my understanding of the integration with Fabric. In principle, there is nothing specific to MinBFT here; it should be applicable to any BFT state machine replication (SMR) protocol in general.
I distinguish several roles related to the Fabric ordering service: the ordering service node (OSN) itself, the SMR client, and the SMR replica.
There are two flavors of implementing a Fabric ordering service. One is to isolate the SMR part into a separate cluster of nodes, e.g. Kafka or BFT-SMaRt. The other approach is to collocate OSN, SMR client, and SMR replica together in a single node, e.g. Raft. Since the Raft-based ordering service is the most recent implementation and is easier to set up and manage, it seems a good example to follow.
In any case, as described here, the main role of an OSN is to provide Fabric clients with the `broadcast` interface, and Fabric committing peers with the `deliver` interface. Mapped to state machine replication, the `broadcast` interface corresponds to the SMR client functionality of submitting requests for ordering, whereas the `deliver` interface is related to the state machine replicated by the SMR replicas.
So the transaction flow would roughly look as follows:

1. A Fabric client submits a transaction to an OSN through the `broadcast` interface.
2. Acting as an SMR client, the OSN submits the transaction for ordering; the replicated state machine "executes" ordered requests by appending blocks to the blockchain (`support.WriteBlock`). The result of such "execution" could be empty.
3. Fabric committing peers fetch blocks through the `deliver` interface to ensure the consensus has been reached for those blocks.

It is crucial that different OSNs deterministically append blocks to the blockchain with `support.WriteBlock`. Otherwise, Fabric peers will get confused and panic.
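A sketch of this determinism requirement (names are illustrative):

```go
package sketch

// consumeOrdered runs on every OSN's request-consumer side. Because all
// replicas deliver the same batches in the same order, and no node-local
// input (timestamps, randomness) influences block contents, every OSN
// writes an identical chain via support.WriteBlock.
func consumeOrdered(batches <-chan [][]byte, writeBlock func(batch [][]byte)) {
	for batch := range batches {
		writeBlock(batch) // must be fully determined by the batch itself
	}
}
```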
It is not clear to me who should combine individual transactions into batches. On the one hand, it seems natural for the SMR leader (MinBFT primary) to do that. On the other hand, each OSN could batch transactions itself and submit the whole batch as a single SMR request. Initially I was in favor of the former way of request batching, but now I am seriously considering the latter.
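Under the latter option, each OSN would cut its own batch and submit it as one SMR request, roughly like this (all names are illustrative, and `submit` stands for the MinBFT client's request operation):

```go
package sketch

import "encoding/json"

// submitBatch marshals a locally cut batch of transaction envelopes into
// a single SMR request, so the consensus layer orders whole batches
// rather than individual transactions.
func submitBatch(submit func(request []byte) ([]byte, error), envelopes [][]byte) error {
	batch, err := json.Marshal(envelopes) // one SMR request = one whole batch
	if err != nil {
		return err
	}
	_, err = submit(batch)
	return err
}
```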
I was going to look into #121. I also wanted to try making the enclave shim library path configurable so that we don't have to tweak `LD_LIBRARY_PATH`.