hyperledger-labs / minbft

Implementation of MinBFT consensus protocol.
Apache License 2.0
63 stars 25 forks

Develop Adapter to Use MinBFT as Ordering Service of Hyperledger Fabric #81

Open ynamiki opened 5 years ago

ynamiki commented 5 years ago

There is an adapter for BFT-SMaRt: https://github.com/bft-smart/fabric-orderingservice

It would be nice to have a similar adapter for our project.

sergefdrv commented 5 years ago

That would be good. It need not follow the architecture of the BFT-SMaRt adapter, because we use a language compatible with the Fabric code base. We should compare it with how Raft is integrated in Fabric.

Additionally, we might be interested in integration with HL Sawtooth.

nhoriguchi commented 5 years ago

I wrote some adapter code and a gist document to address this issue, so let me share them.

I did the following:

Unfortunately, I don't have any appealing numbers yet (see the figure in the gist), so my next step is to investigate the bottleneck further and look for ideas for performance optimization.

I'd be glad to get any feedback or advice.

(edited) The number of "query" operations with MinBFT is about half that of the other consensus implementations, which looks odd to me, so I might be missing something important. I'll dig into this more...

ynamiki commented 5 years ago

@Naoya-Horiguchi Great work! I think your performance test was done in simulation mode. I will try it with an SGX-equipped machine.

ynamiki commented 5 years ago

Unfortunately, I give up... It turns out to be too complicated a task in my environment. The SGX-equipped machine is behind a corporate proxy with SSL inspection, and I would have to update all Dockerfiles to install a special certificate for SSL inspection :anguished:.

sergefdrv commented 5 years ago

The SGX-equipped machine is behind a corporate proxy

It might be easier to develop the integration by building the enclave in simulation mode. Then there's no need for hardware SGX machines. Conceptually, the integration should not depend on whether you use simulation or hardware SGX.

sergefdrv commented 5 years ago

Sorry, I haven't found time to look at the code yet, but it's on my to-do list :wink:

nhoriguchi commented 5 years ago

@ynamiki, @sergefdrv thank you for the comments. I admit that the code I'm developing is immature and needs to be improved before it is reviewed in detail. Namiki-san's work on #115 for delivery improvement would help me very much.

The SGX-equipped machine is behind a corporate proxy with SSL inspection and I have to update all Dockerfiles to install a special certificate for SSL Inspection :anguished:.

Yes, a proxy often gets in the way of easy testing, so if available you can use SGX-equipped VMs on Azure. But anyway, I think it would be nice if I could provide a procedure that works properly in any environment/SGX mode.

BTW, I'm facing a fundamental issue when applying minbft to a blockchain platform, so let me share it. In short, how do we guarantee Byzantine tolerance from the whole system's perspective? Even if the consensus component our project provides is proven to be Byzantine tolerant, that does not seem to directly mean the whole system is Byzantine tolerant.

For example, in Hyperledger Fabric, an orderer creates a block candidate from submitted transactions and tries to reach agreement on it with the other orderers. The orderer distributes the block candidate to all other orderers (as discussed in #119), then the consensus process starts. However, what our consensus code actually does is just check and agree on what the primary orderer proposes. So what if the primary orderer is Byzantine faulty and manipulates the contents of the block candidate on purpose? For example, a Byzantine faulty primary orderer could intentionally exclude transactions from a specific user from every block candidate, so that the user can't properly use the system.

I have some ideas to handle this issue:

  1. Sharing all submitted transactions among orderers and having them check each other to ensure that the contents of submitted block candidates are fair enough (i.e. no transaction is delayed indefinitely). This sounds straightforward to me, and Bitcoin should do something similar. It's also an SGX-free approach, so someone in the Fabric community might already be working on it. OTOH, it introduces a large overhead on consensus (maybe additional communication rounds), and it simply needs many lines of code.
  2. Using an SGX enclave to create block candidates. I think this might be a good use case for SGX, but some hardware limitations could be a problem, so we need to research its feasibility further.
  3. Introducing a Byzantine fault detection mechanism in the minbft cluster. I mentioned this a little previously; if we can confirm an orderer's correctness in some way, it would help mitigate the issue. I'm still not sure about the feasibility/effectiveness of this approach, but let me just share it.

I currently feel that option 2 is promising, but any comments or ideas for alternative approaches would be welcome.

sergefdrv commented 5 years ago

@Naoya-Horiguchi I still couldn't find time to look at the code, but let me just give some feedback on your concern.

BTW, I'm facing a fundamental issue when applying minbft to a blockchain platform, so let me share it. In short, how do we guarantee Byzantine tolerance from the whole system's perspective? Even if the consensus component our project provides is proven to be Byzantine tolerant, that does not seem to directly mean the whole system is Byzantine tolerant.

In case of Hyperledger Fabric, the consensus mechanism of the ordering service is in general complemented with the chaincode endorsement policy to achieve a desired level of resilience.

For example, in Hyperledger Fabric, an orderer creates a block candidate from submitted transactions and tries to reach agreement on it with the other orderers. The orderer distributes the block candidate to all other orderers (as discussed in #119), then the consensus process starts. However, what our consensus code actually does is just check and agree on what the primary orderer proposes. So what if the primary orderer is Byzantine faulty and manipulates the contents of the block candidate on purpose? For example, a Byzantine faulty primary orderer could intentionally exclude transactions from a specific user from every block candidate, so that the user can't properly use the system.

I think the consensus layer should be aware of individual transactions so that it can detect potential unfair treatment (censorship) from the primary. The consensus layer in the primary replica can take advantage of Fabric's functionality to form transaction batches (BlockCutter interface). We will need to extend our MinBFT implementation and its interfaces to support this.

Furthermore, I think we will need to extend MinBFT so that it can provide a kind of consensus proof to RequestConsumer interface. We can attach this proof as metadata to the delivered blocks. This consensus proof then should be checked by Fabric peers.
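
To illustrate the peer-side check, here is a minimal sketch. It assumes, purely hypothetically, that the consensus proof is a set of Ed25519 signatures over the block hash from distinct replicas, and that a caller-supplied threshold (for example f+1) is enough. `ReplicaSig`, `NewReplicaKeys`, `SignBlock`, and `VerifyProof` are made-up names, not MinBFT or Fabric APIs.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// ReplicaSig is a hypothetical proof entry: one replica's signature over
// the block hash. It is not a MinBFT or Fabric type.
type ReplicaSig struct {
	ReplicaID int
	Sig       []byte
}

// NewReplicaKeys generates n Ed25519 key pairs indexed by replica ID.
func NewReplicaKeys(n int) (map[int]ed25519.PublicKey, map[int]ed25519.PrivateKey) {
	pubs := make(map[int]ed25519.PublicKey, n)
	privs := make(map[int]ed25519.PrivateKey, n)
	for id := 0; id < n; id++ {
		pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
		if err != nil {
			panic(err)
		}
		pubs[id], privs[id] = pub, priv
	}
	return pubs, privs
}

// SignBlock produces one proof entry for block on behalf of replica id.
func SignBlock(id int, priv ed25519.PrivateKey, block []byte) ReplicaSig {
	hash := sha256.Sum256(block)
	return ReplicaSig{ReplicaID: id, Sig: ed25519.Sign(priv, hash[:])}
}

// VerifyProof reports whether at least threshold distinct known replicas
// produced a valid signature over the SHA-256 hash of block.
func VerifyProof(block []byte, proof []ReplicaSig, keys map[int]ed25519.PublicKey, threshold int) bool {
	hash := sha256.Sum256(block)
	valid := make(map[int]bool)
	for _, p := range proof {
		key, known := keys[p.ReplicaID]
		if !known || valid[p.ReplicaID] {
			continue // unknown replica or duplicate entry
		}
		if ed25519.Verify(key, hash[:], p.Sig) {
			valid[p.ReplicaID] = true
		}
	}
	return len(valid) >= threshold
}

func main() {
	const f = 1 // assumed fault threshold, with n = 2f+1 replicas
	pubs, privs := NewReplicaKeys(2*f + 1)
	block := []byte("block #1")
	proof := []ReplicaSig{SignBlock(0, privs[0], block), SignBlock(1, privs[1], block)}
	fmt.Println("accepted:", VerifyProof(block, proof, pubs, f+1))
}
```

In a real integration the proof format and threshold would have to follow whatever MinBFT actually exposes; the sketch only shows where such a check would sit.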

From the Fabric SDK side, the client, after broadcasting a request through an orderer node, should make sure it actually appears in the ledger. After some timeout, it can suspect the orderer node is faulty and re-broadcast the transaction via another node.
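
The retry behaviour could be sketched roughly as below. This is a toy model under stated assumptions: `Orderer`, `Ledger`, `HonestOrderer`, `CensoringOrderer`, `SubmitWithRetry`, and `DemoTimeout` are hypothetical names, and the ledger is an in-memory map rather than the Fabric SDK's deliver stream.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Orderer is a hypothetical stand-in for a connection to one OSN.
type Orderer interface {
	Broadcast(tx string) error
}

// Ledger is a toy in-memory record of committed transaction IDs.
type Ledger map[string]bool

// HonestOrderer commits every transaction it receives.
type HonestOrderer struct{ L Ledger }

func (o HonestOrderer) Broadcast(tx string) error {
	o.L[tx] = true
	return nil
}

// CensoringOrderer accepts transactions but silently drops them.
type CensoringOrderer struct{}

func (CensoringOrderer) Broadcast(string) error { return nil }

// DemoTimeout is an arbitrary per-orderer wait used in the example.
const DemoTimeout = 20 * time.Millisecond

// SubmitWithRetry broadcasts tx via each orderer in turn, polling the
// ledger until timeout before moving on. It returns the index of the
// orderer through which the transaction got committed.
func SubmitWithRetry(tx string, orderers []Orderer, l Ledger, timeout time.Duration) (int, error) {
	for i, o := range orderers {
		if err := o.Broadcast(tx); err != nil {
			continue // unreachable node: try the next one
		}
		deadline := time.Now().Add(timeout)
		for !l[tx] && time.Now().Before(deadline) {
			time.Sleep(time.Millisecond)
		}
		if l[tx] {
			return i, nil
		}
		// Timed out: suspect this node and re-broadcast via another.
	}
	return -1, errors.New("transaction not committed via any orderer")
}

func main() {
	ledger := Ledger{}
	orderers := []Orderer{CensoringOrderer{}, HonestOrderer{L: ledger}}
	idx, err := SubmitWithRetry("tx-1", orderers, ledger, DemoTimeout)
	fmt.Println(idx, err)
}
```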

nhoriguchi commented 5 years ago

Thanks for the feedback, @sergefdrv.

In case of Hyperledger Fabric, the consensus mechanism of the ordering service is in general complemented with the chaincode endorsement policy to achieve a desired level of resilience.

Yes, the endorsement part focuses on generating approvals that a proposed transaction is valid, by checking the execution results from the right set of endorsing peers. This part handles each transaction independently. The ordering service then focuses on ordering the endorsed transactions.

The point is that each orderer receives a different set of transactions, so if we want the consensus layer to handle individual transactions (or to check that other orderers handle transactions fairly), we need to share information about all submitted transactions among all orderers. This slows down consensus, but could be a solution if there is no other approach.

From the Fabric SDK side, the client, after broadcasting a request through an orderer node, should make sure it actually appears in the ledger. After some timeout, it can suspect the orderer node is faulty and re-broadcast the transaction via another node.

Retrying with a different orderer is fine with me. Maybe we should take care of a potential race between retry and commit, to avoid committing a transaction twice in different blocks.

sergefdrv commented 5 years ago

The point is that each orderer receives a different set of transactions, so if we want the consensus layer to handle individual transactions (or to check that other orderers handle transactions fairly), we need to share information about all submitted transactions among all orderers. This slows down consensus, but could be a solution if there is no other approach.

I was thinking of each orderer node (OSN) acting as a proxy between the Fabric client (SDK) and the consensus layer. This means the OSN would combine both replica and client functionality of MinBFT. The Fabric client would pick one OSN and try to broadcast a transaction through it. Acting as a MinBFT client, the OSN would execute the client part of MinBFT protocol normally, which includes preparing and broadcasting a MinBFT request.

Maybe we should take care of a potential race between retry and commit, to avoid committing a transaction twice in different blocks.

In the case of Fabric, delivering the same transaction twice should not be a big issue. Each Fabric transaction has a unique ID, and Fabric peers would simply ignore duplicate transactions. Even if a peer did not check the transaction ID, a duplicate transaction's read-write set would no longer apply to the current ledger state and would thus be ignored.
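
As a rough illustration of the ID-based filtering, here is a simplified model of a committing peer. It is not Fabric's validation code; `Tx`, `Peer`, `NewPeer`, and `CommitBlock` are hypothetical names used only for this sketch.

```go
package main

import "fmt"

// Tx is a simplified transaction carrying only an ID and a payload.
type Tx struct {
	ID      string
	Payload string
}

// Peer is a toy committing peer that remembers the IDs it has applied.
type Peer struct {
	seen    map[string]bool
	Applied []Tx
}

// NewPeer returns a peer with an empty ledger.
func NewPeer() *Peer { return &Peer{seen: make(map[string]bool)} }

// CommitBlock applies each transaction whose ID has not been seen before
// and returns how many transactions were skipped as duplicates.
func (p *Peer) CommitBlock(block []Tx) int {
	skipped := 0
	for _, tx := range block {
		if p.seen[tx.ID] {
			skipped++
			continue
		}
		p.seen[tx.ID] = true
		p.Applied = append(p.Applied, tx)
	}
	return skipped
}

func main() {
	p := NewPeer()
	p.CommitBlock([]Tx{{ID: "tx-1", Payload: "a=1"}})
	// A client retry got the same transaction into a later block.
	skipped := p.CommitBlock([]Tx{{ID: "tx-1", Payload: "a=1"}, {ID: "tx-2", Payload: "b=2"}})
	fmt.Println("skipped:", skipped, "applied:", len(p.Applied))
}
```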

sergefdrv commented 4 years ago

The number of "query" operations with MinBFT is about half that of the other consensus implementations, which looks odd to me, so I might be missing something important. I'll dig into this more...

This is surprising, since queries should not involve ordering at all. I suppose you ran the benchmark on a single machine. Maybe the MinBFT ordering service consumes more resources in the background?

sergefdrv commented 4 years ago

@Naoya-Horiguchi Sorry for the huge delay, but I finally managed to look into this. Let me summarize my feedback:

  1. We need to create dedicated implementation for api.Configer, api.Authenticator, and api.RequestConsumer.
  2. We should keep common consensus parameters (provided by api.Configer) on ledger, in the configuration block. Per-node configuration, such as enclave path etc. should be passed via orderer node's configuration (i.e. orderer.yaml).
  3. We should use Fabric orderer node's key to make a normal client/replica signature.
  4. Maybe we can determine replica/client ID by comparing node's public key against a list of all OSN's public keys.
  5. Maybe batches should be cut by the primary, and MinBFT client part should submit individual transactions.
  6. To produce blocks consistently among OSNs, support.WriteBlock should be invoked by api.RequestConsumer and not by MinBFT client.
  7. We will need to think about improving our build process.
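
For item 4, one possible way to derive the ID is sketched below. It assumes that every node sorts the same list of OSN public keys lexicographically and uses its own position as the replica/client ID; the sorting rule and the `ReplicaID` helper are assumptions for illustration, not something MinBFT or Fabric prescribes.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sort"
)

// ReplicaID returns the position of own within the lexicographically
// sorted list of all OSN public keys, so that every node derives the
// same numbering regardless of the order in which the keys are listed.
func ReplicaID(own []byte, all [][]byte) (int, error) {
	sorted := make([][]byte, len(all))
	copy(sorted, all) // leave the caller's slice order untouched
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(sorted[i], sorted[j]) < 0
	})
	for id, key := range sorted {
		if bytes.Equal(key, own) {
			return id, nil
		}
	}
	return -1, errors.New("own public key is not in the OSN list")
}

func main() {
	// Placeholder byte strings stand in for real serialized public keys.
	all := [][]byte{[]byte("key-c"), []byte("key-a"), []byte("key-b")}
	id, err := ReplicaID([]byte("key-b"), all)
	fmt.Println(id, err)
}
```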

nhoriguchi commented 4 years ago

@Naoya-Horiguchi Sorry for the huge delay, but I finally managed to look into this. Let me summarize my feedback:

@sergefdrv, thank you for valuable comments.

  1. We need to create dedicated implementation for api.Configer, api.Authenticator, and api.RequestConsumer.
  2. We should keep common consensus parameters (provided by api.Configer) on ledger, in the configuration block. Per-node configuration, such as enclave path etc. should be passed via orderer node's configuration (i.e. orderer.yaml).
  3. We should use Fabric orderer node's key to make a normal client/replica signature.
  4. Maybe we can determine replica/client ID by comparing node's public key against a list of all OSN's public keys.
  5. Maybe batches should be cut by the primary,

I'm a bit unsure about this. In my understanding, reply messages from the MinBFT cluster are not sent back to the primary but to the submitting orderers, so in order to have every block cut by the primary, we would need to collect all replies at the primary, which sounds inefficient to me. Is there any good reason to do that (for example, a requirement of the commit phase)?

and MinBFT client part should submit individual transactions.

This also sounds notable to me. I thought that a MinBFT cluster in Hyperledger Fabric only does ordering, so it doesn't handle individual transactions. Could you elaborate on this a little more, please?

  6. To produce blocks consistently among OSNs, support.WriteBlock should be invoked by api.RequestConsumer and not by MinBFT client.
  7. We will need to think about improving our build process.

Yes, setting up the build was actually painful when I tried previously. We made progress in this area with #115, so I hope the situation is better now. I'll look into what we could improve next.

sergefdrv commented 4 years ago

Let me elaborate a bit on my understanding of integration with Fabric. In principle, nothing here is specific to MinBFT; it should be applicable to any BFT state machine replication protocol in general.

I distinguish several roles related to Fabric ordering service:

There are two flavors of implementing Fabric ordering service. One is to isolate SMR part into a separate cluster of nodes, e.g. Kafka, BFT-SMaRt. Another approach is to collocate OSN, SMR client, and SMR replica together in a single node, e.g. Raft. Since Raft-based ordering service is the most recent implementation and it is easier to set up and manage, it seems to be a good example to follow.

In any case, as described here, the main role of an OSN is to provide Fabric clients with broadcast interface, and Fabric committing peers with deliver interface. Mapped to state machine replication, broadcast interface is related to SMR client functionality of submitting requests for ordering, whereas deliver interface is related to the state machine replicated by SMR replicas.

So transaction flow would roughly look as follows:

  1. Fabric client assembles a transaction and submits it to the ordering service through one of the OSNs using its broadcast interface.
  2. The OSN acts as an SMR client and submits the transaction to SMR replicas as a request to order and "execute".
  3. SMR leader (MinBFT primary) proposes the request to other SMR replicas for "execution".
  4. SMR replicas run consensus protocol on that proposal and eventually accept the request for "execution".
  5. SMR replicas invoke the replicated state machine to "execute" the request by appending it as a new block to the blockchain (using support.WriteBlock). The result of such "execution" could be empty.
  6. Once the new block is appended to the blockchain, OSNs will deliver it to all subscribed Fabric nodes such as committing peers and clients.
  7. In case of BFT ordering, the Fabric nodes should do consensus-specific verification of blocks received through deliver interface to ensure the consensus has been reached for those blocks.
  8. In case of BFT ordering, the Fabric client needs to wait until the transaction appears in a new block, and retry through another OSN if it takes too long.

It is crucial that different OSNs deterministically append blocks to the blockchain with support.WriteBlock. Otherwise Fabric peers will get confused and panic.
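
To make the determinism requirement concrete, here is a toy sketch. `Chain`, `Append`, and `Replay` are hypothetical names and this is not Fabric's block format; the point is only that the appended block depends on nothing but the previous head and the ordered request, so replicas that replay the same sequence end up with identical chains.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Chain is a toy hash-chained ledger; it is not Fabric's block format.
type Chain struct {
	Height uint64
	Head   [32]byte
}

// Append adds a block whose hash depends only on the previous head and
// the ordered request, never on local time or node identity.
func (c *Chain) Append(request []byte) {
	h := sha256.New()
	h.Write(c.Head[:])
	h.Write(request)
	copy(c.Head[:], h.Sum(nil))
	c.Height++
}

// Replay applies the accepted requests in consensus order.
func Replay(requests [][]byte) Chain {
	var c Chain
	for _, r := range requests {
		c.Append(r)
	}
	return c
}

func main() {
	accepted := [][]byte{[]byte("tx-1"), []byte("tx-2")}
	osnA, osnB := Replay(accepted), Replay(accepted)
	fmt.Println("same chain on both OSNs:", osnA == osnB)
}
```

If an OSN mixed in anything node-local (a wall-clock timestamp, its own batching decisions), the heads would diverge even though consensus agreed on the same requests.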

It is not clear to me who should combine individual transactions into batches. On one hand, it seems natural if SMR leader (MinBFT primary) would do that. On the other hand, each OSN can batch transactions itself and submit the whole batch as a single SMR request. Initially, I was in favor of the former way of request batching, but now I am really considering the latter one.
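
The latter option (each OSN batches locally and submits the batch as one SMR request) could look roughly like the sketch below. `Batcher`, `NewBatcher`, `Add`, and `Cut` are hypothetical names that only mirror the idea of Fabric's block cutter, not its actual API, and size-based cutting is an assumed policy.

```go
package main

import "fmt"

// Batcher collects transactions until maxCount is reached. It only
// mirrors the idea of Fabric's block cutter, not its actual API.
type Batcher struct {
	maxCount int
	pending  [][]byte
}

// NewBatcher returns a batcher that cuts after maxCount transactions.
func NewBatcher(maxCount int) *Batcher { return &Batcher{maxCount: maxCount} }

// Add queues tx and returns a full batch once maxCount transactions
// are pending, or nil otherwise.
func (b *Batcher) Add(tx []byte) [][]byte {
	b.pending = append(b.pending, tx)
	if len(b.pending) < b.maxCount {
		return nil
	}
	return b.Cut()
}

// Cut returns whatever is pending (for example on a batch timeout)
// and resets the batcher.
func (b *Batcher) Cut() [][]byte {
	batch := b.pending
	b.pending = nil
	return batch
}

func main() {
	b := NewBatcher(2)
	for _, tx := range []string{"tx-1", "tx-2", "tx-3"} {
		if batch := b.Add([]byte(tx)); batch != nil {
			// Here the OSN would submit the whole batch as a single
			// SMR request.
			fmt.Println("submit batch of", len(batch))
		}
	}
	fmt.Println("left over:", len(b.Cut()))
}
```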

sergefdrv commented 4 years ago

I was going to look into #121. I also wanted to try making enclave shim library path configurable so that we don't have to tweak LD_LIBRARY_PATH.