hyperledger / fabric-private-chaincode

FPC enables Confidential Chaincode Execution for Hyperledger Fabric using Intel SGX.
Apache License 2.0

Ledger Enclave Strategy #402

Open mbrandenburger opened 4 years ago

mbrandenburger commented 4 years ago

The integration of the Ledger Enclave is one of the fundamental pillars of the FPC security architecture; however, its implementation is also challenging.

This issue continues the discussion based on the FPC RFC.

yacovm commented 4 years ago

I think that instead of mimicking Fabric semantics in a complete manner, the strategy should really deviate from Fabric by:

  1. Restricting functionality of what you can do
  2. Further isolating FPC transactions from Fabric ones

By doing (1), validation logic is reduced and, most importantly, doesn't need to play catch-up with Fabric changes. In a sense, (2) is an indirect result of (1): Instead of validating all Fabric transactions (and implementing everything needed for that), maybe it's better to just make Fabric transactions and FPC transactions unable to read each other's writes. The latter also means that, in a manner of speaking, we are creating two "realms" inside the peer that prevent transactions from crossing over. Surely, this would make several use cases infeasible, but my gut feeling is that 80% of the effort in supporting all Fabric validation logic would amount to catering to less than 20% of the use cases.

To achieve (1), there are several relaxations (which are, in effect, semantic restrictions) that can be taken into account:

By doing the above, we are essentially decoupling the FPC transaction validation from the Fabric configuration block processing and the Fabric transaction lifecycle and greatly simplifying the code.

To realize (2) I think the best way would be to introduce to Fabric a new transaction type and have a custom transaction processor for that transaction. While it does modify Fabric, I believe it is the right approach in the long term because it doesn't require changing any of the existing Fabric endorsement transaction logic which is already complex, and also allows flexibility and more freedom to the FPC transaction validation logic, as it is no longer bound by the format and structure of the endorsement transaction.
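To make this a bit more concrete, here is a minimal Go sketch of what such a dedicated FPC transaction processor could look like. Everything in it (the `HeaderTypeFPCTransaction` constant, the `FpcTransactionProcessor` type, and the enclave-endorsement hook) is hypothetical and only illustrates dispatching on a new header type; it is not actual Fabric or FPC code.

```go
package fpcvalidation

import (
	"fmt"

	"github.com/golang/protobuf/proto"
	"github.com/hyperledger/fabric-protos-go/common"
)

// HeaderTypeFPCTransaction is a hypothetical header type reserved for FPC
// transactions; Fabric would dispatch envelopes carrying this type to the FPC
// processor instead of the standard endorser-transaction validation.
const HeaderTypeFPCTransaction = 100

// FpcTransactionProcessor validates FPC transactions in their own "realm",
// independent of the endorser-transaction format and lifecycle.
type FpcTransactionProcessor struct {
	// verifyEnclaveEndorsement stands in for the FPC-specific check, e.g.
	// verifying the enclave signature over the read/write set.
	verifyEnclaveEndorsement func(payload []byte) error
}

// Process only handles FPC envelopes; all other transaction types keep
// flowing through the regular Fabric validation pipeline.
func (p *FpcTransactionProcessor) Process(env *common.Envelope) error {
	payload := &common.Payload{}
	if err := proto.Unmarshal(env.Payload, payload); err != nil {
		return fmt.Errorf("malformed payload: %w", err)
	}
	if payload.Header == nil {
		return fmt.Errorf("missing payload header")
	}
	chdr := &common.ChannelHeader{}
	if err := proto.Unmarshal(payload.Header.ChannelHeader, chdr); err != nil {
		return fmt.Errorf("malformed channel header: %w", err)
	}
	if chdr.Type != HeaderTypeFPCTransaction {
		return fmt.Errorf("not an FPC transaction (type %d)", chdr.Type)
	}
	// FPC-specific validation, decoupled from the endorsement transaction logic.
	return p.verifyEnclaveEndorsement(payload.Data)
}
```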

To the user, FPC would still appear to have a similar transaction flow; only internally it'll be easier to implement and to reason about security.

g2flyer commented 4 years ago

2. Further isolating FPC transactions from Fabric ones

The challenge is, though, that the current trusted ledger architecture, besides validating fpc chaincode transactions, relies on proper validation of (a) the standard fabric channel definition with, e.g., msp and orderer info, and any related updates, and (b) validating one "normal" chaincode, the enclave registry (ercc). While with more invasive changes in fabric we probably can get away with (b), the reliance on (a) is pretty fundamental for the security. So simply bifurcating validation won't really work.

But as mentioned by both of you, we do not have to support the complete generality of validation (e.g., custom validation or endorsement plugins); i guess the main challenge is to define the (restricted) scope as cleanly and as narrowly as possible and then also have good processes in place to keep the code relation traceable and change-trackable (e.g., clear attribution in our case to where the go-equivalent is, maybe trying to also keep similar code structure/naming, and maybe some scripts which can easily identify whether fabric changes require corresponding changes in our code?).

The mid/long-term strategy of course would be to re-use common code, but that would need (a) the dust settling on go support for sgx -- there are some early PoCs based on graphene, but they are not yet in a stable enough state that i would like to base it on ... -- and, potentially, (b) some modularization of the validation code in fabric.

yacovm commented 4 years ago

The challenge is, though, that the current trusted ledger architecture, besides validating fpc chaincode transactions, relies on proper validation of (a) the standard fabric channel definition with, e.g., msp and orderer info, and any related updates, and (b) validating one "normal" chaincode, the enclave registry (ercc). While with more invasive changes in fabric we probably can get away with (b), the reliance on (a) is pretty fundamental for the security. So simply bifurcating validation won't really work.

I think that you only need to implement a small subset of the logic for channel definition: The logic that parses the consensus definitions and can perform signature validation on block signatures. There is no need to do any further block validation processing, because we know that:

  • A config block is accepted by orderers if and only if it is signed by a quorum of orderers in BFT (or a single one in CFT)
  • A config block is accepted by peers if and only if the peers remain functioning afterwards (this is the current Fabric behavior: If the peer failed to process a config block update, it doesn't commit the block and halts the channel processing or panics)

Therefore, as long as a block is signed by the required number of orderers, and the peers remain functioning after ingesting the block, then the config block is valid.

If the Ledger Enclave resides at an even lower level: as long as it sees that the signatures of a config block are correct, then this block is the only one with its sequence number, and it can accept it without problems.
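As an illustration only (the types and helpers below are made up, not Fabric's API), the reduced acceptance rule could be sketched roughly like this in Go:

```go
package trustedledger

import "fmt"

// OrdererSet is an illustrative abstraction of the consensus definition the
// Ledger Enclave would keep: the known orderer identities, the quorum size
// (a quorum in BFT, a single signature in CFT), and a signature check.
type OrdererSet struct {
	Members map[string]struct{}
	Quorum  int
	Verify  func(ordererID string, header, signature []byte) bool
}

// AcceptConfigBlock implements the reduced rule discussed above: accept a
// config block iff it is the next one in sequence and carries enough valid
// orderer signatures; no further content validation is performed.
func AcceptConfigBlock(o *OrdererSet, blockHeader []byte, blockSeq, lastSeq uint64,
	sigs map[string][]byte) error {

	if blockSeq != lastSeq+1 {
		return fmt.Errorf("unexpected sequence %d, want %d", blockSeq, lastSeq+1)
	}
	valid := 0
	for ordererID, sig := range sigs {
		if _, known := o.Members[ordererID]; !known {
			continue // signatures from unknown identities do not count
		}
		if o.Verify(ordererID, blockHeader, sig) {
			valid++
		}
	}
	if valid < o.Quorum {
		return fmt.Errorf("only %d valid orderer signatures, need %d", valid, o.Quorum)
	}
	return nil // in sequence and signed by the required orderers: accept
}
```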

g2flyer commented 4 years ago
  • A config block is accepted by orderers if and only if it is signed by a quorum of orderers in BFT (or a single one in CFT)
  • A config block is accepted by peers if and only if the peers remain functioning afterwards (this is the current Fabric behavior: If the peer failed to process a config block update, it doesn't commit the block and halts the channel processing or panics)

Hmm, i don't think you can rely on this. If the trusted ledger doesn't validate the config-changes but just takes them at face value, a bad peer could create one with a new (insecure) orderer configuration. Yes, good peers might panic, but that doesn't stop the bad peer from giving it to the trusted_enclave and then feeding it bogus ordering messages, which can then lead to revelation of supposedly secret state?

PS: After some discussion with Bruno: Did you mean that orderers actually validate the (complete?) content of the config messages themselves (i.e., signed by enough orgs according to the corresponding lifecycle policy and the like) and forward an ordered config message only if it is valid? If so, then i could see that we don't have to repeat validation as we have to unconditionally trust orderers anyway. But then a (properly ordered/signed) config block would always be accepted by peers? At least unless the orderers are corrupted but in that case all bets are off anyway?

yacovm commented 4 years ago

Hmm, i don't think you can rely on this. If the trusted ledger doesn't validate the config-changes but just takes them at face value, a bad peer could create one with a new (insecure) orderer configuration. Yes, good peers might panic, but that doesn't stop the bad peer from giving it to the trusted_enclave and then feeding it bogus ordering messages, which can then lead to revelation of supposedly secret state?

Recall that I said:

I think that you only need to implement a small subset of the logic for channel definition: The logic that parses the consensus definitions and can perform signature validation on block signatures.

This means that the peer cannot create a config block which the orderer didn't validate beforehand. If "good peers" panic, then it is irrelevant what the "bad peer" that surrounds your enclave is doing, since all honest players in the system are now out of the game and you have bigger problems to face.

PS: After some discussion with Bruno: Did you mean that orderers actually validate the (complete?) content of the config messages themselves (i.e., signed by enough orgs according to the corresponding lifecycle policy and the like) and forward an ordered config message only if it is valid?

The orderer has always been validating configuration blocks. More specifically, it takes the config update and tries to "simulate" it by "proposing it" to the current config. If what comes out of the simulation is the same config, then the config update is valid. Otherwise, a different config comes out, and it means that the config update needs to be re-submitted (essentially, it's an MVCC check for configuration transactions!)
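Roughly, that check can be pictured with the following sketch; `Config`, `ConfigUpdate`, and `proposeConfigUpdate` are simplified stand-ins for Fabric's configtx machinery, not its actual API:

```go
package configcheck

import (
	"errors"
	"reflect"
)

// Config and ConfigUpdate are simplified stand-ins for Fabric's channel
// configuration and config-update messages.
type Config struct {
	Sequence uint64
	Groups   map[string]string // placeholder for the actual config tree
}

type ConfigUpdate struct {
	WriteSet map[string]string
}

// proposeConfigUpdate applies the update to the current config and bumps the
// sequence number (a stand-in for the "propose" / simulation step).
func proposeConfigUpdate(current Config, update ConfigUpdate) (Config, error) {
	next := Config{Sequence: current.Sequence + 1, Groups: map[string]string{}}
	for k, v := range current.Groups {
		next.Groups[k] = v
	}
	for k, v := range update.WriteSet {
		next.Groups[k] = v
	}
	return next, nil
}

// validateConfigTx mirrors the orderer-side check described above: simulate
// the update against the current config and accept it only if the result
// matches the full config the transaction claims to install (an MVCC-style
// check for configuration transactions).
func validateConfigTx(current Config, update ConfigUpdate, claimed Config) error {
	proposed, err := proposeConfigUpdate(current, update)
	if err != nil {
		return err
	}
	if !reflect.DeepEqual(proposed, claimed) {
		return errors.New("config update computed against a stale configuration; re-submit")
	}
	return nil
}
```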

If so, then i could see that we don't have to repeat validation as we have to unconditionally trust orderers anyway. But then a (properly ordered/signed) config block would always be accepted by peers? At least unless the orderers are corrupted but in that case all bets are off anyway?

An orderer validates everything in the configuration besides a single thing, which is the capability level of the peers. However, a capability level is a binary "yes or no", and if the peer doesn't have the capability it will panic; therefore it is possible that the peers will panic due to a config update that the orderer signed, but nothing more. This is by design, because the orderer can be at a higher capability level than the peers but it doesn't know the capability level of the peers.

g2flyer commented 4 years ago

Thanks for the clarification. In this case, as mentioned in my PS, i agree with you that verifying orderer signatures for config-update messages should be enough. Given that we need MSP-rooted signature verification, (subsets of) policy evaluation and the parsing of the protobufs elsewhere (and hence corresponding library functions), it's less clear how much complexity we can save in this context?

That said, it is certainly key to first identifying what we really need and where we can subset -- e.g., for MSP we clearly need only X509, only new lifecycle, no custom validator, endorser or decorator plugins. It is also important to distinguish between what falls outside the subset that we can silently ignore -- e.g., for chaincodes other than fpc chaincodes and the special ercc "normal" chaincode we can ignore any use of custom plugins or different MSPs -- and what we will have to abort on encountering -- e.g., any unexpected MSP in channel or lifecycle policies as applicable to "our" chaincodes will have to lead to an abort.

Probably best might be to first describe in high-level pseudo-code the validation flows fabric does (maybe with some references to where in the code the corresponding logic is) and then annotate what we have to do and how we handle it? That could put us on safe ground, and keeping it as a living document (and also cross-referencing our code to it) will hopefully also help keep the security invariants enforced as time goes by. BTW: is there already somewhere a fabric doc with the validation logic, including all the key validation steps but high-level enough to be manageable to maintain for our purposes?
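As a possible starting point for such a flow description, a very coarse, annotated sketch of the per-block flow with the FPC-specific subsetting marked in comments could look something like this; the step names and types are made up for discussion and do not mirror fabric's actual code structure:

```go
package validationflow

// Coarse, annotated sketch of the per-block validation flow the trusted ledger
// would have to mirror; types and functions are stand-ins for discussion.
// Each "FPC:" note marks where we subset.

type TxType int

const (
	ConfigTx TxType = iota
	EndorserTx
)

type Tx struct {
	Type      TxType
	Chaincode string
	Payload   []byte
}

type Block struct {
	Transactions []Tx
}

type Validator struct {
	verifyOrdererSignatures func(Block) error // FPC: needed in full
	applyConfig             func(Tx) error    // FPC: abort on anything outside our restricted model (e.g. non-X509 MSPs)
	validateEndorsements    func(Tx) error    // FPC: restricted policy subset, only for ercc and FPC chaincodes
	checkReadSetMVCC        func(Tx) error    // FPC: needed for the keys our chaincodes read
	isRelevantChaincode     func(string) bool // ercc or an FPC chaincode
}

func (v *Validator) ValidateBlock(b Block) error {
	// 1. Block integrity: orderer signatures (and, implicitly, the hash chain).
	if err := v.verifyOrdererSignatures(b); err != nil {
		return err
	}
	for _, tx := range b.Transactions {
		switch tx.Type {
		case ConfigTx:
			// 2. Config blocks: trust the orderer's validation, only extract the
			//    MSP/orderer/policy material we need.
			if err := v.applyConfig(tx); err != nil {
				return err
			}
		case EndorserTx:
			// 3. Endorser transactions: silently ignore chaincodes whose writes
			//    FPC transactions never read; validate the rest.
			if !v.isRelevantChaincode(tx.Chaincode) {
				continue
			}
			if err := v.validateEndorsements(tx); err != nil {
				return err
			}
			if err := v.checkReadSetMVCC(tx); err != nil {
				return err
			}
		}
	}
	return nil
}
```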

yacovm commented 4 years ago

Given that we need MSP-rooted signature verification, (subsets of) policy evaluation and the parsing of the protobufs elsewhere (and hence corresponding library functions), it's less clear how much complexity we can save in this context?

You don't want to implement the Fabric configuration parsing in its entirety. It's a deep rabbit hole and you might hurt the earth's inner core. Also, the policy engine is very flexible and you don't need to implement all of it. You just need to implement verification for a subset of use cases and support some reasonable policy, not every possible policy, and the Fabric deployment use case will just need to adjust and configure the policy that is supported.

That said, it is certainly key to first identifying what we really need and where we can subset -- e.g., for MSP we clearly need only X509, only new lifecycle, no custom validator, endorser or decorator plugins.

Why do you need a lifecycle at all? Assuming you never support FPC reading non FPC and vice versa (which I think is the way to go) and only support a simple naive endorsement policy of "one-of-any-enclave" then do you need a lifecycle at all? Endorsement policies are there because you cannot assume honest execution, but, with SGX you can.

BTW: is there already somewhere a fabric doc with the validation logic, including all the key validation steps but high-level enough to be manageable to maintain for our purposes?

No, there are lots of corner cases and gotchas, especially with state-based endorsement... I suggest you not implement anything other than the bare minimum, as I think the "less is more" principle applies here.

g2flyer commented 4 years ago

You don't want to implement the Fabric configuration parsing in its entirety. It's a deep rabbit hole and you might hurt the earth's inner core.

Ultimately, we have to be able to reconstruct and maintain the channel definition with the MSP configs for the orderer and the orgs in the channel, plus related policies. I've definitely noticed that protobuf parsing is non-trivial given, e.g., the nesting of sub-protobufs via binary blobs and the like. Certainly seems protobuf as a technology has room for improvement :-)
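For illustration, extracting one org's MSP definition from a config transaction means peeling several layers of embedded byte fields. The sketch below uses the fabric-protos-go types, but the exact field path is reproduced from memory and should be double-checked:

```go
package configparse

import (
	"fmt"

	"github.com/golang/protobuf/proto"
	"github.com/hyperledger/fabric-protos-go/common"
	"github.com/hyperledger/fabric-protos-go/msp"
)

// extractOrgMSP digs one application org's MSP definition out of a config
// transaction envelope: Envelope -> Payload -> ConfigEnvelope -> ConfigGroup
// tree -> MSPConfig -> FabricMSPConfig, each layer hidden behind an opaque
// bytes field.
func extractOrgMSP(configTxEnvelope []byte, org string) (*msp.FabricMSPConfig, error) {
	env := &common.Envelope{}
	if err := proto.Unmarshal(configTxEnvelope, env); err != nil {
		return nil, fmt.Errorf("envelope: %w", err)
	}
	payload := &common.Payload{}
	if err := proto.Unmarshal(env.Payload, payload); err != nil {
		return nil, fmt.Errorf("payload: %w", err)
	}
	cfgEnv := &common.ConfigEnvelope{}
	if err := proto.Unmarshal(payload.Data, cfgEnv); err != nil {
		return nil, fmt.Errorf("config envelope: %w", err)
	}
	if cfgEnv.Config == nil || cfgEnv.Config.ChannelGroup == nil {
		return nil, fmt.Errorf("empty config")
	}
	appGroup, ok := cfgEnv.Config.ChannelGroup.Groups["Application"]
	if !ok {
		return nil, fmt.Errorf("no Application group")
	}
	orgGroup, ok := appGroup.Groups[org]
	if !ok {
		return nil, fmt.Errorf("no config group for org %s", org)
	}
	mspValue, ok := orgGroup.Values["MSP"]
	if !ok {
		return nil, fmt.Errorf("org %s has no MSP value", org)
	}
	mspCfg := &msp.MSPConfig{}
	if err := proto.Unmarshal(mspValue.Value, mspCfg); err != nil {
		return nil, fmt.Errorf("msp config: %w", err)
	}
	fabricMSP := &msp.FabricMSPConfig{}
	if err := proto.Unmarshal(mspCfg.Config, fabricMSP); err != nil {
		return nil, fmt.Errorf("fabric msp config: %w", err)
	}
	return fabricMSP, nil
}
```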

Also, the policy engine is very flexible and you don't need to implement all of it. You just need to implement verification for a subset of use cases and support some reasonable policy, not every possible policy, and the Fabric deployment use case will just need to adjust and configure the policy that is supported.

Oh, of course we don't want to support all possible policies; that's why i mentioned (subsets of) policies. E.g., for channel policies and lifecycle policies i guess one could reasonably restrict to majority (which, if i'm not mistaken, is now also the default for these policies)? For our ercc chaincode, we also need the same policy as lifecycle, so this should also be covered, and for fpc we anyway have our "custom subset" (initially a single org as an OR term).
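A toy sketch of evaluating just that restricted subset (majority of the channel orgs, or an OR over a single designated org) might look like this; the types are illustrative only and signature verification itself is assumed to have happened elsewhere:

```go
package policysubset

import "fmt"

// Restricted policy subset as discussed above: either "majority of the channel
// orgs" (for channel/lifecycle/ercc policies) or "a single designated org"
// (the initial FPC endorsement policy). Types are illustrative only.
type PolicyKind int

const (
	MajorityOfOrgs PolicyKind = iota
	SingleOrg
)

type Policy struct {
	Kind PolicyKind
	Org  string // only used for SingleOrg
}

// Evaluate checks whether the set of orgs that produced valid signatures
// satisfies the (restricted) policy.
func Evaluate(p Policy, totalOrgs int, signingOrgs map[string]bool) error {
	switch p.Kind {
	case MajorityOfOrgs:
		if len(signingOrgs) <= totalOrgs/2 {
			return fmt.Errorf("only %d of %d orgs signed, majority required",
				len(signingOrgs), totalOrgs)
		}
	case SingleOrg:
		if !signingOrgs[p.Org] {
			return fmt.Errorf("no valid signature from org %s", p.Org)
		}
	}
	return nil
}
```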

That said, it is certainly key to first identifying what we really need and where we can subset -- e.g., for MSP we clearly need only X509, only new lifecycle, no custom validator, endorser or decorator plugins.

Why do you need a lifecycle at all? Assuming you never support FPC reading non FPC and vice versa (which I think is the way to go) and only support a simple naive endorsement policy of "one-of-any-enclave" then do you need a lifecycle at all? Endorsement policies are there because you cannot assume honest execution, but, with SGX you can.

This is not completely true: on the one hand, we need the lifecycle for ercc, which is a standard chaincode. On the other hand, we also need it for FPC chaincode to get the initial explicit agreement from all orgs on a particular chaincode (essentially the same reason fabric needs it: to bootstrap the root-of-trust and get common agreement on it).

BTW: is there already somewhere a fabric doc with the validation logic, including all the key validation steps but high-level enough to be manageable to maintain for our purposes?

No, there are lots of corner cases and gotchas, especially with state-based endorsement... I suggest you not implement anything other than the bare minimum, as I think the "less is more" principle applies here.

Oh, state-based endorsement we have already explicitly ruled out as unsupported, guessing it raises lots of issues which we'd better not tackle at this point in time :-)

bvavala commented 4 years ago

You don't want to implement the Fabric configuration parsing in its entirety.

Right. For FPC, retrieving the crypto material of the organizations should be sufficient.

You just need to implement verification for a subset of use cases and support some reasonable policy, not every possible policy

We are definitely on that path with the current version, which requires "one enclave at a designated peer" (#273). This gives us the chance to begin the discussion on the definition of "reasonable". As it is difficult (or not possible) to define the "designated peer" in Fabric, this will likely have to evolve to "one enclave in an org" (#274). In addition, for risk management (more below), some use cases may require more than one endorsing enclave (say 2 or 3; #275 covers this) at different orgs. It would be nice to know your (@yacovm) perspective on these 3 options.

Why do you need a lifecycle at all? Assuming you never support FPC reading non FPC and vice versa (which I think is the way to go) and only support a simple naive endorsement policy of "one-of-any-enclave" then do you need a lifecycle at all? Endorsement policies are there because you cannot assume honest execution, but, with SGX you can.

The "one-of-any-enclave" policy is reasonable for some (but not all) use cases. There are two risks.

  1. Loss of state availability. If only a single enclave has the chaincode/state keys, its unavailability implies that nobody can make progress or recover any data. Having multiple endorsing enclaves can address this issue.
  2. Compromised computation/state integrity. As it is possible (for example) that an FPC chaincode will have vulnerabilities, if only a single enclave endorses the results, then the verifier will have no means to detect any issue. Using two endorsing enclaves at two different organizations can reasonably raise the bar for an adversary (see the sketch after this list).
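A sketch of how such a generalized check could look (purely illustrative types; enclave attestation and signature verification are assumed to have happened already):

```go
package fpcendorsement

import "fmt"

// Endorsement is an illustrative record of one enclave's endorsement after its
// attestation and signature have already been verified.
type Endorsement struct {
	EnclaveID string
	Org       string
}

// CheckEnclaveEndorsements generalizes "one-of-any-enclave": it requires at
// least minEnclaves distinct enclave endorsements coming from at least minOrgs
// distinct organizations (e.g. 2 enclaves at 2 orgs to raise the bar against a
// buggy chaincode or a compromised org; 1 and 1 is "one-of-any-enclave").
func CheckEnclaveEndorsements(es []Endorsement, minEnclaves, minOrgs int) error {
	enclaves := map[string]bool{}
	orgs := map[string]bool{}
	for _, e := range es {
		enclaves[e.EnclaveID] = true
		orgs[e.Org] = true
	}
	if len(enclaves) < minEnclaves || len(orgs) < minOrgs {
		return fmt.Errorf("got %d enclaves across %d orgs, need %d enclaves across %d orgs",
			len(enclaves), len(orgs), minEnclaves, minOrgs)
	}
	return nil
}
```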

As a side note, these issues came up also in the context of the Private Data Objects project. The first is addressed through a set of provisioning services for contract state keys, and a set of storage services for the state. The second is addressed by (optionally) letting users re-run the computation on other enclaves for verification.

g2flyer commented 3 years ago

With FPC Lite / FPC 1.0 this issue is tabled for now until we come back to future extensions including rollback protection ....