[Investigate] Use confidential computing to secure funds and ensure order privacy

boolafish commented 5 years ago

Currently, Google Cloud and Azure provide services for this.

It looks like only the computing hardware (trusted execution environments, TEEs) would know the data; even the hardware administrator (the cloud service provider) would not learn anything. This seems good enough for an RC operator. Ideally, the operator provides the validation code and the venue provides the order and settlement transaction data. The computation is done inside the confidential computing service, and the operator only gets the result: whether validation passed or not.

ref: #76

boolafish commented 5 years ago

[Note] Pepesza came up with an even higher-level use of confidential computing, as imapp has previously investigated this in more depth.

We can (possibly) leverage the technology not only for privacy but also for computation integrity. Instead of only letting the operator verify with it, the venue runs the exchange inside it from the beginning.

If traders can verify that the "confidential computing" environment is running code that they trust or have audited, they can simply trust the output of that computation. That is, a venue can possibly do everything inside it (create txs, sign txs using keys that live inside the confidential computing environment), and traders only need to check that the output transaction's signature comes from a private key that exists only inside the hardware.

So the questions for this design come down to:

Can user actually check/verify the code running inside of it?
Can we have some priv key only possible inside that hardware?

@paulperegud please add anything I missed.

paulperegud commented 5 years ago

This solution would place us in no-custody land, making exchange funds as secure as Intel SGX allows (see Problems section below).

Comparing SGX and zkSNARKs for proofs, we get the following.

SGX strong sides: 1) Cheap development. 2) Cheap deployment. 3) Continuous trading is possible. 4) Funds staying on the venue is simple. 5) Mixing of funds staying on the venue is free. 6) Front-running prevention is simple.

SGX weak sides: 1) So far it has been subject to side-channel data exfiltration attacks. This could mean leaking the Ethereum private key. 2) The program running inside the enclave is subject to remote code execution (RCE) attacks. 3) There is little to no trust in SGX as a platform, either in tech as a whole or in the community.

zkSNARKs strong sides: 1) You trust math. No hardware vulnerabilities or RCE are possible. 2) In high regard by the community.

zkSNARKs weak sides: 1) External dependency on the Ethereum project solving the trusted setup issue. 2) Since funds need to wait for settlement before they can trade again, locked-in capital is used less efficiently. 3) It's not obvious whether continuous trading is an option in this model. 4) Front-running prevention is harder to do.

Possible flow.

Venue installs and certifies the enclave.
User connects to the enclave via an https channel.
User downloads the enclave certificate and checks that the https connection certificate is signed using the enclave key.
User learns which Ethereum key the enclave uses. This key was generated inside the enclave, and SGX is there to protect the key from being stolen.
User funds the venue and places the order via the https channel.
Enclave runs the matching algorithm and creates a settlement tx.
Settlement tx is signed using the key stored in the enclave.

What needs to be inside the enclave to ensure funds security?

Order matching algorithm.
Settlement tx construction and signing.

What needs to be inside the enclave to prevent possible front-running?

HTTPS server for placing orders that traders can talk to.

Problems

SGX is full of holes. Intel is working on fixing them, so we are not out of luck yet. I'll talk to @mkow, who is keeping tabs on the situation in SGX land; he will know how bad/good it is. So far, every version of Intel's SGX has been plagued with bugs that allowed exfiltration of data from inside the enclave. I'm not really worried about exfiltration of private orders - that's probably a very ineffective way of doing frontrunning. Much more serious would be an attack against the Ethereum key generated and stored inside the enclave. If it were exfiltrated via a side channel (e.g. the Foreshadow attack), all funds resting on the exchange would be stolen.

boolafish commented 5 years ago

Thanks! I think the next step is basically to collect data/info on how safe this is, and on whether the flow above is possible or not. If not, what are the alternatives?

If it is safe, we can go to no-custody land directly :) . If it is unsure, it seems to me that we can at least have an easy RC setup with good-enough privacy, which is still a good win no matter which one ends up being possible.

madxor commented 5 years ago

Can user actually check/verify the code running inside of it?

There is something called attestation which proves:

The enclave is running in a valid Trusted Execution Environment (TEE), which is Intel SGX in this case (trustworthiness).
The enclave has the correct identity and runtime properties that have not been tampered with (identity).

The enclave identity provides information that should be sufficient to verify the enclave. Users will be able to verify the running enclave only through its identity. So we should make our own enclave as transparent as possible. Right now I don't know what the enclave build process looks like, and if it is "non-deterministic" it might break the trust bond between the enclave code and the enclave image. By "non-deterministic" I mean that it is hard to get the same enclave "identifier/footprint/checksum" from the same code across two different enclave builds. This would limit the "trust" in the running enclave, because users would need to trust the provided enclave build without being able to confirm locally that the code builds that particular enclave.
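
As a rough illustration of the reproducibility concern, the sketch below compares the measurement of a locally built image with the measurement reported by attestation. It is a simplification under stated assumptions: a plain SHA-256 of the image bytes stands in for the enclave measurement, and the function names and sample byte strings are hypothetical.

```python
import hashlib
import hmac


def measure_image(image_bytes: bytes) -> str:
    """Stand-in for an enclave measurement: SHA-256 of the built image.

    Assumption: a real SGX measurement is computed by the CPU while the
    enclave is loaded, not by hashing a file like this.
    """
    return hashlib.sha256(image_bytes).hexdigest()


def matches_attested_identity(local_image: bytes, attested_measurement: str) -> bool:
    """True if a locally built image has the identity reported by attestation."""
    return hmac.compare_digest(measure_image(local_image), attested_measurement)


# Two local builds of the same source: one byte-identical to the published
# build, one with an embedded build timestamp (a typical source of
# non-determinism).
published_build = b"enclave-code-v1"
reproducible_build = b"enclave-code-v1"
timestamped_build = b"enclave-code-v1|built-2020-01-01T00:00:00"

attested = measure_image(published_build)
print(matches_attested_identity(reproducible_build, attested))  # True
print(matches_attested_identity(timestamped_build, attested))   # False
```

With a deterministic build, any user can rebuild from the audited source and obtain the attested identity; with a non-deterministic build the comparison fails even though the source is the same.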

madxor commented 5 years ago

Can we have some priv key only possible inside that hardware?

There is a prototype enabling accessing TPM from openenclave. So theoretically we should be able to store private keys inside the TPM on a selected machine. The downside is that we will not be able to use the full potential of the cloud, because to my knowledge there is no mechanism that would easily transfer an enclave/VM between two physical devices with TPM-to-TPM data sync.

The move towards TEE/SGX/TPM is pushing us into an Intel-Amazon-Microsoft silo. That move needs to be well backed from both a business and an engineering perspective.

madxor commented 5 years ago

If we want to use confidential computing as a temporary solution to shorten the time to market and deliver quickly without compromising security/privacy then it should be a viable solution - if we provide users with proper auditability of the enclave code.

From a long-term perspective it might work, but we should keep the following (business risks) in mind:

hardware bugs are harder and more expensive to fix
adding more layers to the solution increases complexity, and excessive complexity is a security risk
by using confidential computing we will start to depend heavily on Intel, Azure and Microsoft (and VMware), and we will inherit all of their bugs without any possibility of a code audit
all of the above might scare off potential users (at least the most paranoid ones)

paulperegud commented 5 years ago

Can we have some priv key only possible inside that hardware?

There is a prototype enabling accessing TPM from openenclave.

I'm more interested in generating a key pair in an SGX enclave, and whether such a key would be secure in the face of the Spectres and Meltdowns of the future. Are there other enclaves capable of running a matching algorithm and an https-enabled web server inside?

Users will be able to verify the running enclave only through its identity

Is it realistic to check enclave identity in browser / dapp?

temporary solution to shorten the time to market

Yes, this is the goal. Also, this is an exploration of possibilities for external parties. We will not be implementing this thing ourselves.

boolafish commented 5 years ago

[Ask for discussion] Since SGX is more about a verification proof than a fraud proof, if it is trustworthy enough for a DEX to run on, do we really need that DEX to be on top of Plasma? It feels to me that we can just skip the Plasma operator and let SGX be the operator that submits its own DEX data to Ethereum.

For a user to exit, in the happy case, traders just submit a request to SGX and can immediately get their funds back on Ethereum (ref: #96). If there is a hardware failure on SGX (assuming this is the only possible failure), traders can start an exit game on Ethereum, which would take a longer period. (Potentially we can decrease this to a day? Assuming there is no possibility for SGX to submit a double-spending tx, we don't need priority and mass exit, so we don't have the concern that the exit period would impact the system capacity in terms of #UTXOs.)

Meanwhile, data availability could perhaps be handled by having SGX submit encrypted data to IPFS?

@paulperegud @madxor what do you think?

paulperegud commented 5 years ago

Yeah, that is true. This can work on top of Ethereum, and there are only two failure modes: SGX hacked or venue going offline. We can't do much about the first one. The second one requires an exit game - which is doable only if the contract has enough data to determine who owns what.

Meanwhile, data availability could perhaps be handled by having SGX submit encrypted data to IPFS?

IPFS is not a needed component here. Just send encrypted data directly to traders. However, there is a problem with the decryption key. You can use a "time capsule" approach as a trust-less variant, or a secret sharing scheme with shares held by advisors / public figures.
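
A minimal sketch of the secret sharing variant, assuming Shamir's scheme over a prime field. The choice of scheme, the modulus and the function names are illustrative assumptions, not something settled in this thread.

```python
import secrets

# Assumption: the Mersenne prime 2**521 - 1 as field modulus, large enough
# to hold a 256-bit decryption key.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical setup: 5 share holders, any 3 can rebuild the decryption key.
key = secrets.randbits(256)
shares = split_secret(key, threshold=3, num_shares=5)
print(recover_secret(shares[:3]) == key)  # True
```

Fewer than `threshold` shares reveal nothing about the key, so no single advisor can decrypt the data alone.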

madxor commented 5 years ago

Can we have some priv key only possible inside that hardware?

There is a prototype enabling accessing TPM from openenclave.

I'm more interested in generating a key pair in an SGX enclave, and whether such a key would be secure in the face of the Spectres and Meltdowns of the future. Are there other enclaves capable of running a matching algorithm and an https-enabled web server inside?

It is possible to generate key pairs in SGX, but looking at the current state of enclaves, SGX and its vulnerabilities, there is no way to protect key pairs from being recovered. The privacy feature of SGX is completely broken, but the integrity feature still stands.

In general, you can build an enclave on your own that will do what you want, so it should be possible to run a matching algorithm and have an https-enabled web server inside it.

One thing to note here is that all the SGX vulnerabilities I've familiarized myself with are only possible when a rogue actor is able to run his own code on the same physical machine as the victim's enclave. So the probability of a successful attack on an enclave is equal to the probability that the attacker's code will run on the same physical machine. Calculating that is not easy, because we need to add the probability of gaining access to the same machine through exploitation techniques or other hacking means.

Users will be able to verify the running enclave only through its identity

Is it realistic to check enclave identity in browser / dapp?

Yes, it should be. But the attestation is a bit tricky, because it has to be done through Intel Attestation Services. Maybe there is a way to do it bypassing Intel, but I have not found any credible source describing this.

madxor commented 5 years ago

[Ask for discussion] Since SGX is more about a verification proof than a fraud proof, if it is trustworthy enough for a DEX to run on, do we really need that DEX to be on top of Plasma? It feels to me that we can just skip the Plasma operator and let SGX be the operator that submits its own DEX data to Ethereum.

For a user to exit, in the happy case, traders just submit a request to SGX and can immediately get their funds back on Ethereum (ref: #96). If there is a hardware failure on SGX (assuming this is the only possible failure), traders can start an exit game on Ethereum, which would take a longer period. (Potentially we can decrease this to a day? Assuming there is no possibility for SGX to submit a double-spending tx, we don't need priority and mass exit, so we don't have the concern that the exit period would impact the system capacity in terms of #UTXOs.)

Meanwhile, data availability could perhaps be handled by having SGX submit encrypted data to IPFS?

@paulperegud @madxor what do you think?

I think that it is a viable solution. In case of a breakage that results in disclosure of the private keys, the attacker should not gain anything: due to the proof of authority, when he tries to add some transaction to the system, watchers would detect that and start a mass exit. And an attacker (like an operator) will exit last. So the attacker can only DoS the system (blackmail the operator) but shouldn't be able to get the money out.

madxor commented 5 years ago

Meanwhile, data availability could perhaps be handled by having SGX submit encrypted data to IPFS?

IPFS is not a needed component here. Just send encrypted data directly to traders. However, there is a problem with the decryption key. You can use a "time capsule" approach as a trust-less variant, or a secret sharing scheme with shares held by advisors / public figures.

@paulperegud, that's interesting! Could you elaborate a bit more about the "time capsule" approach?

boolafish commented 5 years ago

[Note] From the call with @madxor: the privacy guarantee does not hold because the cache/CPU processing is not encrypted. So my understanding is that a ~~hardware level access~~ can potentially breach the privacy promise.

[edit: see the following comment : ) ]

madxor commented 5 years ago

@boolafish, an attacker's VM co-located on the same machine is the threat we should be afraid of.

paulperegud commented 5 years ago

@madxor our attacker is the insider - he has access to the hardware and can run at the supervisor level. He is more powerful than an attacker running on a co-located VM.

@boolafish @madxor I've talked to mkow. TLDR: with the newest generation (Cascade Lake) most of the leaks have been patched.

Status of patches

All of the known bugs (Meltdown, Spectre, Foreshadow), with the exception of two, are patched in microcode or hardware, or can be mitigated by compiler flags when compiling the enclave. <- this claim needs careful evaluation, @madxor

Please note - we do not care about performance, only about security here. Every trick, even a costly one, needs to be deployed. Running in a specialized cloud on dedicated hardware is a bonus, especially if we can prove that to the user. The last bit might be really, really hard because SGX by design does not expose any information that can be used to prove such a claim. We might be able to do that using networking tricks (proof of distance to the owner of the key).

More than that - the user will know what she is working with, making our claims verifiable. 1) The enclave certificate will show the exact version of the microcode. 2) The enclave certificate will show the exact version of the processor. 3) The enclave certificate will show whether hyperthreading is on or off. It should be off. See here - "Developers of software running in an enclave" section.

Estimating time required to extract the key

Side-channel attacks like this usually leak information byte by byte. The timing window during which data can be retrieved is usually really small (making those attacks really hard to perform if the attacker is running JS code in the browser, because of the limited precision of the timer). But in our model the attacker has access to the most precise timer available on the machine. Even in such a situation, key recovery takes some time - where the time is a function of the number of attack windows, and those are a function of the number of signatures the enclave produces. There are PoCs on the net for such attacks. We can estimate how many times the enclave can sign before the key needs to be rotated.
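
A back-of-the-envelope sketch of that estimate, assuming (purely for illustration) that leakage grows linearly with the number of signatures. The function name and the sample numbers are hypothetical; real leak rates would have to come from PoC measurements.

```python
def max_signatures_before_rotation(bits_leaked_per_signature: float,
                                   max_tolerated_leaked_bits: float) -> int:
    """Signatures the enclave may produce before the leak budget is used up.

    Assumption: leakage accumulates linearly with the number of signatures.
    """
    if bits_leaked_per_signature <= 0:
        raise ValueError("bits_leaked_per_signature must be positive")
    return int(max_tolerated_leaked_bits // bits_leaked_per_signature)


# Hypothetical numbers: 0.5 bits leaked per signature, and the key must be
# rotated before the attacker could have learned 64 bits of it.
print(max_signatures_before_rotation(0.5, 64))  # 128
```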

Key rotation

Normally, key rotation would involve moving funds to a new address. To avoid this, use a predicate with its own state on Ethereum. The state will contain the "owner" field - where new rotated keys will be placed. The venue will issue an Ethereum transaction from the old address to the contract, setting the new address. All the funds will be managed by the predicate contract. To satisfy the unlocking criterion, a signature of the current "owner" needs to be produced. @boolafish, what do you think about this?
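
The predicate idea can be modelled off-chain as a small state machine. This is a toy Python model, not contract code: the class and method names are hypothetical, and `sender` / `signer` stand for addresses that are assumed to have already been recovered from verified signatures.

```python
class OwnerPredicate:
    """Toy model of a predicate whose on-chain state holds the current owner."""

    def __init__(self, initial_owner: str) -> None:
        self.owner = initial_owner

    def rotate_owner(self, sender: str, new_owner: str) -> None:
        # Only a transaction from the current owner may set the next owner.
        if sender != self.owner:
            raise PermissionError("only the current owner can rotate the key")
        self.owner = new_owner

    def can_unlock(self, signer: str) -> bool:
        # Funds unlock only with a signature from the current owner.
        return signer == self.owner


predicate = OwnerPredicate("0xOLD")
predicate.rotate_owner(sender="0xOLD", new_owner="0xNEW")
print(predicate.can_unlock("0xNEW"))  # True
print(predicate.can_unlock("0xOLD"))  # False
```

Funds never move during rotation; only the `owner` field changes, so a leaked old key stops being useful once the rotation transaction is mined.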

As for the rotation itself - just hash the private key with some known salt to produce the new key. Due to the avalanche effect (a design requirement for cryptographic hashes), the attacker would need to explore 2^(256-k) possibilities to learn the new value, where k is the number of bits of the old private key that the attacker has recovered up to that moment - a value we should be able to estimate.
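
A minimal sketch of such a derivation. SHA-256 is used here as a stand-in hash (an assumption; the thread does not name one), and the counter-based retry and the function names are illustrative additions that keep the result a valid secp256k1 private key.

```python
import hashlib

# Order of the secp256k1 group used for Ethereum keys.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def rotate_key(old_key: int, salt: bytes) -> int:
    """Derive the next private key by hashing the salt and the old key."""
    counter = 0
    while True:
        digest = hashlib.sha256(
            salt + old_key.to_bytes(32, "big") + counter.to_bytes(4, "big")
        ).digest()
        candidate = int.from_bytes(digest, "big")
        # Retry in the (extremely unlikely) case the hash is not a valid key.
        if 1 <= candidate < SECP256K1_N:
            return candidate
        counter += 1


def remaining_search_space_bits(recovered_bits: int, key_bits: int = 256) -> int:
    """Exponent of the 2^(256-k) search space left after k leaked bits."""
    if not 0 <= recovered_bits <= key_bits:
        raise ValueError("recovered_bits must be between 0 and key_bits")
    return key_bits - recovered_bits


new_key = rotate_key(old_key=0x1234, salt=b"venue-rotation-salt")
print(1 <= new_key < SECP256K1_N)        # True
print(remaining_search_space_bits(100))  # 156
```

Note that when k reaches 256 the search space collapses to a single candidate, so the rotation has to happen well before the attacker has recovered the whole old key.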

That's the high-level approach to key rotation.

@pik694 Have I forgotten about anything?

madxor commented 5 years ago

To get a more complete overview of TEEs (Trusted Execution Environments), I've spent a bit more time diving into other solutions on the market. In addition to SGX from Intel there are two noteworthy competitors: AMD with its SEV (Secure Encrypted Virtualization) and ARM with its TrustZone.

Unfortunately, both have weaknesses, for SEV look here and here, for TrustZone look here and here.

If we don't want to build our own side-channel resistant, hardware solution based for example on ARM Crypto series, then I suggest keeping an eye on this market but refrain from using it.

paulperegud commented 5 years ago

If we don't want to build our own side-channel resistant, hardware solution based for example on ARM Crypto series, then I suggest keeping an eye on this market but refrain from using it.

Noted. Unfortunately, until we fix #100 we need SGX or similar tech.

May I ask you to look into the following claim?

All of the known bugs (Meltdown, Spectre, Foreshadow), with the exception of two, are patched in microcode or hardware, or can be mitigated by compiler flags when compiling the enclave.

I'm interested in details and links. E.g. for the unfixed stuff I want to know if researchers have provided PoC code and / or some numbers on the performance of their data exfiltration attack.

madxor commented 5 years ago

All of the known bugs (Meltdown, Spectre, Foreshadow), with the exception of two, are patched in microcode or hardware, or can be mitigated by compiler flags when compiling the enclave. <- this claim needs careful evaluation

@paulperegud, I don't know how reliable this source is, but according to it the following vulnerabilities were patched:

Hardware mitigations for CVE-2017-5715 (Spectre, Variant 2).
Hardware mitigations for CVE-2017-5754 (Meltdown, Variant 3).
Hardware mitigations for CVE-2018-3640 (Rogue System Register Read (RSRE), Variant 3a).
Hardware mitigations for CVE-2018-3620/CVE-2018-3646 (L1 Terminal Fault, Foreshadow).
Hardware mitigations for CVE-2018-12130/CVE-2018-12126/CVE-2018-12127/CVE-2019-11091 (MDS; MFBDS, RIDL, MSBDS, Fallout, MLPDS, MDSUM).

Foreshadow attack has 3 CVEs:

CVE-2018-3615 for attacking SGX.
CVE-2018-3620 for attacking the OS Kernel and SMM mode.
CVE-2018-3646 for attacking virtual machines.

CVE-2018-3615 is not listed above as mitigated. This might either be a mistake on the wikichip side, or SGX will still be vulnerable.

Going further, Foreshadow is only one type of attack, namely speculative execution. There are also cache attacks, cache timing attacks, and power and timing analysis attacks. Some cache attacks are covered by CVE-2018-12130/CVE-2018-12126/CVE-2018-12127/CVE-2019-11091 and so are mitigated, but I don't know if all of them are, because it's not so easy to map papers to CVE numbers. ;)

I was trying to find something related to attacks on SGX on Cannon Lake, but found nothing yet. We need to give the research community time to write papers.

madxor commented 5 years ago

Estimating time required to extract the key

5 minutes. That probably might be shortened even further.

madxor commented 5 years ago

Key rotation

As an alternative to key rotation, we could consider a shared key/multisig/threshold signature setup, where several parties (preferably) located in different (physical/administrative) places each hold some knowledge, and a valid confirmation is produced only by combining that knowledge. This would protect us from a situation where compromising a single party amounts to a successful attack.

The cost of this solution will be much more complicated communication and synchronization which might limit the number of possible transactions per second.
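
The confirmation rule for such a setup can be sketched as a simple m-of-n check. This is an illustrative model only: the party names are hypothetical, and each entry in `approvals` is assumed to represent an already-verified signature from that party.

```python
def is_confirmed(approvals: set[str], parties: set[str], threshold: int) -> bool:
    """True if at least `threshold` distinct known parties approved."""
    if not 1 <= threshold <= len(parties):
        raise ValueError("threshold must be between 1 and the number of parties")
    # Approvals from unknown parties are ignored by the intersection.
    return len(approvals & parties) >= threshold


parties = {"site-a", "site-b", "site-c"}
print(is_confirmed({"site-a"}, parties, threshold=2))            # False
print(is_confirmed({"site-a", "site-c"}, parties, threshold=2))  # True
```

With a threshold of 2, compromising one site is not enough to produce a valid confirmation, at the price of one extra round of communication per settlement.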

slavamirovsky commented 5 years ago

@paulperegud @boolafish @madxor how do we proceed with this topic? Please keep tagging me here.

madxor commented 5 years ago

Due to the fact that SGX attacks require physical co-location of the (remote) attacker, the probability of such an attack can be reduced by rotating enclaves across big data centers distributed around the world.

Within the enclave rotation time (T_R), a remote attacker needs to pinpoint where the enclave is located (T_L), find the appropriate physical machine in the data center (T_F) and inject his VM (T_I) on the same machine to have a chance of performing a successful attack (T_A).

So we need to choose T_R such that T_R < T_L + T_F + T_I + T_A to be "safe".
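
The condition can be written as a one-line check. The function name and the sample timings (in minutes) are hypothetical; realistic values for each term would have to be measured or estimated.

```python
def rotation_is_safe(t_rotation: float, t_locate: float, t_find: float,
                     t_inject: float, t_attack: float) -> bool:
    """True if T_R < T_L + T_F + T_I + T_A (all in the same time unit)."""
    return t_rotation < t_locate + t_find + t_inject + t_attack


# Hypothetical timings in minutes: the attacker needs 20 + 10 + 5 + 5 = 40.
print(rotation_is_safe(30, 20, 10, 5, 5))  # True
print(rotation_is_safe(60, 20, 10, 5, 5))  # False
```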

@Nikodemek18 tagging as requested.

slavamirovsky commented 5 years ago

Yes, I confirm. After today's call with @paulperegud @boolafish we understood that this is not the easy and fast solution we needed. Let's forget about it for a while. #sgx

boolafish commented 5 years ago

Let's archive this for now then
