gramineproject / graphene

Graphene / Graphene-SGX - a library OS for Linux multi-process applications, with Intel SGX support
https://grapheneproject.io
GNU Lesser General Public License v3.0

Remote Attestation via DCAP in an Azure VM #2062

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

Description of the problem

First of all, thank you for all the work you are putting into this community; we really appreciate it. We are currently testing Remote Attestation with Graphene-SGX in an Azure VM. Neither the Remote Attestation sample ra-tls-mbedtls nor our own samples work. We use the quote provider library shipped with the Azure DCAP client.

Steps to reproduce

We followed the steps documented here (with the same VM specs as documented): https://graphene.readthedocs.io/en/latest/cloud-deployment.html

We then followed the steps for the ra-tls-mbedtls example, and building the server with dcap works fine. However, running it:

SGX=1 ./pal_loader ./server dcap &

gives us the following error message:

aesm_service returned error: 1
load_enclave() failed with error -1

We get the same error message for our own samples.

dimakuv commented 3 years ago

I'm not sure we ever tried the Azure DCAP infrastructure for SGX remote attestation.

  1. How different is it from Intel DCAP (https://github.com/intel/SGXDataCenterAttestationPrimitives/)?

  2. Does Azure DCAP use "normal" AESM daemon? Does it adhere to the "normal" AESM protobuf specification? E.g. see https://github.com/oscarlab/graphene/blob/master/Pal/src/host/Linux-SGX/quote/aesm.proto.

  3. Do local examples of Graphene work on your machine? E.g. helloworld and examples like Redis or Memcached.

dimakuv commented 3 years ago

Ok, I took a brief look at the Azure DCAP Client. It seems to be a plugin shared lib to the normal Intel SGX DCAP software infrastructure. I'm pretty sure no-one from the Graphene team tried to use the Azure DCAP Client. Thus, it needs to be added to the RA-TLS codebase of Graphene.

I played with the Microsoft Azure Attestation (MAA) service recently: https://github.com/oscarlab/graphene/pull/1793. But that looks quite different from what I read about Azure DCAP Client. My current understanding is that Azure DCAP Client is a low-level lib whereas MAA is a more high-level framework (with tooling/separate utilities to do the heavy-lifting of obtaining and verifying JWTs).

I'll be glad if you could help with adding support for Azure DCAP Client. It would be good to understand what exactly Azure DCAP Client does and how it plugs itself into Intel SGX DCAP.

ghost commented 3 years ago

Hi Dmitrii, thank you for looking into this. All local examples of Graphene work fine in the Azure VM. However, remote attestation as described does not. I am not familiar with the inner workings of the client. We could look into this, but our resources are currently very limited. If support for this comes to Graphene in the future, we would be more than happy to help you with testing.

dimakuv commented 3 years ago

So I rented my own MS Azure CC VM. I followed https://docs.microsoft.com/en-us/azure/confidential-computing/quick-create-portal.

I spent a whole day debugging why DCAP/ECDSA remote attestation doesn't work on my VM (I got the same error as you did). And at some point I tried the EPID remote attestation and it worked!

In particular, I had to install some additional packages to make AESM do anything interesting (otherwise it just gave me error 30 which is "AESM is not even initialized"):

sudo apt install libsgx-launch libsgx-urts libsgx-quote-ex libsgx-dcap-ql
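To see which of these packages are still missing on a given VM before installing, something like the following could be used. This is only a sketch: missing_pkgs is a hypothetical helper name, and it assumes a Debian/Ubuntu system where dpkg-query is available. The package names are the ones from the apt command above.

```shell
#!/bin/sh
# Sketch: print every package from the argument list that is NOT installed.
# missing_pkgs is a hypothetical helper, not part of Graphene or the SGX PSW.
missing_pkgs() {
    for pkg in "$@"; do
        # dpkg-query prints "install ok installed" for installed packages;
        # for unknown packages it prints nothing on stdout.
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        case "$status" in
            "install ok installed") ;;          # present, nothing to report
            *) printf '%s\n' "$pkg" ;;          # absent or only partly installed
        esac
    done
}

# Package names taken from the apt command above.
missing_pkgs libsgx-launch libsgx-urts libsgx-quote-ex libsgx-dcap-ql
```

An empty output means all four packages are already installed.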

Then I built the EPID version of our ra-tls-mbedtls example:

RA_CLIENT_SPID=<sanitized> RA_CLIENT_LINKABLE=0 make app epid
SGX=1 ./pal_loader ./server epid &
RA_TLS_EPID_API_KEY=<sanitized> ./client epid

And it worked! When I checked the status of AESM daemon, it initialized the EPID flows:

dimakuv@sgx-ms-dcap:~/graphene/Examples/ra-tls-mbedtls$ sudo service aesmd status
● aesmd.service - Intel(R) Architectural Enclave Service Manager
...
Jan 29 11:47:15 sgx-ms-dcap systemd[1]: Starting Intel(R) Architectural Enclave Service Manager...
Jan 29 11:47:15 sgx-ms-dcap systemd[1]: Started Intel(R) Architectural Enclave Service Manager.
Jan 29 11:47:15 sgx-ms-dcap aesm_service[26624]: The server sock is 0x560b6f288990
Jan 29 11:47:26 sgx-ms-dcap aesm_service[26624]: [ADMIN]EPID Provisioning initiated
Jan 29 11:47:28 sgx-ms-dcap aesm_service[26624]: [ADMIN]EPID Provisioning successful
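
If you want to check this on your own instance without reading the whole status output, a small filter over the AESM log is enough. This is a sketch only: aesm_epid_provisioned is a hypothetical helper name, and it matches just the "EPID Provisioning successful" line seen in the log above. I don't know the corresponding message for DCAP/ECDSA, so it is not covered.

```shell
#!/bin/sh
# Sketch: succeed if the log text on stdin contains the EPID provisioning
# success message shown in the aesmd status output above.
# aesm_epid_provisioned is a hypothetical helper name.
aesm_epid_provisioned() {
    grep -q 'EPID Provisioning successful'
}

# Usage (assumes the systemd unit is called aesmd, as in the output above):
#   sudo journalctl -u aesmd --no-pager | aesm_epid_provisioned \
#       && echo "AESM initialized the EPID flow"
```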

So I don't really understand why MS Azure instances have EPID flows instead of DCAP/ECDSA flows. I'll need to ask other people. I always thought that a particular SGX-enabled machine can use either EPID or DCAP/ECDSA, but not both.

Feel free to try EPID flows on your MS Azure instance and see if this works.

dimakuv commented 3 years ago

I made it work (somehow)!

There is some kind of magic the very first time the AESM daemon gets the DCAP request from an SGX enclave, but then it starts working.

So, I set up a clean VM. Then I installed the additional SGX PSW packages (with plugins for AESM). Then I ran the OpenEnclave attestation sample (with the sgxremote argument) with the SGX_AESM_ADDR environment variable set, to trigger out-of-proc flows (see https://download.01.org/intel-sgx/sgx-dcap/1.9/linux/docs/Intel_SGX_ECDSA_QuoteLibReference_DCAP_API.pdf). The first time it fails, but all subsequent times it works. Apparently, some initial provisioning (or discovery of the PCCS server) happens on the first request. This SGX info is then cached, and things start working.

$ sudo apt install libsgx-quote-ex

$ SGX_AESM_ADDR=1 host/attestation_host sgxremote ./enclave_a/enclave_a.signed ./enclave_b/enclave_b.signed
## first attempt fails with some OE errors

$ SGX_AESM_ADDR=1 host/attestation_host sgxremote ./enclave_a/enclave_a.signed ./enclave_b/enclave_b.signed
## second attempt works!

## now let's try Graphene RA-TLS sample with DCAP; server output below...
dimakuv@sgx-ms-dcap2:~/graphene/Examples/ra-tls-mbedtls$ SGX=1 ./pal_loader ./server dcap
  . Waiting for a remote connection ... ok
  . Performing the SSL/TLS handshake... ok
  . Closing the connection... ok

## Graphene RA-TLS sample; client output below...
dimakuv@sgx-ms-dcap2:~/graphene/Examples/ra-tls-mbedtls$ ./client dcap
  . Performing the SSL/TLS handshake... ok
  . Verifying peer X.509 certificate... ok

So Graphene now works too. Somehow, there is some magic that must happen on the first DCAP request to AESM to provision this particular MS Azure CC VM. And this magic only happens when I trigger AESM through OpenEnclave.

This gives us a workaround:

  1. Do one dummy run of openenclave/samples/attestation with the sgxremote argument.
  2. Then use Graphene-SGX.
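
The two steps above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: retry and warm_up_dcap are hypothetical helper names, and the warm-up command is the one from the OpenEnclave run above, so it has to be started from the attestation sample directory.

```shell
#!/bin/sh
# Sketch: run a command up to N times and stop at the first success.
# retry is a hypothetical helper name.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i failed" >&2
        i=$((i + 1))
    done
    return 1
}

# Hypothetical wrapper around the OpenEnclave warm-up run shown above.
# Assumes the current directory is the built openenclave attestation sample.
warm_up_dcap() {
    SGX_AESM_ADDR=1 host/attestation_host sgxremote \
        ./enclave_a/enclave_a.signed ./enclave_b/enclave_b.signed
}

# Usage: the first attempt is expected to fail, the second to succeed.
#   retry 2 warm_up_dcap && echo "warm-up done, now start the Graphene server"
```

Two attempts match what I observed (first run fails, second works); whether that holds on every VM is an assumption.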

ghost commented 3 years ago

Nice, we very much appreciate that you looked into this. We ran some tests over the last couple of days in our VMs and it seems to work, even without the dummy run :) Feel free to close this issue, and again, thank you for your effort.