gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0

Missing certificate chain when starting and attesting an enclave #589

Closed chigarovae closed 1 year ago

chigarovae commented 2 years ago

Hi! Some time ago I tested Gramine runtimes with the Edgeless Marble Run premain and attestation coordinator, and everything worked fine. But recently, when I built a new image on a newly configured host, I got this error when I tried to launch and attest an enclave:

rpc error: code = Unauthenticated desc = invalid quote: verifying quote failed: OE_MISSING_CERTIFICATE_CHAIN
[P1:T1:premain-libos] trace: ---- return from shim_write(...) = 0x6c

It's coming from the Edgeless premain, so I looked into the attestation coordinator log. There was this error:

ERROR: Unexpected certificate type (qe_cert_data->type=3) (oe_result_t=OE_MISSING_CERTIFICATE_CHAIN) [openenclave-src/common/sgx/quote.c:_validate_qe_cert_data:76]

I contacted the Edgeless developers, and they confirmed it:

The problem is that the Coordinator is unable to verify your application's quote, due to a mismatch in the type of qe_cert_data. It is expecting the data to be a Provisioning Certification Key (PCK) certificate chain, but instead is receiving a Platform Provisioning ID (PPID).

I tried to rebuild images that used to work fine, but they started failing with this error too. After that I moved to the old host, where Gramine was deployed a couple of months ago and where I had run my first successful experiments. I built and launched test images there, and they worked fine. Here's the manifest I use to build the image:

[libos]
entrypoint = "/premain-libos"

[loader]
pal_internal_mem_size = "64M"
argv0_override = "/launcher.sh"
log_level = "all"

[sgx]
enclave_size = "2G"
debug = true
trusted_files = [ "file:/trusted_argv","file:/premain-libos","file:/app/","file:/launcher.sh",]
allowed_files.uuid = "file:/uuid"
remote_attestation = true
thread_num = 16

[sys]
stack_size = "8M"

[loader.env]
EDG_MARBLE_TYPE = { passthrough = true }
EDG_MARBLE_COORDINATOR_ADDR = { passthrough = true }
EDG_MARBLE_UUID_FILE = { passthrough = true }
EDG_MARBLE_DNS_NAMES = { passthrough = true }

#[fs.mount]
#secrets.type = "tmpfs"
#secrets.path = "/secrets"

[fs]
  mounts = [
  { path = "/native-datasets", uri = "file:/mnt/datasets-update/" },
  { path = "/datasets", uri = "file:/mnt/datasets-volume/" },
  { path = "/private", uri = "file:/mnt/private/" },
  { path = "/certs/ids.key", uri = "file:/mnt/certs/ids.key" },
  { path = "/worker_instance_name", uri = "file:/etc/hostname" }
]

So, the questions are:

  1. Were there any changes in Gramine (more specifically, in the gsc tool) that could affect the attestation procedure?
  2. Can I adjust the gsc build procedure in some way to make the image use the PCK cert chain instead of the PPID, as it did before?

Thanks in advance!

dimakuv commented 2 years ago

Hm, let's look at the OpenEnclave source code:

https://github.com/openenclave/openenclave/blob/e79d334c7f3b9fb2ab3efddbacf215d1713c2413/include/openenclave/bits/sgx/sgxtypes.h#L832-L840 -- here we see that type == 3 corresponds to OE_SGX_PCK_ID_ENCRYPTED_PPID_3072.
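To check which certification data type a given quote actually carries, the field can be read straight out of the raw quote bytes. The following is a minimal sketch, not code from Gramine or Open Enclave. It assumes the ECDSA quote v3 layout (48-byte header, 384-byte report body, 4-byte signature length, then 576 bytes of fixed signature fields followed by the QE authentication data and the certification data), and the numeric values are the ones I understand the linked `sgxtypes.h` enum to define. The function names are hypothetical.

```python
import struct

# Assumed values of oe_sgx_pck_id_type_t from Open Enclave's sgxtypes.h.
PCK_ID_TYPES = {
    1: "OE_SGX_PCK_ID_PLAIN_PPID",
    2: "OE_SGX_PCK_ID_ENCRYPTED_PPID_2048",
    3: "OE_SGX_PCK_ID_ENCRYPTED_PPID_3072",
    4: "OE_SGX_PCK_ID_PCK_CERTIFICATE",
    5: "OE_SGX_PCK_ID_PCK_CERT_CHAIN",
}

# Assumed ECDSA quote v3 layout (all integers little-endian).
HEADER_SIZE = 48
REPORT_BODY_SIZE = 384
SIG_LEN_FIELD = 4
# ECDSA signature + attestation key + QE report body + QE report signature.
FIXED_SIG_FIELDS = 64 + 64 + 384 + 64


def qe_cert_data_type(quote: bytes) -> int:
    """Return the numeric certification data type stored in a raw quote."""
    auth_off = HEADER_SIZE + REPORT_BODY_SIZE + SIG_LEN_FIELD + FIXED_SIG_FIELDS
    if len(quote) < auth_off + 2:
        raise ValueError("quote too short to contain QE authentication data")
    # QE authentication data: uint16 size followed by that many bytes.
    (auth_size,) = struct.unpack_from("<H", quote, auth_off)
    cert_off = auth_off + 2 + auth_size
    if len(quote) < cert_off + 6:
        raise ValueError("quote too short to contain certification data")
    # Certification data: uint16 type, uint32 size, then the payload.
    cert_type, _cert_size = struct.unpack_from("<HI", quote, cert_off)
    return cert_type


def describe_cert_type(quote: bytes) -> str:
    """Map the certification data type to its (assumed) enum name."""
    value = qe_cert_data_type(quote)
    return PCK_ID_TYPES.get(value, f"unknown type {value}")
```

Running `describe_cert_type()` on a quote dumped from the failing host and on one from the working host should show whether the two hosts really produce different certification data types.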

So the Edgeless developers are correct. Though I haven't seen such errors before, I believe this is a problem in your machine's SGX environment. I suspect that you forgot to install the Azure DCAP client plugin: https://github.com/microsoft/Azure-DCAP-Client

Were there any changes in Gramine (more specifically, in the gsc tool) that could affect the attestation procedure?

No, I can't remember any changes that would relate to this.

Can I adjust the gsc build procedure in some way to make the image use the PCK cert chain instead of the PPID, as it did before?

No, this is external to Gramine. Gramine links against the Intel SGX DCAP libraries (which in turn may use the MS Azure DCAP client plugin). So the issue seems external to Gramine, and is related to how you installed these DCAP libraries on your platform.

dimakuv commented 2 years ago

@chigarovae I got interested in this error, and googled a bit.

I found a hint of what is going on in the official DCAP library documentation from Intel: https://download.01.org/intel-sgx/dcap-1.0.1/docs/Intel_SGX_ECDSA_QuoteGenReference_DCAP_API_Linux_1.0.1.pdf

The relevant text is found under Section 3.3.1.1. Here is the relevant snippet:

If the provider library cannot be found, the sgx_ql_get_quote_config() symbol is not found within the provider library or it returns an error, the Quote Library uses the raw-TCB of the platform to certify the key and use the certification type PPID_RSA3072_ENCRYPTED as the Quote’s Certification Data Type to identify platform. If the API is found, the API returns 2 pieces of information ...

Pay attention to the fallback described in the first sentence. Basically, the Quote Library (aka Intel ECDSA Quoting Library, aka Intel DCAP Quote Generation Library, aka libsgx-dcap-ql.so) tries to dlopen() another helper library -- the Provider Library (aka Platform Quote Provider Library, aka DCAP Quote Provider Library, aka libdcap_quoteprov.so). If this helper library is not found or fails, then the Quote Library falls back to producing some kind of "raw SGX quote", which doesn't have the PCK cert chain but only has the raw TCB data. This is what happens in your case.
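A quick way to test this hypothesis is to try the same two steps by hand: load the provider library and look up the symbol. The sketch below does that with Python's `ctypes`. The helper name `probe_quote_provider` is made up for illustration, and the library file name is taken from the text above; depending on the package, the installed file may carry a version suffix, so treat the default as an assumption. Run it in the same environment in which quote generation happens, because that is where the lookup has to succeed.

```python
import ctypes


def probe_quote_provider(lib_name: str = "libdcap_quoteprov.so",
                         symbol: str = "sgx_ql_get_quote_config") -> str:
    """Report whether the provider library loads and exports the symbol."""
    try:
        # Uses the regular dynamic-linker search path, like dlopen() would.
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        return f"not loadable: {exc}"
    # ctypes raises AttributeError on a missing symbol; hasattr() catches it.
    if not hasattr(lib, symbol):
        return f"loaded, but {symbol} is missing"
    return "ok"


if __name__ == "__main__":
    print(probe_quote_provider())
```

Anything other than `ok` matches the first two fallback conditions in the Intel documentation quoted above. Note that `ok` only shows that the library and symbol are present; the call itself can still return an error at runtime, which is the third condition.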

TLDR: You need to install the Provider Library. Most probably you need the one provided by MS Azure + OpenEnclave.

Some relevant links:

P.S. The Edgeless developers may also be interested in my findings (if they haven't found the same already). @thomasten @daniel-weisse @m1ghtym0

dimakuv commented 1 year ago

This issue was resolved, so let me close it.