mithril-security / blindai

Confidential AI deployment with secure enclaves :lock:
https://www.mithrilsecurity.io/
Apache License 2.0
502 stars 36 forks source link

Undefined behavior when using binding to QPL in the runner #153

Open clauverjat opened 1 year ago

clauverjat commented 1 year ago

The issue is triggered by the following code : https://github.com/mithril-security/blindai-preview/blob/main/runner/remote_attestation_sgx/src/quote_verification_collateral.rs#L246

    // Retrieving verification collateral using QPL
    let mut p_quote_collateral: *mut sgx_ql_qve_collateral_t = ptr::null_mut();
    let qv_ret = unsafe {
        sgx_ql_get_quote_verification_collateral(
            fmspc.as_ptr(),
            fmspc.len() as u16,
            ca_from_quote.as_ptr(),
            &mut p_quote_collateral as *mut *mut sgx_ql_qve_collateral_t,
        )
    };

    ensure!(
        qv_ret == Quote3Error::Success,
        "sgx_ql_get_quote_verification_collateral failed!"
    );

This code will usually work correctly, but it is broken. We discovered the issue when trying to debug a failure from sgx_ql_get_quote_verification_collateral. While debugging, we added the following before the ensure! statement to print the error code from the QPL.

println!("sgx_ql_get_quote_verification_collateral returned {:?}", qv_ret);

Quite surprisingly we got sgx_ql_get_quote_verification_collateral returned Quote3Error::Success despite the fact that qv_ret != Quote3Error::Success when executing the ensure!... To compound the mystery, the issue disappeared when compiling in debug mode, the debug builds simply printed a status different from Quote3Error::Success, (yet it was still the wrong status). This kind of strange behavior are often the result of Undefined Behavior. And this is also the case here. The UB is actually due to how we declared the FFI interface with the QPL (C-library) in our rust code :

extern "C" {
    pub fn sgx_ql_get_quote_verification_collateral(
        fmspc: *const u8,
        fmspc_size: u16,
        pck_ra: *const c_char,
        pp_quote_collateral: *mut *mut sgx_ql_qve_collateral_t,
    ) -> Quote3Error;
    pub fn sgx_ql_free_quote_verification_collateral(
        p_quote_collateral: *const sgx_ql_qve_collateral_t,
    ) -> Quote3Error;
}

The return type of the sgx_ql_get_quote_verification_collateral is declared to be a Quote3Error which is a Rust enum. But a Rust enum is assumed to only take one of the declared values (it cannot host any int8 like what is often done in a C enum). In our case the UB happened when the QPL returned an enum value that could not be represented with the Rust enum. For more information about this mismatch between Rust and C-like enum : https://mdaverde.com/posts/rust-bindgen-enum/

What should we do to fix it ? The best course of action would be to replace our custom FFI interface declaration with an FFI declaration generated by rust-bindgen. This would avoid this kind of mistake (and also would ensure that the function signature matches) We should also look if there is already a crate on crates.io which does already that.

Security impact : No (outside of enclave). Priority : Low (only impacts the error path)