rust-osdev / x86_64

Library to program x86_64 hardware.
https://docs.rs/x86_64
Apache License 2.0
797 stars 132 forks source link

Add SGX structures and methods for dynamic memory management #348

Closed jarkkojs closed 2 years ago

jarkkojs commented 2 years ago
Introduce crate::structures::sgx for holding a flat list of SGX
micro-architectural structures, which are described in Intel SDM Vol. 3D,
chapter 34. Start with SECINFO, which describes page attributes for a
single page inside an SGX enclave (a protected memory region).

Add wrappers for EACCEPT, EACCEPTCOPY and EMODPE leaf functions of ENCLU
opcode, which are used to acknowledge, when a change is requested by the
host, and also modify enclave attributes. Add SGX return codes relevant
for the added functions.

Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
jarkkojs commented 2 years ago

@haraldh, This one has alignment checks.

Freax13 commented 2 years ago

Thank you for your contribution!

You seem to be force pushing to your branch a lot, is this ready for review or are you still working on it?

jarkkojs commented 2 years ago

Thank you for your contribution!

You seem to be force pushing to your branch a lot, is this ready for review or are you still working on it?

Thank you.

There's a problem with primordial dependency and CI. I'll check with earlier version. If it compiles, then it's good for review.

jarkkojs commented 2 years ago

@Freax13 I'll replace primordial::Address with the local VirtAddr abstraction, so expect one more push. Sorry for the noise!

Freax13 commented 2 years ago

There's a problem with primordial dependency and CI. I'll check with earlier version.

If all we need is a canonical page aligned address, we should use the Page type from this crate.

jarkkojs commented 2 years ago

There's a problem with primordial dependency and CI. I'll check with earlier version.

If all we need is a canonical page aligned address, we should use the Page type from this crate.

Agreed! Just found it myself too :-)

jarkkojs commented 2 years ago

There's a problem with primordial dependency and CI. I'll check with earlier version.

If all we need is a canonical page aligned address, we should use the Page type from this crate.

Agreed! Just found it myself too :-)

I switched to structures::paging::Page.

jarkkojs commented 2 years ago

I think this version is ready for full review. After this patch has been merged. I will contribute get_key() and report(), which enable local attestation (communication channels) between the enclaves.

jarkkojs commented 2 years ago

I noticed at least that:

I won't change the current commit before getting feedback.

jarkkojs commented 2 years ago

Also, probably I should reorder structs to alphabetical order as there will a lot more stuff.

jarkkojs commented 2 years ago

I did aforementioned changes, and I changed the parameter order in new from (flags, class) to (class, flags) because when I used it this just felt a more natural order (micro-architecture dictates that they are in different order but this a programming interface).

jarkkojs commented 2 years ago

The next iteration.

jarkkojs commented 2 years ago

As far of putting all error codes in this commit: I think it is best to consider them as operations are added. E.g. SGX_INVALID_SIG_STRUCT could be InvalidSigStruct, InvalidSigatureStructure or InvalidSignature. And there are other examples.

jarkkojs commented 2 years ago

In this commit I would consider this. I would rename SecInfo as PageInfo, as it is what it exactly is, and put to the comment that it is representing SECINFO. It is just more natural name for it and better matches other naming. SecInfo is a bit cryptic. I think it is most important that there is a hint what to lookup from SDM

Freax13 commented 2 years ago

I would rename SecInfo as PageInfo, as it is what it exactly is, and put to the comment that it is representing SECINFO. It is just more natural name for it and better matches other naming. SecInfo is a bit cryptic. I think it is most important that there is a hint what to lookup from SDM

That might be confusing considering that the SDM already defines another structure called PAGEINFO.

jarkkojs commented 2 years ago

I would rename SecInfo as PageInfo, as it is what it exactly is, and put to the comment that it is representing SECINFO. It is just more natural name for it and better matches other naming. SecInfo is a bit cryptic. I think it is most important that there is a hint what to lookup from SDM

That might be confusing considering that the SDM already defines another structure called PAGEINFO.

True! Let's keep it as it is. Good enough.

jarkkojs commented 2 years ago

I pushed still a new version because I forgot to update the field names.

Freax13 commented 2 years ago

As far of putting all error codes in this commit: I think it is best to consider them as operations are added. E.g. SGX_INVALID_SIG_STRUCT could be InvalidSigStruct, InvalidSigatureStructure or InvalidSignature. And there are other examples.

Not all operations can return all errors though, in fact most error codes can only be returned by one or two operations (see Table 37-4). Personally, I prefer many different error enums specifically tailored for each operation, but I know that some people prefer one big catch-all error type instead. @phil-opp Do you have any opinion on this?

jarkkojs commented 2 years ago

OK, now it is refined. I did not know that rustc/cargo silently skips compilation, if forgotten mod-statement, and does not report error if that happens.

jarkkojs commented 2 years ago

Change into:

    /// Page permissions and wait state
    #[repr(transparent)]
    pub struct PageFlags: u8 {
jarkkojs commented 2 years ago

One nit that I noticed: should Trim be actually Trimmed? Even SDM speaks of page being in trimmed state (34.12.2 in Volume 3D).

jarkkojs commented 2 years ago

One nit that I noticed: should Trim be actually Trimmed? Even SDM speaks of page being in trimmed state (34.12.2 in Volume 3D).

I did this and also improved the comment a bit:

    /// Removable from a running enclave
    Trimmed = 4,
jarkkojs commented 2 years ago

I promise the following SGX PR's won't be as rough as this, as this gives a reference model how to implement them for x86_64 crate.

jarkkojs commented 2 years ago

For reference: here's how I'm using it: https://github.com/enarx/enarx/pull/1497.

jarkkojs commented 2 years ago

I'd think that in this case these functions should be unsafe because of an arbitrary (page-aligned) destination and source address. @haraldh?

jarkkojs commented 2 years ago

@Freax13 I'm thinking to mark accept* and extend_permissions as unsafebecause of an arbitrary source and destination address. Do you cope with this or not? Just checking.

Freax13 commented 2 years ago

@Freax13 I'm thinking to mark accept* and extend_permissions as unsafebecause of an arbitrary source and destination address. Do you cope with this or not? Just checking.

Not quiet sure to be honest.

AFAICT the only way to cause unsafety with accept/accept_copy would be: 1.Create a reference to something.

  1. Tell the os to trim the page.
  2. Call accept on that page to accept the change to the trimmed page type.
  3. Tell to os the allocate the page again (which causes the page to be zeroed).
  4. Call accept to accept the page and optionally initialize by copying data from another page.
  5. Use the reference. It's not entirely clear to me which of those operations should be unsafe.

I can't think of any ways to cause unsafety with extend_permissions though I'm also not sure there aren't any.

In this crate we usually don't declare functions as unsafe just because they can cause cpu exceptions, however one could argue that this is not appropriate for applications running in Intel SGX enclaves because cpu exceptions could serve as side-channels.

I tempted to just mark those functions as unsafe with comments explaining that the user should review the SDM for the safety requirements. I don't know enough about Intel SGX to say that those functions can be safe.

jarkkojs commented 2 years ago

I think they are actually safe. What happens

@Freax13 I'm thinking to mark accept* and extend_permissions as unsafebecause of an arbitrary source and destination address. Do you cope with this or not? Just checking.

Not quiet sure to be honest.

AFAICT the only way to cause unsafety with accept/accept_copy would be: 1.Create a reference to something. 2. Tell the os to trim the page. 3. Call accept on that page to accept the change to the trimmed page type. 4. Tell to os the allocate the page again (which causes the page to be zeroed). 5. Call accept to accept the page and optionally initialize by copying data from another page. 6. Use the reference. It's not entirely clear to me which of those operations should be unsafe.

I can't think of any ways to cause unsafety with extend_permissions though I'm also not sure there aren't any.

In this crate we usually don't declare functions as unsafe just because they can cause cpu exceptions, however one could argue that this is not appropriate for applications running in Intel SGX enclaves because cpu exceptions could serve as side-channels.

I tempted to just mark those functions as unsafe with comments explaining that the user should review the SDM for the safety requirements. I don't know enough about Intel SGX to say that those functions can be safe.

They are safe then I guess. If EACCEPT is called to invalid page it will cause #GP, kernel will catch that and execution continues to the vDSO return path. Exception information is returned to a struct called sgx_enclave_run. OK, I was misguided :-) It's a legit situation to #GP from EACCEPT, it catches the exception that enclave run-time did not do what we expected it to do for VMAs.

Feels stupid because that is the main use case for it actually. When enclave gets informed of mmap() we scan the area to see if it matches the expected permissions.

jarkkojs commented 2 years ago

About EEXIT, it does need a wrapper too and also EREPORT and EGETKEY. We have code internally for those in Enarx and I will contribute it in chunks to x86_64. I started with this because we did not have yet these opcodes supported. I'll use these as a reference model for the rest once these have been merged.

jarkkojs commented 2 years ago

@Freax13 I fixed the comments. I think it looks pretty good now. One thing I noticed that would it be better to put E-codes to an enum?

Freax13 commented 2 years ago

They are safe then I guess. If EACCEPT is called to invalid page it will cause #GP, kernel will catch that and execution continues to the vDSO return path. Exception information is returned to a struct called sgx_enclave_run. OK, I was misguided :-) It's a legit situation to #GP from EACCEPT, it catches the exception that enclave run-time did not do what we expected it to do for VMAs.

CPU exceptions are not the only problem here, perhaps I should have explained it better, those six steps I listed would cause UB if executed like that, so I'm still in favor of marking the accept functions are unsafe. The way I see it, even if the function was safe, having it still be marked as unsafe is not a bad thing because it makes the user think about other related memory problems and interactions.

Feels stupid because that is the main use case for it actually. When enclave gets informed of mmap() we scan the area to see if it matches the expected permissions.

What do you mean?

Freax13 commented 2 years ago

One thing I noticed that would it be better to put E-codes to an enum?

Not sure yet, I'm still waiting on feedback by @phil-opp on https://github.com/rust-osdev/x86_64/pull/348#issuecomment-1066176825 before reviewing the error related code.

jarkkojs commented 2 years ago

They are safe then I guess. If EACCEPT is called to invalid page it will cause #GP, kernel will catch that and execution continues to the vDSO return path. Exception information is returned to a struct called sgx_enclave_run. OK, I was misguided :-) It's a legit situation to #GP from EACCEPT, it catches the exception that enclave run-time did not do what we expected it to do for VMAs.

CPU exceptions are not the only problem here, perhaps I should have explained it better, those six steps I listed would cause UB if executed like that, so I'm still in favor of marking the accept functions are unsafe. The way I see it, even if the function was safe, having it still be marked as unsafe is not a bad thing because it makes the user think about other related memory problems and interactions.

Just to add, after EACCEPTing the trim enclave must ask OS to EREMOVE the page.

CPU expections happen all the time with enclaves. It's the normal Any kernel that implements SGX support needs to have framework for this. In the Linux kernel vDSO populates struct sgx_enclave_run, which retains the exception information.

E.g. in a run-time you do a SYSCALL by catching the #UD.

Feels stupid because that is the main use case for it actually. When enclave gets informed of mmap() we scan the area to see if it matches the expected permissions.

What do you mean?

Our mmap implementation inside enclave works like this roughly:

  1. Process parameters (verification, and adjusting)
  2. Enclave asks OS to do host mmap with adjusted parameters.
  3. For each page do EACCEPTCOPY with the prot flags that were in the adjusted parameters.

EDIT: I mean attribute mismatch error is returned.

jarkkojs commented 2 years ago

As far of putting all error codes in this commit: I think it is best to consider them as operations are added. E.g. SGX_INVALID_SIG_STRUCT could be InvalidSigStruct, InvalidSigatureStructure or InvalidSignature. And there are other examples.

Not all operations can return all errors though, in fact most error codes can only be returned by one or two operations (see Table 37-4). Personally, I prefer many different error enums specifically tailored for each operation, but I know that some people prefer one big catch-all error type instead. @phil-opp Do you have any opinion on this?

I'll change it.

jarkkojs commented 2 years ago

I'll paste a WiP implementation of mmap handler just to bring more context to this discussion:

    fn mmap(
        &mut self,
        _platform: &impl Platform,
        addr: Option<NonNull<c_void>>,
        length: size_t,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: off_t,
    ) -> sallyport::Result<NonNull<c_void>> {
        // A hardware constraint.
        if (prot & libc::PROT_READ as c_int) == 0 {
            return Err(libc::EINVAL);
        }

        let addr_u = addr.unwrap().as_ptr() as usize;
        let user_mem_scope = UserMemScope;
        let result = match unsafe {
            self.syscall(
                &user_mem_scope,
                [
                    libc::SYS_mmap as usize,
                    addr_u,
                    length as usize,
                    prot as usize,
                    flags as usize,
                    fd as usize,
                    offset as usize,
                ],
            )
        } {
            Ok([rax, _]) => NonNull::new(rax as *mut libc::c_void).unwrap(),
            Err(e) => return Err(e),
        };

        let secinfo = SecInfo::new(PageType::Regular, PageFlags::from_bits_truncate(prot as u8));
        let zero_page =
            PageAddr::from_start_address(VirtAddr::new(ZERO_PAGE.as_ptr() as u64)).unwrap();

        let mut offset = 0;
        while offset < length {
            let dest_virt_addr = VirtAddr::new((addr_u + offset) as u64);
            match enclu::accept_copy(
                &secinfo,
                PageAddr::from_start_address(dest_virt_addr).unwrap(),
                zero_page,
            ) {
                Ok(()) => (),
                Err(_) => return Err(libc::EFAULT),
            }
            offset += Page::SIZE;
        }

        Ok(result)
    }

    fn munmap(
        &mut self,
        _platform: &impl Platform,
        _addr: NonNull<c_void>,
        _length: size_t,
    ) -> sallyport::Result<()> {
        Ok(())
    }
}

Also good to know that: use x86_64::structures::paging::Page as PageAddr;

jarkkojs commented 2 years ago

They are safe then I guess. If EACCEPT is called to invalid page it will cause #GP, kernel will catch that and execution continues to the vDSO return path. Exception information is returned to a struct called sgx_enclave_run. OK, I was misguided :-) It's a legit situation to #GP from EACCEPT, it catches the exception that enclave run-time did not do what we expected it to do for VMAs.

CPU exceptions are not the only problem here, perhaps I should have explained it better, those six steps I listed would cause UB if executed like that, so I'm still in favor of marking the accept functions are unsafe. The way I see it, even if the function was safe, having it still be marked as unsafe is not a bad thing because it makes the user think about other related memory problems and interactions.

Just to add, after EACCEPTing the trim enclave must ask OS to EREMOVE the page.

CPU expections happen all the time with enclaves. It's the normal Any kernel that implements SGX support needs to have framework for this. In the Linux kernel vDSO populates struct sgx_enclave_run, which retains the exception information.

E.g. in a run-time you do a SYSCALL by catching the #UD.

Feels stupid because that is the main use case for it actually. When enclave gets informed of mmap() we scan the area to see if it matches the expected permissions.

What do you mean?

Our mmap implementation inside enclave works like this roughly:

1. Process parameters (verification, and adjusting)

2. Enclave asks OS to do host mmap with adjusted parameters.

3. For each page do EACCEPTCOPY with the prot flags that were in the
   adjusted parameters.

EDIT: I mean attribute mismatch error is returned.

In EAUG, EMODPR and EMODT kernel sets permissions for a page. And with these functions enclave acknowledges that the permissions are those that it expects.

jarkkojs commented 2 years ago

I'm leaning towards unsafe because a bug could cause pointing out to e.g. unmapped address. They are freely addressable so that already kind of is a reasonable argument IMHO. Page only checks the page alignment and that type of bug cannot be catched by the static checker of rust compiler.

And you need to have some sort of framework inside enclave to track e.g. virtual memory areas (VMAs) like I've started to work on yesterday: https://crates.io/crates/mmledger. I.e. you have to wrap all this to some of types and things that make sure they're used safely.

jarkkojs commented 2 years ago

Now both operations have their own enums: AcceptError and AcceptCopyError.

npmccallum commented 2 years ago

@jarkkojs I suspect this probably belongs in the https://github.com/enarx/sgx crate. We already have multiple of these types there. SGX is only available on a limited subset of Intel processors and I suspect we don't want everyone who uses x86_64 to have to carry these types.

jarkkojs commented 2 years ago

@jarkkojs I suspect this probably belongs in the https://github.com/enarx/sgx crate. We already have multiple of these types there. SGX is only available on a limited subset of Intel processors and I suspect we don't want everyone who uses x86_64 to have to carry these types.

Similarly as for e.g. MxCsr, x86_64 is the correct basket for encl* opcodes and associated data structures. They are a server CPU feature (xeon) but I do not see reason why this crate should not support them. SGX crate is best place to e.g. wrap ioctls and the vdso but for microarch this is the place.

jarkkojs commented 2 years ago

@jarkkojs I suspect this probably belongs in the https://github.com/enarx/sgx crate. We already have multiple of these types there. SGX is only available on a limited subset of Intel processors and I suspect we don't want everyone who uses x86_64 to have to carry these types.

Similarly as for e.g. MxCsr, x86_64 is the correct basket for encl* opcodes and associated data structures. They are a server CPU feature (xeon) but I do not see reason why this crate should not support them. SGX crate is best place to e.g. wrap ioctls and the vdso but for microarch this is the place.

Any CPU feature is limited to 1-2 vendors.

npmccallum commented 2 years ago

@jarkkojs Except most of sgx isn't syscalls and vdso at all. It is all the uarch stuff. Instructions and associated data types.

jarkkojs commented 2 years ago

@jarkkojs Except most of sgx isn't syscalls and vdso at all. It is all the uarch stuff. Instructions and associated data types.

In my opinion this is the correct layering given how everything else is layered. Other ways to implement things could potentially lead two versions of the same assets. It also makes easier to use eg crypto libraries in sgx crate when it is more "host" focused.

npmccallum commented 2 years ago

@jarkkojs I disagree. Right now the sgx crate is a downstream of x86_64. Any types which are broadly useful, we upstream to x86_64 (we have already done this multiple times). Any types which are SGX-specific, we keep in sgx. This allows consumers of the crates to completely opt-in to SGX-related structures.

jarkkojs commented 2 years ago

@jarkkojs I disagree. Right now the sgx crate is a downstream of x86_64. Any types which are broadly useful, we upstream to x86_64 (we have already done this multiple times). Any types which are SGX-specific, we keep in sgx. This allows consumers of the crates to completely opt-in to SGX-related structures.

I can cope with this just stating my arguments. I'm happy to put these also to sgx crate. @Freax13 what do you think? I can understand my arguments and also Nathaniel's. Both WFM

jarkkojs commented 2 years ago

@jarkkojs I disagree. Right now the sgx crate is a downstream of x86_64. Any types which are broadly useful, we upstream to x86_64 (we have already done this multiple times). Any types which are SGX-specific, we keep in sgx. This allows consumers of the crates to completely opt-in to SGX-related structures.

I can cope with this just stating my arguments. I'm happy to put these also to sgx crate. @Freax13 what do you think? I can understand my arguments and also Nathaniel's. Both WFM

One thing that supports your arguments is that all the other things are already there. So maybe it is better to keep them in the same pile.

Freax13 commented 2 years ago

@jarkkojs I disagree. Right now the sgx crate is a downstream of x86_64. Any types which are broadly useful, we upstream to x86_64 (we have already done this multiple times). Any types which are SGX-specific, we keep in sgx. This allows consumers of the crates to completely opt-in to SGX-related structures.

I can cope with this just stating my arguments. I'm happy to put these also to sgx crate. @Freax13 what do you think? I can understand my arguments and also Nathaniel's. Both WFM

I just learnt about enarx's sgx crate today, but I would agree with @npmccallum that the code you wrote belongs into the sgx crate. That's not to say that in principle we wouldn't want to have features like this in our crate, but in this case it just makes sense to group related functionality into one crate. Besides that, whoever maintains that crate probably knows a lot more about Intel SGX than the maintainers of this crate ever will, so the code will be much better taken care of over there :)

jarkkojs commented 2 years ago

@jarkkojs I disagree. Right now the sgx crate is a downstream of x86_64. Any types which are broadly useful, we upstream to x86_64 (we have already done this multiple times). Any types which are SGX-specific, we keep in sgx. This allows consumers of the crates to completely opt-in to SGX-related structures.

I can cope with this just stating my arguments. I'm happy to put these also to sgx crate. @Freax13 what do you think? I can understand my arguments and also Nathaniel's. Both WFM

I just learnt about enarx's sgx crate today, but I would agree with @npmccallum that the code you wrote belongs into the sgx crate. That's not to say that in principle we wouldn't want to have features like this in our crate, but in this case it just makes sense to group related functionality into one crate. Besides that, whoever maintains that crate probably knows a lot more about Intel SGX than the maintainers of this crate ever will, so the code will be much better taken care of over there :)

Ok this sounds like a consensus 🙂

jarkkojs commented 2 years ago

I'm closing this PR now as it has reached a consensus what to do.