IDT and page table code causing trouble in Confidential Computing space

There is a divergence in confidential computing technology for AMD's SEV-SNP platform (AMD's original https://github.com/AMDESE/linux-svsm and SUSE's COCONUT https://github.com/coconut-svsm/svsm), driven by issues raised by SUSE engineers that I want to bring here for discussion. I'm not an expert in this code base, but I am concerned about root-of-trust code that could underpin most cloud workloads in the future.

From https://lwn.net/ml/linux-coco/ZBnJ6ZCuQJTVMM8h@suse.de/ (Emphasis mine)

> With the current linux-svsm code-base this is difficult to achieve due to its reliance on the x86-64 crate. For supporting a user-space like execution mode the crate has too many limitations, mainly in its page-table and IDT implementations.

Technical reasons here

> The IDT code from that crate, which is also used in linux-svsm, relies on compiler-generated entry-code. This is not enough to support a ring-3 execution mode with syscalls and several (possibly nested) IST vectors. The next problem with the IDT code is that it doesn't allow modification of return register state. This makes it impossible to implement exception fixups to guard RMPADJUST instructions and VMPL1 memory accesses in general.

RMPADJUST is a new x86-64 instruction introduced with SEV-SNP that lets code running at one virtual machine privilege level (VMPL) change the access permissions that less-privileged VMPLs have to a page. SEV-SNP introduced this new dimension of privilege on memory so that "supervisor code" can run within a confidential context in a way that the rest of the guest code can't tamper with. This is very important for implementing technologies like virtual TPMs that provide measured-boot integrity with strong security guarantees, such that a cloud service provider can't spoof a measured-boot attestation and have a guest run a compromised image.
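
To make the RMPADJUST description concrete, here is a minimal std-only Rust sketch of the attribute operand (RDX) the instruction takes. It assumes the layout as we understand it from AMD's documentation and the COCONUT SVSM flag definitions: target VMPL in bits 7:0, read/write/user-execute/supervisor-execute permissions in bits 11:8, and the VMSA flag in bit 16. The names are hypothetical, no instruction is executed, and `may_adjust` is a simplified, partly assumed model of the hardware's permission check, not the full rule set.

```rust
// Assumed RMPADJUST attribute layout (RDX): bits 7:0 = target VMPL,
// bits 11:8 = permission mask, bit 16 = VMSA page flag.
pub const PERM_READ: u64 = 1 << 8;
pub const PERM_WRITE: u64 = 1 << 9;
pub const PERM_EXEC_USER: u64 = 1 << 10;
pub const PERM_EXEC_SUPERVISOR: u64 = 1 << 11;
pub const VMSA: u64 = 1 << 16;
const PERM_MASK: u64 = PERM_READ | PERM_WRITE | PERM_EXEC_USER | PERM_EXEC_SUPERVISOR;

/// Build the (hypothetical) RDX value that grants `perms` on a page to `target_vmpl`.
pub fn rmpadjust_attrs(target_vmpl: u8, perms: u64, vmsa: bool) -> u64 {
    let mut attrs = u64::from(target_vmpl) | (perms & PERM_MASK);
    if vmsa {
        attrs |= VMSA;
    }
    attrs
}

/// Simplified model: code at `current_vmpl` may only adjust a numerically
/// higher (less privileged) VMPL and may not grant permissions it lacks.
pub fn may_adjust(current_vmpl: u8, current_perms: u64, attrs: u64) -> bool {
    let target_vmpl = (attrs & 0xff) as u8;
    let requested = attrs & PERM_MASK;
    (target_vmpl > current_vmpl) && ((requested & !current_perms) == 0)
}
```

For example, an SVSM at VMPL0 with full permissions granting read/write to VMPL1 passes the check, while VMPL1 trying to adjust its own level does not.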

> When we looked at the crate, the page-table implementation supported basically a direct and an offset mapping, which will get us into problems when support for non-contiguous mappings or sharing parts of a page-table with another page-table is needed. So in the very beginning of the project I decided to go with my own page-table implementation.

Is this a big problem to change?

I'd be interested to dive in and start contributing if there isn't anything controversial about supporting the kinds of things SUSE needs for their diverging system to fold back into AMD's. I'm hopeful that we can come together for one solid supervisor module.

Hi, thanks for reaching out!

> There is a divergence in confidential computing technology for AMD's SEV-SNP system (AMD's original https://github.com/AMDESE/linux-svsm, SUSE's COCONUT https://github.com/coconut-svsm/svsm) due to issues claimed by SUSE engineers that I wanted to bring here to discuss. I'm not an expert in this code base, but I am concerned about a root of trust code system that could underpin most cloud workloads in the future.
>
> From https://lwn.net/ml/linux-coco/ZBnJ6ZCuQJTVMM8h@suse.de/ (Emphasis mine)
>
> With the current linux-svsm code-base this is difficult to achieve due to its reliance on the x86-64 crate. For supporting a user-space like execution mode the crate has too many limitations, mainly in its page-table and IDT implementations.
>
> Technical reasons here
>
> The IDT code from that crate, which is also used in linux-svsm, relies on compiler-generated entry-code. This is not enough to support a ring-3 execution mode with syscalls and several (possibly nested) IST vectors. The next problem with the IDT code is that it doesn't allow modification of return register state.

This is true, but also misleading. The compiler-generated entry code does have those shortcomings, but we don't restrict users to compiler-generated entry code; it's perfectly fine to write custom entry code. In practice, this is likely something a library can't fully implement for the user anyway: entry code tends to vary heavily between kernels and, by its very nature, has to be written by each kernel. I'm not sure I understand the comment about IST vectors, but we do support configuring IST stacks in the TSS.
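
On the IST point: the crate's `TaskStateSegment` exposes the seven IST slots (its `interrupt_stack_table` field), while the actual stack switch is done by the CPU. The std-only Rust model below uses hypothetical names (it is not the crate's API and ignores the CPU's 16-byte stack alignment) to show how the delivery stack is chosen. It also illustrates why nested IST vectors need care in custom entry code: an IST switch always loads the same stack top, so re-entering the same vector starts over the earlier frame.

```rust
/// Hypothetical, simplified TSS: IST1..IST7 stack tops plus RSP0.
pub struct Tss {
    pub ist: [u64; 7],
    /// Stack top used when entering ring 0 from ring 3 without an IST.
    pub rsp0: u64,
}

/// Hypothetical IDT entry model: 0 = no IST switch, 1..=7 selects `ist[index - 1]`.
pub struct IdtEntryModel {
    pub ist_index: u8,
}

/// Returns the RSP the CPU loads before pushing the interrupt frame.
pub fn delivery_stack(tss: &Tss, entry: &IdtEntryModel, from_ring3: bool, current_rsp: u64) -> u64 {
    match entry.ist_index {
        // Privilege change without IST: switch to RSP0 from the TSS.
        0 if from_ring3 => tss.rsp0,
        // Same privilege level without IST: keep using the current stack.
        0 => current_rsp,
        // IST: unconditionally load the configured stack top.
        i @ 1..=7 => tss.ist[usize::from(i - 1)],
        _ => panic!("IST index must be in 0..=7"),
    }
}
```

Note that a second delivery on the same IST vector returns the same stack top regardless of `current_rsp`, which is the situation custom entry code has to handle for nested events.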

> This makes it impossible to implement exception fixups to guard RMPADJUST instructions and VMPL1 memory accesses in general.

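I disagree, for the reasons mentioned above: custom entry code controls the register state it restores on return, so it can implement such fixups.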
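
A minimal sketch of such a fixup, assuming custom entry code (not shown) that saves the interrupted registers into a frame, calls a Rust handler with a mutable reference to it, and restores the possibly modified frame before IRETQ. The table maps the address of a guarded instruction, such as an RMPADJUST, to a recovery address, similar in spirit to Linux's exception tables. `SavedRegs`, `FixupEntry`, and the error convention in RAX are hypothetical.

```rust
/// Register state saved by (hypothetical) custom entry code. The entry code
/// restores it before IRETQ, so changes made here take effect on return.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SavedRegs {
    pub rip: u64,
    pub rax: u64,
}

/// If the instruction at `insn` faults, resume execution at `fixup`.
pub struct FixupEntry {
    pub insn: u64,
    pub fixup: u64,
}

/// Hypothetical error value reported to the interrupted code in RAX.
pub const FIXUP_ERROR: u64 = u64::MAX;

/// Called from a #GP/#PF handler; returns true if the fault was fixed up.
pub fn apply_fixup(table: &[FixupEntry], regs: &mut SavedRegs) -> bool {
    match table.iter().find(|e| e.insn == regs.rip) {
        Some(entry) => {
            // Redirect the return address and report the failure in RAX.
            regs.rip = entry.fixup;
            regs.rax = FIXUP_ERROR;
            true
        }
        // Not a guarded instruction: the handler must treat it as a real fault.
        None => false,
    }
}
```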

> When we looked at the crate, the page-table implementation supported basically a direct and an offset mapping, which will get us into problems when support for non-contiguous mappings or sharing parts of a page-table with another page-table is needed. So in the very beginning of the project I decided to go with my own page-table implementation.
>
> Is this a big problem to change?

Sort of. The page table implementations in this crate are very simple: they don't support concurrent modification or sharing parts of page tables. I don't think this is something we want in this crate, because it would force a lot of choices that we can't, and don't want to, make for the user. There is no one-size-fits-all solution.
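
As a rough illustration of what implementing sharing yourself can look like, here is a std-only Rust sketch in which two address spaces point at the same lower-level table, so a mapping added through it is visible in both. It is a hypothetical, heavily simplified two-level model (leaf tables are `HashMap`s, sharing uses `Rc<RefCell<_>>`), not the crate's `PageTable` type, and it ignores TLB shootdowns and concurrency, which are among the choices that differ between kernels.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Number of entries in the (hypothetical) top-level table.
const ENTRIES: usize = 512;

/// Leaf table: maps a page index to a physical frame address.
pub type Leaf = Rc<RefCell<HashMap<usize, u64>>>;

pub struct AddressSpace {
    /// Top-level table; several address spaces may hold the same `Leaf`.
    pub top: Vec<Option<Leaf>>,
}

impl AddressSpace {
    pub fn new() -> Self {
        AddressSpace { top: vec![None; ENTRIES] }
    }

    /// Attach an existing leaf table, possibly also used by another space.
    pub fn share(&mut self, top_index: usize, leaf: Leaf) {
        self.top[top_index] = Some(leaf);
    }

    /// Walk both levels and return the mapped frame, if any.
    pub fn translate(&self, top_index: usize, leaf_index: usize) -> Option<u64> {
        let leaf = self.top[top_index].as_ref()?;
        let frame = leaf.borrow().get(&leaf_index).copied();
        frame
    }
}
```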

I'm not sure what they mean by "problems when support for non-contiguous mappings [...] is needed". None of our page table implementations are limited to mapping contiguous memory.
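
One possible source of confusion is the name `OffsetPageTable`: the offset describes how the page-table frames themselves are reached (all physical memory mapped at a fixed virtual offset), not a restriction on what can be mapped. Every 4 KiB page has its own last-level entry, so adjacent virtual pages can point at unrelated frames. The sketch below uses hypothetical helpers and assumes 4-level paging with 48-bit addresses.

```rust
/// Split a 48-bit virtual address into its four 9-bit table indices
/// (PML4, PDPT, PD, PT) and the 12-bit offset within the page.
pub fn split(vaddr: u64) -> ([usize; 4], u64) {
    let idx = |shift: u32| ((vaddr >> shift) & 0x1ff) as usize;
    ([idx(39), idx(30), idx(21), idx(12)], vaddr & 0xfff)
}

/// With an offset mapping, the page-table frame at physical address `phys`
/// is read through virtual address `phys + phys_offset`. This only affects
/// how tables are accessed, not which frames their entries point to.
pub fn table_virt(phys: u64, phys_offset: u64) -> u64 {
    phys + phys_offset
}
```

Adjacent pages such as `0x1000` and `0x2000` differ only in their last-level index, so each can be mapped to any frame independently.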

> I'd be interested to dive in and start contributing if there isn't anything controversial about supporting the kinds of things SUSE needs for their diverging system to fold back into AMD's.

Sure, we're always open to hearing new ideas and contributions.

My takes on this:

- The `x86_64` crate is not the big difference between linux-svsm and COCONUT SVSM. `x86_64` doesn't make any design choices that make the two fundamentally incompatible. The projects take different approaches to isolation and hardening, but none of those approaches were forced on them by design decisions in this crate. I'd also love to hear from the linux-svsm maintainers about whether they consider the use of `x86_64` an important difference.
- Just because a codebase uses this crate doesn't mean it has to use all of its features. It's perfectly fine to use only some abstractions (e.g. `VirtAddr`, the instruction wrappers) and not others, such as the page table implementations and the compiler-generated IDT entry code. Not everything can feasibly be abstracted, and that's okay.
- I know of multiple codebases (including ones targeting SEV-SNP) that use `x86_64` and still implement the things that were described as potential problems. It's perfectly possible to do so. This crate doesn't forbid anyone from sharing page tables; they'll just have to implement it themselves.
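
As an example of the kind of small, self-contained abstraction that can be used on its own, here is a std-only sketch of the canonical-address rule that a type like `VirtAddr` enforces under 4-level paging: bits 63:48 must be copies of bit 47. The helper names are hypothetical; `sign_extend_48` corresponds roughly to what `VirtAddr::new_truncate` does in the crate.

```rust
/// Sign-extend bit 47 into bits 63:48, producing a canonical address.
pub fn sign_extend_48(addr: u64) -> u64 {
    // Shift bit 47 into the sign bit, then arithmetic-shift back down.
    (((addr << 16) as i64) >> 16) as u64
}

/// An address is canonical if sign extension leaves it unchanged.
pub fn is_canonical(addr: u64) -> bool {
    sign_extend_48(addr) == addr
}
```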

Small side note: I already discussed some of the concerns listed here with the author of the post, but in private at the time, so there is value in discussing them again in public.

Cc @joergroedel

rust-osdev / x86_64

IDT and page table code causing trouble in Confidential Computing space #417
