microsoft / igvm

MIT License

Is IGVM meant to be a "bring your own firmware" package format? #72

Closed deeglaze closed 1 month ago

deeglaze commented 2 months ago

Azure previously piloted a way for users to bring their own firmware binaries to VMs. The blog post has since been taken down. I wonder if this IGVM project hints at the intention to make that process simpler by packaging more firmware-specific machine configuration into the file itself rather than VMM-specific doodads. Jon Lange has also suggested that Coconut-SVSM at least target virtual backend devices that have open specifications that the industry can collaborate on, such as the call for a simple file system format to model UEFI variable storage and TPM NVRAM.

I'm interested in this possibility, and I believe distro companies like Canonical and Red Hat are too.

If the answer to this is yes, I'd like to see this project get adopted by the CNCF so governance for the format can follow a more collaborative model across the industry.

-- On a technical level, I'm looking more at how much we really do need to package into the IGVM versus how much is really VMM-specific, looking particularly at KVM. For "bring your own firmware", if you're not just providing a BIOS/UEFI enum and a binary, you've got a lot more to figure out about what's going on. I'm fine saying only UEFI is expected, but there's no single directive that is easy to point out as the flash description file. Without that, I can't find the size of the ROM. This matters because, say on Intel platforms, VMX requires that KVM set the TSS address once for the whole VM, so it applies to every vCPU. Not sure where that fits in this format.

We had made an assumption based on a 2MiB UEFI ROM to put the TSS at 0xffc00000, but then for larger ROMs (up to 16MiB) we set it to 0xfed00000. We can't just use file size now to determine what this address should be, and it's not clear from the header descriptions what one should do with this.
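A minimal sketch of that size-based heuristic, assuming the ROM is mapped so that it ends at the 4GiB boundary. The function names are hypothetical; the two addresses are the ones quoted above, and the three-page size is the region that KVM_SET_TSS_ADDR reserves.

```python
MIB = 1 << 20
GIB = 1 << 30
PAGE = 4096
TSS_REGION_SIZE = 3 * PAGE  # KVM_SET_TSS_ADDR claims a three-page region


def rom_range(rom_size):
    """Assumption: the ROM is mapped so that it ends exactly at 4GiB."""
    return (4 * GIB - rom_size, 4 * GIB)


def pick_tss_address(rom_size):
    """Choose a TSS base from the ROM size, mirroring the two cases above."""
    if rom_size <= 2 * MIB:
        tss = 0xFFC00000
    elif rom_size <= 16 * MIB:
        tss = 0xFED00000
    else:
        raise ValueError("no TSS placement known for ROMs larger than 16MiB")

    # Sanity check: the TSS region must not overlap the ROM mapping.
    rom_start, rom_end = rom_range(rom_size)
    if not (tss + TSS_REGION_SIZE <= rom_start or tss >= rom_end):
        raise ValueError("TSS region overlaps the ROM")
    return tss
```

The heuristic only works when the ROM size is known up front, which is exactly the piece of information that is hard to recover from an IGVM file.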

chris-oo commented 2 months ago

The intention of the format is to be able to package the launch contents of any guest you'd like to boot inside IGVM. This would include more than just UEFI - we have projects that utilize IGVM to launch a variety of different things, though some of them use small freestanding bootshims as their initial entrypoint. For example, we can launch UEFI directly as you've noted, and I think you could also launch Linux directly without a bootshim but today we do this via a bootshim that then dispatches to Linux.

I'm open to extending the format to whatever is required to launch a given guest. In your specific case, would it be useful to have the ability to describe some kind of flash information or would being able to just set the task register on the BSP be enough? It seems like we don't allow setting TR in the native context today, but you could set it via the Hyper-V isolation type or SNP via the VMSA, which means we may just want to extend the native type to include that.

As to your non technical question, I'll have a response to that soon.

deeglaze commented 2 months ago

The vCPU fd that KVM gives user space can allow arbitrary initial state. Yes, on TDX that's not the case, but if IGVM can be used for non-TEE VMs, then it is. I think that initially we'll probably need a "fat binary" that has VMM case information for setup before we can identify "standard" configuration options. Say Canonical or Red Hat builds a firmware that should run on the top 5 cloud platforms. How is that a single binary?

There are platform capabilities that we're going to need to version and document more clearly for IGVM builders to claim compatibility. For example, x86 VMs on Google Compute Engine follow a pretty simple memory bank structure that is limited by the machine shapes that the platform sells. The IGVM can say "place memory wherever", but it will then fail to comply with the memory bank structure. We'd probably need to do compatibility checking at the API layer at firmware-as-a-resource creation time. I don't know how far static analysis is going to take us though, since there are particular VMM services a firmware may need, such as a specific interface for accessing virtualized flash storage or an interface for a network proxy for early attestation. If you followed the LKML flame war about OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL you might see where dynamic negotiation can eventually become necessary without appropriate design for static analysis.

msft-jlange commented 2 months ago

Azure has no statement on whether there will or will not be future plans to permit customers to bring their own firmware.

The motivation behind IGVM was twofold. First, the prevalence of measured and attested confidential VM contents speaks to the need for some sort of format that can describe not only the contents of a confidential VM but the load order, so that the load process can complete deterministically, including both content that must be measured and content that must be private but which must not be measured (because it describes untrusted parameters). With multiple confidential VM architectures, there is a lot of value in describing a single file format that works on all of them. The second motivation was a recognition that as confidential computing platforms become more prevalent, there are an increasing number of components for which compatibility across host architectures is desirable (like COCONUT-SVSM, for example). Even if customers are not supplying firmware images themselves, cloud providers want to be able to consume these standard components without having to customize the process by which they are built and consumed, so having a single format that can work across host architectures is valuable.
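To illustrate why load order has to be part of the format, here is a toy model of a launch measurement. It is not the SNP or TDX algorithm (each platform defines its own digest); the directive tuples and the function name are assumptions made for this sketch. Each page is folded into a running hash in load order, and unmeasured pages contribute their address but not their contents.

```python
import hashlib


def toy_launch_digest(directives):
    """Fold (gpa, data, measured) load directives into one running digest."""
    digest = bytes(32)
    for gpa, data, measured in directives:
        # Unmeasured pages contribute their address but not their contents.
        content = hashlib.sha256(data).digest() if measured else bytes(32)
        digest = hashlib.sha256(digest + gpa.to_bytes(8, "little") + content).digest()
    return digest


firmware = (0x100000, b"firmware page", True)
shim = (0x200000, b"bootshim page", True)
params_a = (0x300000, b"host parameters A", False)
params_b = (0x300000, b"host parameters B", False)

# Same pages, different order: the digests differ.
in_order = toy_launch_digest([firmware, shim, params_a])
swapped = toy_launch_digest([shim, firmware, params_a])

# Different untrusted parameters in an unmeasured page: the digest is unchanged.
other_params = toy_launch_digest([firmware, shim, params_b])
```

In this model a verifier can only reproduce the expected digest if the file pins down the exact sequence of load operations, while host-supplied parameters can vary without invalidating attestation.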

Once a format is defined that can achieve these goals, it's not hard to see that it can also be valuable when extended to non-confidential architectures, so that it can become possible to generate a single image that can run in multiple target environments. IGVM may not be great at supporting non-confidential architectures today, but it could be.

The current state of IGVM is that it does a great job of supporting the needs of Azure and Hyper-V, but it may not be adequate for other hosting or cloud architectures. If we are to achieve the goal of cross-platform and cross-host compatibility, then it needs to evolve to be able to express the needs of those other architectures. I'm very interested in having a discussion to learn where you think it falls short for your needs so we can look into expanding it in a way that makes it useful for everybody.

That said, IGVM is intended to be an image packaging format, and not a read/write data store (think ELF or PE, not filesystem). When I hear UEFI variable storage and NVRAM, I assume you want to be able to write back to the file, which is not at all the problem that IGVM intends to solve. If those scenarios are important, then we would be much better off trying to design a complementary format that can hold persistent data - assuming that it would even be interesting to standardize such a format vs. leaving it up to individual cloud providers to manage according to the design of their infrastructure.

deeglaze commented 1 month ago

> Azure has no statement on whether there will or will not be future plans to permit customers to bring their own firmware.

Very fair. I similarly cannot comment on official roadmaps.

> The motivation behind IGVM was twofold. First, the prevalence of measured and attested confidential VM contents speaks to the need for some sort of format that can describe not only the contents of a confidential VM but the load order, so that the load process can complete deterministically, including both content that must be measured and content that must be private but which must not be measured (because it describes untrusted parameters). With multiple confidential VM architectures, there is a lot of value in describing a single file format that works on all of them.

Yes, the architectures themselves have a set notion of capabilities and load methods, but in terms of what is loaded and what it requires, is that not something we can also encode in the binary? The Linux kernel, for instance, has boot parameters at a known location that linux.efi or a bootloader can parse to learn that a certain amount of prerequisite configuration needs to be done before executing a single command that can dynamically negotiate the rest.

For example, Coconut-SVSM does not currently need virtio-blk or virtio-vsock, but might. It'd be fantastic to have these needs ahead of time to not fail boot during PCI enumeration in SVSM since error reporting is pretty tricky this early in boot. I suppose it's more nice-to-have than requirement.

> The second motivation was a recognition that as confidential computing platforms become more prevalent, there are an increasing number of components for which compatibility across host architectures is desirable (like COCONUT-SVSM, for example). Even if customers are not supplying firmware images themselves, cloud providers want to be able to consume these standard components without having to customize the process by which they are built and consumed, so having a single format that can work across host architectures is valuable.

Agreed

> Once a format is defined that can achieve these goals, it's not hard to see that it can also be valuable when extended to non-confidential architectures, so that it can become possible to generate a single image that can run in multiple target environments. IGVM may not be great at supporting non-confidential architectures today, but it could be.

> The current state of IGVM is that it does a great job of supporting the needs of Azure and Hyper-V, but it may not be adequate for other hosting or cloud architectures. If we are to achieve the goal of cross-platform and cross-host compatibility, then it needs to evolve to be able to express the needs of those other architectures. I'm very interested in having a discussion to learn where you think it falls short for your needs so we can look into expanding it in a way that makes it useful for everybody.

> That said, IGVM is intended to be an image packaging format, and not a read/write data store (think ELF or PE, not filesystem). When I hear UEFI variable storage and NVRAM, I assume you want to be able to write back to the file, which is not at all the problem that IGVM intends to solve.

I am also conceptualizing it as a packaging format. I'm thinking that ELF32 vs ELF64 is a statically-determinable declaration of required capability. ELF is conceptually just a format for loading code into memory, but it exists in a larger ecosystem that doesn't really work without being loaded into Linux. Windows only supports ELF by running in Linux compatible mode. That ability to selectively choose a backend based on packaging information is what I'm looking for in IGVM. I wouldn't want an entirely different format to select backend support, just simply a different header. It behooves us to try to align on common headers to avoid platform-specific extensions wherever possible.
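As a concrete instance of a statically determinable declaration, the ELF class is a single byte in the file's identification header (index 4 of `e_ident`, where 1 means ELF32 and 2 means ELF64). A minimal sketch of a loader-side check follows; the function name is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # index of the class byte within e_ident
ELF_CLASSES = {1: "ELF32", 2: "ELF64"}


def elf_class(header: bytes) -> str:
    """Return the declared ELF class without loading or running anything."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    try:
        return ELF_CLASSES[header[EI_CLASS]]
    except KeyError:
        raise ValueError(f"unknown ELF class {header[EI_CLASS]}") from None


# Usage: a synthetic 16-byte e_ident that declares a 64-bit image.
sample = ELF_MAGIC + bytes([2]) + bytes(11)
declared = elf_class(sample)
```

A loader can reject or route the image on this byte alone, which is the kind of up-front decision I would like an IGVM header to support.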

I would thus want to say, "only load this if you can give it a virtio-blk >= 1.0 device". The format is already demanding certain machine shape information by stating where in physical memory the file should be loaded. Typically we've seen memory banks [ [0..3G], [4G-2M, 4G], [4G..(RAMSize+1G)] ] where the 2M (up to 16M) at top of 32bit memory is for the UEFI ROM. RAM is charged to users based on the machine shape that determines RAM size. If the IGVM requires a RAM bank at [8T-2G, 8T-2G+16M], then that's unaccounted-for overhead that isn't in the machine shape. We can certainly work on making these configurable and getting the billing right for the added overhead, but we're not at a point where arbitrary IGVM files can be loaded untrusted. We need static analysis. Doable with the current format, but I'm not sure what the general expectation is for IGVM support. I might be able to argue RAMSize + 32M ROM to allow 16M UEFI and 16M SVSM and whatever layout is fine so long as MMIO works as expected for the architecture, but still what does that mean, statically?
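A rough sketch of what such a static check could look like, assuming the bank layout described above and half-open `(start, end)` byte ranges. The function names and the list of required ranges are hypothetical and are not taken from the IGVM specification.

```python
MIB = 1 << 20
GIB = 1 << 30
TIB = 1 << 40


def platform_banks(ram_size, rom_size=2 * MIB):
    """Banks as described above; assumes ram_size is larger than 3GiB."""
    return [
        (0, 3 * GIB),
        (4 * GIB - rom_size, 4 * GIB),
        (4 * GIB, ram_size + 1 * GIB),
    ]


def unsatisfied_ranges(required, banks):
    """Return every required range that no single bank fully contains."""
    return [
        (start, end)
        for start, end in required
        if not any(b_start <= start and end <= b_end for b_start, b_end in banks)
    ]


banks = platform_banks(ram_size=8 * GIB)
required = [
    (0, 64 * MIB),                                    # fits in the low bank
    (8 * TIB - 2 * GIB, 8 * TIB - 2 * GIB + 16 * MIB),  # outside every bank
]
rejected = unsatisfied_ranges(required, banks)
```

Here `rejected` holds only the range near 8T, so the image could be refused at resource creation time rather than failing at launch.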

> If those scenarios are important, then we would be much better off trying to design a complementary format that can hold persistent data - assuming that it would even be interesting to standardize such a format vs. leaving it up to individual cloud providers to manage according to the design of their infrastructure.

Is the format necessary for the VMM to know, or just the device interface? If providing block storage, we don't need to know if it's using EXT2 or FAT32 or whatever. I'm interested in an SVSM protocol that is the paravisor version of EFI_FIRMWARE_MANAGEMENT_PROTOCOL such that we can reflash the paravisor and UEFI together (or separately, eventually) provided a security policy passes to replace the ROM and reboot into the measurement-changing firmware. In that case, we need a VMM-provided mechanism to do so that is different from the block storage, but the format would be just IGVM. If there are configuration options as well that need to be mutable, I think those need to be in IGVM headers.

Tagging in a colleague more familiar with IGVM /cc @AdamCDunlap

msft-jlange commented 1 month ago

> That said, IGVM is intended to be an image packaging format, and not a read/write data store (think ELF or PE, not filesystem). When I hear UEFI variable storage and NVRAM, I assume you want to be able to write back to the file, which is not at all the problem that IGVM intends to solve.

> I am also conceptualizing it as a packaging format. I'm thinking that ELF32 vs ELF64 is a statically-determinable declaration of required capability. ELF is conceptually just a format for loading code into memory, but it exists in a larger ecosystem that doesn't really work without being loaded into Linux. Windows only supports ELF by running in Linux compatible mode. That ability to selectively choose a backend based on packaging information is what I'm looking for in IGVM. I wouldn't want an entirely different format to select backend support, just simply a different header. It behooves us to try to align on common headers to avoid platform-specific extensions wherever possible.

> I would thus want to say, "only load this if you can give it a virtio-blk >= 1.0 device".

I see a distinction between "can I load this file" and "will this file do something useful", and I think you're questioning where IGVM should fall with respect to that distinction. I'm no Linux expert, but I believe ELF doesn't permit, for example, encoding of ideas like "I expect some particular loadable kernel module to be present" or "I need foo.so to be at least version X". It is possible for the ELF to specify the minimum version of the kernel it expects, but that's fairly coarse-grained and may not adequately permit the loader to determine whether the services expected by a given program are actually present. Instead, programs are expected to detect whether the required functionality is present, and if not, to present some appropriate error information to declare the incompatibility.

It would be entirely reasonable to make the same statements about IGVM: it should be specific enough to determine whether it's possible to launch its contents, but if that code expects some system service that isn't present, it should have a mechanism to report the error dynamically. If nothing else, it seems valuable to define a standard mechanism by which an IGVM image can report compatibility issues so that the VMM that loaded it is capable of capturing a meaningful error and reporting it as required (to the user, to the VM management infrastructure, or wherever). After all, startup errors may arise for many reasons other than whether expected host-provided functionality is missing; for example, there may be a dynamic failure due to an inability to contact some other critical service. Consequently, providing a standard startup error reporting mechanism would provide benefits to improve diagnosability beyond simply knowing what version of virtio-blk is around.

If such a standard error mechanism were to exist, then it seems likely that it could cover the basic launch problems as well. If the IGVM headers were to prescribe host compatibility requirements, then any failure to load would result in a loader failure that would have to be reported up the stack to whoever requested the load to provide sufficient diagnostic information. But that same reporting chain could be fed by a standard error reporting mechanism (if such a thing were to exist), and in that case, the higher-level component that captures the error could still receive what is effectively the same error but would not have to know (or care) whether the error originated from the IGVM loader or from the running code. And naturally, the ability to provide errors dynamically from running code offers a great deal more flexibility than errors that arise because of an inability to load, if for no other reason than that the code inside the IGVM that reports the errors can evolve far more rapidly than any file format standard that endeavors to encapsulate the policy that describes the IGVM's requirements. So I see a ton of upside in building out standard and dynamic error reporting, more than in building out more prescriptive IGVM load policy.
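To make the idea concrete, here is a purely hypothetical sketch of such a reporting chain. No mechanism like this exists in IGVM today; every name, error code, and message below is invented for illustration. The point is only that loader failures and guest-reported failures can arrive through one channel and look the same to the consumer.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    LOADER = "loader"  # raised by the VMM while processing the IGVM file
    GUEST = "guest"    # raised by code inside the image after launch


@dataclass(frozen=True)
class StartupError:
    origin: Origin
    code: int
    message: str


class StartupErrorChannel:
    """Single sink for startup errors, whatever their origin."""

    def __init__(self):
        self._errors = []

    def report(self, error: StartupError) -> None:
        self._errors.append(error)

    def for_management_layer(self):
        # The consumer sees code and message only; origin stays available
        # for debugging but is not needed to act on the error.
        return [(e.code, e.message) for e in self._errors]


channel = StartupErrorChannel()
channel.report(StartupError(Origin.LOADER, 1, "required RAM range not available"))
channel.report(StartupError(Origin.GUEST, 2, "virtio-blk >= 1.0 device not found"))
```

With this shape, moving a check from the loader into the guest (or back) changes nothing for whoever consumes `for_management_layer()`.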

How to position IGVM as a load vs. load+runtime policy format is certainly an interesting question, but if we tilt in favor of robust and standard dynamic error reporting, and restricting IGVM just to a load format, is there anything we would be missing out on?

> The format is already demanding certain machine shape information by stating where in physical memory the file should be loaded. Typically we've seen memory banks [ [0..3G], [4G-2M, 4G], [4G..(RAMSize+1G)] ] where the 2M (up to 16M) at top of 32bit memory is for the UEFI ROM. RAM is charged to users based on the machine shape that determines RAM size. If the IGVM requires a RAM bank at [8T-2G, 8T-2G+16M], then that's unaccounted-for overhead that isn't in the machine shape. We can certainly work on making these configurable and getting the billing right for the added overhead, but we're not at a point where arbitrary IGVM files can be loaded untrusted. We need static analysis. Doable with the current format, but I'm not sure what the general expectation is for IGVM support. I might be able to argue RAMSize + 32M ROM to allow 16M UEFI and 16M SVSM and whatever layout is fine so long as MMIO works as expected for the architecture, but still what does that mean, statically?

I'm not sure I see a trusted connection between whatever is declared in the IGVM headers and what the IGVM code tries to consume. An IGVM file that declares a specific set of memory ranges is still free to attempt to use other memory not provided in the host, so the host must be prepared to detect that and reject such a request when it cannot be supported. If the host is already able to do that, then how does providing additional information in the IGVM file make the host's job easier? I can understand that an early rejection of incompatible requirements can result in better error reporting, but that's an issue of reliability, and not one of trustworthiness.

That said, the current IGVM specification permits the image to declare ranges of RAM that must be present or it can't even get off the ground (and by this, I mean that it can't get far enough to report errors during the launch process). A robust IGVM file should minimize these ranges to the greatest extent possible, so that it is able to run in the maximum number of possible configurations. I can really only speak to how we use IGVM internally, but our IGVM files mandate only that the low 64 MB or so of memory be present, and everything else is enumerated dynamically - and on the surface, this doesn't appear to be unusual with respect to any RAM size configuration. An IGVM file that demands memory to be present near the 8T mark seems very odd indeed, and I'm not sure such an IGVM file can claim to be reasonably built or that it needs to be supported. The only "unusual" IGVM configuration I'm aware of is one in which the IGVM declares that the 4 KB to 64 KB just below 4 GB must be valid as RAM, and this is because the TDX architecture requires code to be present at those guest addresses - so this is an unavoidable requirement. But in the other cases, I think the correct way forward is not to bend the hosting environment to support the demands of the IGVM, but to ensure that the IGVM is flexible enough not to require anything unusual.

Or perhaps I am misunderstanding the point you are making.

> Is the format necessary for the VMM to know, or just the device interface? If providing block storage, we don't need to know if it's using EXT2 or FAT32 or whatever. I'm interested in an SVSM protocol that is the paravisor version of EFI_FIRMWARE_MANAGEMENT_PROTOCOL such that we can reflash the paravisor and UEFI together (or separately, eventually) provided a security policy passes to replace the ROM and reboot into the measurement-changing firmware. In that case, we need a VMM-provided mechanism to do so that is different from the block storage, but the format would be just IGVM. If there are configuration options as well that need to be mutable, I think those need to be in IGVM headers.

IGVM update is an interesting scenario that I haven't thought about enough, but conceptually, it seems to me that the simplest starting point would just be for the guest to supply a new IGVM to the host over some guest/host interface, and for the host to commit that IGVM file according to its own implementation. It doesn't seem to me that any new information is required in the IGVM headers. Replacing the paravisor and UEFI separately sounds extremely difficult because they are normally part of the same IGVM image, which means they have a specific load order requirement and affect the expected measurement - and thus they potentially impact the entirety of the IGVM file. With that being the case, it's going to be both simpler and more robust to replace the entire IGVM. I'm not yet seeing what information would need to be expressed in IGVM in order for a guest to supply a new IGVM to the host over the guest/host channel (which is not yet defined), but I am willing to be corrected.

deeglaze commented 1 month ago

Thanks for improving my understanding of the model. I have no further questions.