Open ereshetova opened 8 months ago
What is missing for me, for this issue and many others in this project, is the threat model, the impact, and the alternative mitigations. The major alternative mitigation for any PCI device is PCI TDISP. That specification documents the threat model and how to bring a device into the TCB. Critically, that specification assumes that the MSI/MSI-X interface is not a valid vulnerability vector. I.e. TDISP precludes the need to deploy protection against cases where hardware attacks the guest, unless that guest opts in to being attacked.
Otherwise it is a "boil the ocean" amount of work to individually mitigate all the ways in which malicious hardware can confuse software. Enter PCI TDISP to preclude hardening in favor of attestation.
The threat model is here: https://elixir.bootlin.com/linux/latest/source/Documentation/security/snp-tdx-threat-model.rst If something is missing in your opinion, please shout, we can add/update it.
When it comes to alternatives, you are implying that a CoCo guest can be secure only with a set of TDISP devices. This is not how these guests are deployed today and it might not be how all future practical deployments will work, but let's take this as an assumption for the discussion. The problem is that even if you assume this, a Linux guest is still going to be open to attacks via all the normal device drivers (and PCI/MSI), because we don't currently have anything in place to stop arbitrary device driver memory from being shared with the host, or to prevent these drivers from accessing PCI config space, PIO, etc. So you would need to define TDISP in Linux in a way that not only allows a device that a guest decided to trust to access the guest private memory (which is the #1 goal of TDISP), but also provides a control to disable all other drivers (apart from TDISP). This way the user of a CoCo VM can say: I want to run this VM with this set of TDISP devices, and the rest of the attack vectors via malicious devices are addressed so that I can be secure. Is this something TDISP support in Linux will do?
> The threat model is here: https://elixir.bootlin.com/linux/latest/source/Documentation/security/snp-tdx-threat-model.rst If something is missing in your opinion, please shout, we can add/update it.
That threat model was taken up by the PCI SIG. Their response was CMA (Component Measurement and Authentication), IDE (link Integrity and Data Encryption), and TDISP (Trusted execution environment Device Interface Security Protocol). It needed all that infrastructure precisely because there are decades of momentum behind the fact that drivers trust their devices. Sustaining the "drivers trust devices" model meant that a new model, "system owner trusts devices", needed to be layered on top, rather than knocking down the "drivers trust devices" pillar.
> When it comes to alternatives, you are implying that a CoCo guest can be secure only with a set of TDISP devices.
No, TDISP is simply a mechanism to assert the provenance of the device interface. It assumes the driver trusts the device. The policy for accepting the device relies on those mechanisms. The implication is that devices operating in shared mode, and devices in private mode that are wrongly accepted into the TCB, can attack integrity and confidentiality. All TDISP does is shift the onus to the TD owner to do due diligence on either avoiding devices operating in shared mode, or deeply understanding the threat imposed by accepting a given device interface into the TCB.
Linux, as a response to CMA, IDE, and TDISP, will grow the concept of device interfaces measured by the platform TSM (Trusted execution environment Security Manager). It is up to the TD owner to use those mechanisms to accept the device for a "trusting" driver to consume, or otherwise accept the legacy risk of devices operating in shared mode.
At no point is hardening of MSI/MSI-X table interaction in scope, nor is it sufficient for all the ways that similar device-specific mechanisms could cause problems. Knocking down the pillar of "drivers trust devices" fundamentally means it is no longer PCI. So the choice is either to start over with a new non-PCI bus definition that incorporates memory-safe hardware interfaces into its design from day one, or to use the PCI-SIG developed response and layer an attestation interface (PCI CMA, IDE, and TDISP) on top of legacy memory-unsafe PCI interfaces.
The following hardening fixes around MSI-X table size/offset handling aim to prevent a malicious device or VMM from triggering bugs by supplying bogus values. The underlying issues were discovered by a fuzzer, and the fixes were submitted in the past:
https://lore.kernel.org/lkml/20230119170633.40944-1-alexander.shishkin@linux.intel.com/
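
For readers who want a concrete picture of what "MSI-X table size/offset handling" hardening can mean, here is a minimal user-space sketch. It is not the code from the series linked above and it is not the kernel's implementation; `msix_table_layout_ok`, `struct msix_layout`, and the macro names are hypothetical and exist only for illustration. The sketch assumes the standard MSI-X capability layout (table size encoded as N-1 in Message Control bits 10:0, BIR in bits 2:0 of the Table Offset/BIR register, 16-byte table entries) and rejects a device-supplied table that does not fit inside the BAR it points at.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; the values follow the MSI-X capability layout. */
#define MSIX_ENTRY_SIZE       16u      /* bytes per MSI-X table entry */
#define MSIX_CTRL_QSIZE_MASK  0x07FFu  /* Message Control bits 10:0 hold N - 1 */
#define MSIX_TABLE_BIR_MASK   0x7u     /* Table Offset/BIR bits 2:0 select the BAR */
#define MSIX_MAX_BIR          5u       /* BARs 0..5 are the only valid targets */

struct msix_layout {
    unsigned int bir;        /* BAR index that holds the table */
    uint32_t     offset;     /* byte offset of the table inside that BAR */
    uint32_t     nr_entries; /* number of table entries (1..2048) */
};

/*
 * Decode the device-supplied (untrusted) MSI-X registers and check that the
 * whole table lies inside the BAR it claims to live in. bar_len[i] is the
 * size of BAR i as seen by the guest, or 0 if the BAR is not implemented.
 * Returns true and fills *out only when the layout is consistent.
 */
static bool msix_table_layout_ok(uint16_t msg_ctrl, uint32_t table_reg,
                                 const uint64_t bar_len[6],
                                 struct msix_layout *out)
{
    uint32_t nr_entries = (uint32_t)(msg_ctrl & MSIX_CTRL_QSIZE_MASK) + 1u;
    unsigned int bir = table_reg & MSIX_TABLE_BIR_MASK;
    uint32_t offset = table_reg & ~MSIX_TABLE_BIR_MASK;
    uint64_t end;

    /* BIR values 6 and 7 are reserved and must not be used as an index. */
    if (bir > MSIX_MAX_BIR)
        return false;

    /* Do the arithmetic in 64 bits so a large offset cannot wrap around. */
    end = (uint64_t)offset + (uint64_t)nr_entries * MSIX_ENTRY_SIZE;

    /* An unimplemented BAR has length 0 and fails this check as well. */
    if (end > bar_len[bir])
        return false;

    out->bir = bir;
    out->offset = offset;
    out->nr_entries = nr_entries;
    return true;
}
```

A caller would run such a check before mapping the table and refuse to enable MSI-X on failure. As the discussion above points out, a check like this covers only one device-controlled input, which is why it does not settle the broader question of whether drivers can trust their devices.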