gramineproject / gramine-tdx

A library OS for Linux multi-process applications, with Intel TDX support (experimental)
GNU Lesser General Public License v3.0
20 stars 5 forks source link

[PAL/TDX] Unset shared bit in the resulting addr of MapGPA #34

Closed dimakuv closed 4 months ago

dimakuv commented 4 months ago

Description of the changes

In TDX, TDG.VP.VMCALL guest-host interface may not map the whole requested GPA range. In this case, this interface returns TDG.VP.VMCALL_RETRY error code and sets register R11 (failed_addr) to the first GPA at which the mapping failed.

Our TDX PAL sanity checks that this returned failed_addr falls into the original requested GPA range. However, there was a bug that TDX returns this failing address as-is (in our case, with the shared bit set), so a naive check is out-of-range (because we compare against no-shared-bit addresses). This commit unsets the shared bit of the failing address before performing the sanity check.

This bug was detected on Ubuntu 24.04 with Linux 6.8.0 and QEMU v8.2.1. Apparently this version of QEMU/KVM performs MapGPA only in 64MB chunks, and if the requested GPA range is larger, errors our with VMCALL_RETRY. Previous versions didn't exhibit this behavior, so the bug went unnoticed for a long time.

How to test this PR?

Without this PR, gramine-tdx just failed silently.

To debug with printfs, one must do smth like this:

diff --git a/pal/src/host/tdx/pal_main.c b/pal/src/host/tdx/pal_main.c
index c63307c0..29fde127 100644
--- a/pal/src/host/tdx/pal_main.c
+++ b/pal/src/host/tdx/pal_main.c
@@ -25,6 +25,7 @@

 #include "external/fuse_kernel.h"
 #include "kernel_apic.h"
+#include "kernel_debug.h"
 #include "kernel_files.h"
 #include "kernel_hob.h"
 #include "kernel_interrupts.h"
@@ -79,8 +80,12 @@ static int shared_memory_init(uint64_t gpa_width) {
             /* done */
             break;
         }
+       failed_addr &= ~g_shared_bit; /* MAPGPA returns first failing addr with shared bit set */
         if (!(map_addr <= failed_addr && failed_addr < map_addr + map_size)) {
             /* sanity check, just in case */
+               debug_serial_io_write_int("--- failed_addr", (uint64_t)failed_addr);
+               debug_serial_io_write_int("--- map_addr", (uint64_t)map_addr);
+               debug_serial_io_write_int("--- map_addr+size", (uint64_t)map_addr + map_size);
             return -PAL_ERROR_DENIED;
         }
         map_size -= failed_addr - map_addr;
diff --git a/tools/gramine-vm.in b/tools/gramine-vm.in
index c2b0a205..d212d86c 100755
--- a/tools/gramine-vm.in
+++ b/tools/gramine-vm.in
@@ -88,11 +88,11 @@ QEMU_MEM_SIZE=${QEMU_MEM_SIZE:-"8G"}
 QEMU_CPU_NUM=${QEMU_CPU_NUM:-$GRAMINE_CPU_NUM}
 QEMU_CPU_NUM=${QEMU_CPU_NUM:-"1"}

-QEMU_PATH="qemu"
+QEMU_PATH="qemu -serial stdio"
 QEMU_VM="-cpu host,host-phys-bits,-kvm-steal-time,pmu=off,+tsc-deadline,+invtsc \
     -m $QEMU_MEM_SIZE -smp $QEMU_CPU_NUM"

The result of my debug session was like this:

--- failed_addr               2251800570757120
--- map_addr                      689963008
--- map_addr+size                      939524096
error: PAL failed Failed to initialize shared TDX memory
[ VM exited with code 1 ]

It's clear that failed_addr has bit 52 (TDX shared bit on this machine) set.

Calculating failed_addr - map_addr gives us 64MB.


This change is Reviewable

dimakuv commented 4 months ago

Apparently this version of QEMU/KVM performs MapGPA only in 64MB chunks, and if the requested GPA range is larger, errors our with VMCALL_RETRY.

Found the corresponding commit in QEMU/KVM:

https://github.com/intel-staging/qemu-tdx/commit/ac4bf9e0c98f8cf8ad5bde75f955aa8e13dd00bd