intel / tdx-module

Intel Trust Domain Extensions (TDX) introduces new architectural elements to help deploy hardware-isolated virtual machines (VMs) called trust domains (TDs). Intel TDX is designed to isolate VMs from the virtual-machine manager (VMM)/hypervisor and any other non-TD software on the platform, protecting TDs from a broad range of software.
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html

Canary validation in tdh_sys_lp_init() #8

Open pansilup opened 12 hours ago

pansilup commented 12 hours ago

Summary: I have a question about the use of stack canaries when the TDX module executes the TDH.SYS.LP.INIT SEAMCALL. My analysis points to a scenario where canary validation at the end of tdh_sys_lp_init() fails because the FS base is changed in the middle of the function. I would appreciate your help in clarifying this.

Background Information Considered:

  1. According to the TDX module's source code, the build enables -fstack-protector-strong (stack canaries), as seen in this line of compiler_defs.mk: https://github.com/intel/tdx-module/blob/2c25cb33f68b0b51f0946d6bcbb7c54fba61a6b9/compiler_defs.mk#L43
  2. The FS base in the initial per-LP SEAMCALL VMCS points to the start of the 'sys-info table' (i.e., the SEAM range start).
  3. According to the "Intel TDX Module Platform-Scope First-Time Initialization" sequence in the TDX Base Architecture Specification, TDH.SYS.INIT is executed on one LP, followed by the execution of TDH.SYS.LP.INIT on each LP.
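For readers unfamiliar with the option in item 1, here is a minimal C sketch of the kind of function that -fstack-protector-strong instruments. The function name is hypothetical and is not taken from the TDX module. Compilers typically protect functions that hold local arrays, so a build with this flag would be expected to add a canary load at entry and a compare before return, similar to the pattern discussed below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the TDX module.
 * The local array makes this function a candidate for canary
 * instrumentation when built with -fstack-protector-strong. */
size_t copy_into_local_buffer(const char *src)
{
    char buf[16];

    assert(src != NULL);

    /* Copy at most 15 bytes and always terminate the buffer. */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    return strlen(buf);
}
```

Compiling this with and without the flag and comparing the generated assembly is one way to see the added prologue and epilogue; the exact instructions depend on the compiler and target.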

Execution Sequence: Suppose there are two LPs, LP0 and LP1. SEAMCALLs are issued sequentially.

  1. Sys init on LP0: TDH.SYS.INIT
       - Updates the last global data page with a canary value. This data page holds a copy of the original sys-info table.
       - Changes the FS base of LP0 to the address of the last global data page.
  2. Per-LP init, after the above completes:
       - On LP0: TDH.SYS.LP.INIT (no issue here)
       - On LP1: TDH.SYS.LP.INIT

Discussion: Consider the following disassembly from libtdx.so for the tdh_sys_lp_init() function.

0000000000021780 <tdh_sys_lp_init>: start
   ...
 Get the canary value and save it on the stack
   21795: mov    %fs:0x28,%rax
   2179e: mov    %rax,0x20(%rsp)
   ...
          call tdx_local_init()
                change fs base (to last global pg addr in data rgn)
                mov    $0x6c06,%ecx
                vmwrite %rax,%rcx
   ...
  Validate the canary value saved on the stack   
   20e36: mov    %fs:0x28,%rcx
   20e3f: cmp    0x20(%rsp),%rcx
   20e44: jne    20f90 <tdh_sys_lp_init+0x1d0>
   ...
<tdh_sys_lp_init>: end

   20f90: callq  1a30 <__wrap___stack_chk_fail>

My scenario pertains to the third SEAMCALL in the sequence above (TDH.SYS.LP.INIT on LP1).

Questions: I have three questions:

  1. Regarding Platform Initialization and Canary Validation: My real TDX platform has no issues with platform initialization, and I assume other users are not facing such issues either. However, based on the analysis above, canary validation at the end of tdh_sys_lp_init() should fail on any LP whose very first SEAMCALL is TDH.SYS.LP.INIT. My reasoning is that the FS base must not change in the middle of a function that performs stack canary operations at both its beginning and its end. Do you have any thoughts on this? Is there any part of my analysis where I might have gone astray?

  2. About the libtdx.so Compilation: Is the production libtdx.so built without the fstack-protector-strong option?

  3. About the Canary Location in sys-info Table: It seems that the canary value is initialized by TDH.SYS.INIT in a copy of the sys-info table, which is copied onto the last global data page. In the struct sysinfo_table_s, the offset of the canary is 0x28. This offset is consistent with the offset from the FS base that we observe when the canary is referenced in libtdx.so. How does the compilation process know where (i.e., at an offset of 0x28 from the FS base) the TDX module stores the canary value?
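To make question 3 concrete, here is a minimal C sketch of the relationship between a table layout and the `%fs:0x28` access. The struct below is hypothetical: only the 0x28 canary offset comes from this thread, the leading bytes are a placeholder and do not reproduce the fields of sysinfo_table_s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a table whose canary sits at offset 0x28.
 * The real sysinfo_table_s fields are not reproduced here. */
typedef struct {
    uint8_t  leading_fields[0x28]; /* placeholder for whatever precedes the canary */
    uint64_t canary;               /* the value read by "mov %fs:0x28,%rax" */
} sysinfo_like_t;

/* The layout must agree with the offset hard-coded in the generated code. */
_Static_assert(offsetof(sysinfo_like_t, canary) == 0x28,
               "canary must sit 0x28 bytes past the FS base");

/* Models the %fs:0x28 load: read 8 bytes at a fixed offset from a base address. */
uint64_t load_canary(const void *fs_base)
{
    uint64_t value;

    assert(fs_base != NULL);
    memcpy(&value, (const uint8_t *)fs_base + 0x28, sizeof value);
    return value;
}
```

If the FS base points at the start of such a table, `load_canary(&table)` returns `table.canary`, which is the agreement between data layout and generated code that the question is about.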

Thank you.

sergey687 commented 12 hours ago

Hello! Thank you for being interested in TDX. Regarding your questions:

  1. The FS base update is made only in the VMCS, with VMWRITE. It therefore takes effect only on the next SEAMCALL (on the same LP); until the SEAMRET, the current LP keeps running with the same FS base it started with.

  2. The production TDX module is also built with the stack protector.

  3. 0x28 is the default offset from FS that the Clang 9 compiler uses when it builds a binary with the stack protector.
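The behavior described in point 1 can be sketched as a small C model. This is a simplified illustration under stated assumptions, not the module's code: `lp_model_t` and every function name below are hypothetical, and the model assumes only what the answer states, namely that VMWRITE changes the VMCS field while the FS base the LP is running with is reloaded at the next SEAMCALL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CANARY_OFFSET 0x28u /* offset used by the canary loads in the listing */

/* Hypothetical model of one logical processor (LP). */
typedef struct {
    const uint8_t *live_fs_base; /* FS base the LP is currently running with */
    const uint8_t *vmcs_fs_base; /* FS base field stored in the SEAMCALL VMCS */
} lp_model_t;

/* Models "mov %fs:0x28,%reg": read the canary through the live FS base. */
static uint64_t read_canary(const lp_model_t *lp)
{
    uint64_t value;

    assert(lp != NULL && lp->live_fs_base != NULL);
    memcpy(&value, lp->live_fs_base + CANARY_OFFSET, sizeof value);
    return value;
}

/* Models VMWRITE to the FS base field: only the VMCS copy changes. */
static void vmwrite_fs_base(lp_model_t *lp, const uint8_t *new_base)
{
    lp->vmcs_fs_base = new_base;
}

/* Models SEAMCALL entry: the live FS base is loaded from the VMCS. */
static void seamcall_entry(lp_model_t *lp)
{
    lp->live_fs_base = lp->vmcs_fs_base;
}

/* Models the canary pattern around the VMWRITE inside tdh_sys_lp_init().
 * Returns 1 when the epilogue comparison matches, 0 otherwise. */
int model_tdh_sys_lp_init(lp_model_t *lp, const uint8_t *global_page)
{
    uint64_t saved = read_canary(lp);  /* prologue: save the canary */

    vmwrite_fs_base(lp, global_page);  /* update lands in the VMCS only */
    return read_canary(lp) == saved;   /* epilogue: same live FS base */
}

/* Walks LP1 through its first SEAMCALL and the one after it. */
int first_lp_init_passes(void)
{
    uint8_t sysinfo_table[64] = {0};   /* placeholder contents */
    uint8_t global_page[64] = {0};
    const uint64_t canary = 0x1122334455667788ULL; /* arbitrary demo value */
    lp_model_t lp1 = { sysinfo_table, sysinfo_table };
    int check_ok;

    memcpy(global_page + CANARY_OFFSET, &canary, sizeof canary);

    seamcall_entry(&lp1);              /* first SEAMCALL on LP1 */
    check_ok = model_tdh_sys_lp_init(&lp1, global_page);

    seamcall_entry(&lp1);              /* next SEAMCALL picks up the new base */
    return check_ok && read_canary(&lp1) == canary;
}
```

In this model the prologue and epilogue both read through the original FS base, so the comparison matches even though the two pages hold different values at offset 0x28. The new base, and with it the canary written by TDH.SYS.INIT, becomes visible only after the following `seamcall_entry()`.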

pansilup commented 12 hours ago

Hi @sergey687, Thanks for the clarification. Cheers !!!