intel / MultiArchUefiPkg

Multi-Architecture UEFI Environment Driver
GNU Lesser General Public License v2.1

Emulation Failed: Invalid instruction (UC_ERR_INSN_INVALID) due to 'rdrand' usage #48

Closed ghost closed 6 months ago

ghost commented 7 months ago

When loading the PCIe OptROM for a Micron 7300 MAX U.2 drive on AArch64 (an ADLINK Ampere Altra Dev Kit) I get an emulation failure, as shown in the attached screenshot. I'm using MultiArchUefiPkg commit 7972cdf844b4a4c22bb1c4f4b8d13e427bc9a2e0 from Feb 6th, 2024.

[Screenshot: emulation failure]

I extracted the optrom image using PciRom and attached it. The information PciRom shows about it is:

ROM 0x0000E200 bytes
--------------------
+0x0: UEFI image (0xE200 bytes)
          Machine Type: 0x8664
             Subsystem: 0xB
    InitializationSize: 0xE200 (bytes)
  EfiImageHeaderOffset: 0x1B8
            Compressed: no

00-UEFI.rom.gz

andreiw commented 7 months ago

So it fails straight at the EntryPoint of the image due to an unsupported (by Unicorn) instruction: rdrand.

If I recall correctly, this wasn't the end of the trouble... but you had an actual crash during ExitBootServices. I'd want to fix that first, as that sounds like a MUA bug due to a failed image start.

00000000000002e0 <.text>:
     2e0:       0f c7 f0                rdrand %eax
     2e3:       72 04                   jb     0x2e9
     2e5:       48 31 c0                xor    %rax,%rax
     2e8:       c3                      ret
     2e9:       66 89 01                mov    %ax,(%rcx)
     2ec:       b8 01 00 00 00          mov    $0x1,%eax
     2f1:       c3                      ret
     2f2:       0f c7 f0                rdrand %eax
     2f5:       72 04                   jb     0x2fb
     2f7:       48 31 c0                xor    %rax,%rax
     2fa:       c3                      ret
     2fb:       89 01                   mov    %eax,(%rcx)
     2fd:       b8 01 00 00 00          mov    $0x1,%eax
     302:       c3                      ret
     303:       48 0f c7 f0             rdrand %rax
     307:       72 04                   jb     0x30d
     309:       48 31 c0                xor    %rax,%rax
     30c:       c3                      ret
     30d:       48 89 01                mov    %rax,(%rcx)
     310:       b8 01 00 00 00          mov    $0x1,%eax
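
For readers who do not want to trace the assembly, here is a rough C model of the three routines above. It is only a sketch: `rdrand_source`, `rd_rand16`/`rd_rand32`/`rd_rand64` and the two sample sources are made-up names, the hardware instruction is replaced by a function pointer so the code runs anywhere, and the 64-bit variant is assumed to end in a `ret` like the other two (the listing is cut off before it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rdrand instruction: returns the carry
 * flag (true = a random value was delivered) and fills *value. */
typedef bool (*rdrand_source)(uint64_t *value);

/* Offset 0x2e0: 16-bit variant. The pointer arrives in %rcx. */
uint64_t rd_rand16(rdrand_source rdrand, uint16_t *out)
{
    uint64_t value = 0;
    if (!rdrand(&value)) {
        return 0;               /* CF clear: xor %rax,%rax ; ret */
    }
    *out = (uint16_t)value;     /* CF set: mov %ax,(%rcx) */
    return 1;                   /* mov $0x1,%eax ; ret */
}

/* Offset 0x2f2: 32-bit variant. */
uint64_t rd_rand32(rdrand_source rdrand, uint32_t *out)
{
    uint64_t value = 0;
    if (!rdrand(&value)) {
        return 0;
    }
    *out = (uint32_t)value;     /* mov %eax,(%rcx) */
    return 1;
}

/* Offset 0x303: 64-bit variant. */
uint64_t rd_rand64(rdrand_source rdrand, uint64_t *out)
{
    uint64_t value = 0;
    if (!rdrand(&value)) {
        return 0;
    }
    *out = value;               /* mov %rax,(%rcx) */
    return 1;
}

/* Sample sources for experimenting (assumptions, not real hardware). */
bool source_always_fails(uint64_t *value)
{
    *value = 0;
    return false;
}

bool source_fixed(uint64_t *value)
{
    *value = 0x1122334455667788ULL;
    return true;
}
```

The point to notice is that the store through the pointer only happens on the success path, so a failing source never touches the caller's memory.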

andreiw commented 7 months ago

The crash reported on Twitter raised a Synchronous Exception at 0xf14fc7e8, which with image base 0xf14f5000 corresponds to offset 0x77E8. This looks like data (no int3 after ret). The region is protected from execution, but MUA is not claiming it.

The strange thing is how anything in the image could have been invoked if it failed to start. Here's a theory: a different image was loaded after the NVMe driver failed to load, but MUA didn't clean up the protection mappings.
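
The offset is just the faulting address minus the image base. A tiny sketch of that arithmetic, with `image_offset` being a made-up helper name and the two constants taken from the crash above:

```c
#include <stdint.h>

/* Hypothetical helper: offset of an address within a loaded image. */
uint64_t image_offset(uint64_t fault_address, uint64_t image_base)
{
    return fault_address - image_base;
}
```

Calling `image_offset(0xf14fc7e8, 0xf14f5000)` gives 0x77e8, which is the address looked up in the disassembly.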

    7792:       75 4c                   jne    0x77e0
    7794:       4c 89 44 24 30          mov    %r8,0x30(%rsp)
    7799:       4c 8d 0d b8 0e 00 00    lea    0xeb8(%rip),%r9        # 0x8658
    77a0:       48 89 54 24 28          mov    %rdx,0x28(%rsp)
    77a5:       4c 8d 05 fc 2c 00 00    lea    0x2cfc(%rip),%r8        # 0xa4a8
    77ac:       48 89 4c 24 20          mov    %rcx,0x20(%rsp)
    77b1:       ba 00 02 00 00          mov    $0x200,%edx
    77b6:       48 8d 4c 24 40          lea    0x40(%rsp),%rcx
    77bb:       e8 a8 ef ff ff          call   0x6768
    77c0:       48 8b 05 39 37 00 00    mov    0x3739(%rip),%rax        # 0xaf00
    77c7:       48 85 c0                test   %rax,%rax
    77ca:       74 14                   je     0x77e0
    77cc:       48 8b 40 40             mov    0x40(%rax),%rax
    77d0:       48 85 c0                test   %rax,%rax
    77d3:       74 0b                   je     0x77e0
    77d5:       48 8d 54 24 40          lea    0x40(%rsp),%rdx
    77da:       48 8b c8                mov    %rax,%rcx
    77dd:       ff 50 08                call   *0x8(%rax)
    77e0:       48 81 c4 48 02 00 00    add    $0x248,%rsp
    77e7:       c3                      ret
    77e8:       c6 05 b9 36 00 00 01    movb   $0x1,0x36b9(%rip)        # 0xaea8
    77ef:       c3                      ret

andreiw commented 7 months ago

[Screenshot]

andreiw commented 7 months ago

Ah, I misread this... 0x48C is the entry point, and the driver did register an ExitBootServices handler.

andreiw commented 7 months ago

Okay, back to the rdrand issue.

andreiw commented 7 months ago

This is so weird.

If the 'rdrand' insn succeeds, the NVMe driver blasts the value into an address formed by taking ImageHandle & 0xffff. How the hell is this supposed to work?

I loaded up the driver in Ghidra.

[Three screenshots from Ghidra]

andreiw commented 7 months ago

With the following patch, the driver loads. The rdrand insn has to return failure so that it doesn't trigger a random crash in the driver. Presumably, on a real x86 system scribbling within the first 64k is somehow ok? Ugh.


diff --git a/qemu/target/i386/int_helper.c b/qemu/target/i386/int_helper.c
index 5dea08ab..3e6bce5f 100644
--- a/qemu/target/i386/int_helper.c
+++ b/qemu/target/i386/int_helper.c
@@ -476,6 +476,9 @@ target_ulong HELPER(rdrand)(CPUX86State *env)
 {
     target_ulong ret;

+    env->cc_src = 0;
+    return 0;
+
     if (qemu_guest_getrandom(&ret, sizeof(ret)) < 0) {
         // qemu_log_mask(LOG_UNIMP, "rdrand: Crypto failure: %s",
         //               error_get_pretty(err));
diff --git a/qemu/target/i386/unicorn.c b/qemu/target/i386/unicorn.c
index f10b70e2..ca755edf 100644
--- a/qemu/target/i386/unicorn.c
+++ b/qemu/target/i386/unicorn.c
@@ -73,7 +73,7 @@ void x86_reg_reset(struct uc_struct *uc)
                                 CPUID_FXSR | CPUID_SSE | CPUID_CLFLUSH;
     env->features[FEAT_1_ECX] = CPUID_EXT_SSSE3 | CPUID_EXT_SSE41 |
                                 CPUID_EXT_SSE42 | CPUID_EXT_AES |
-                                CPUID_EXT_CX16;
+                                CPUID_EXT_CX16 | CPUID_EXT_RDRAND;
     env->features[FEAT_8000_0001_EDX] = CPUID_EXT2_3DNOW | CPUID_EXT2_RDTSCP;
     env->features[FEAT_8000_0001_ECX] = CPUID_EXT3_LAHF_LM | CPUID_EXT3_ABM |
                                         CPUID_EXT3_SKINIT | CPUID_EXT3_CR8LEG;
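
The second hunk matters for guests that check for RDRAND before using it: the feature is reported in CPUID.01H:ECX bit 30. A sketch of what the guest-visible ECX mask looks like before and after the change, using made-up macro and function names (these are not QEMU's identifiers; the bit positions are the architectural ones):

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX feature bits (architectural positions).
 * Macro names are hypothetical, chosen for this sketch only. */
#define ECX_SSSE3  (1u << 9)
#define ECX_CX16   (1u << 13)
#define ECX_SSE41  (1u << 19)
#define ECX_SSE42  (1u << 20)
#define ECX_AES    (1u << 25)
#define ECX_RDRAND (1u << 30)

/* Feature mask matching the list in the diff before the change. */
uint32_t ecx_before_patch(void)
{
    return ECX_SSSE3 | ECX_SSE41 | ECX_SSE42 | ECX_AES | ECX_CX16;
}

/* Same mask with RDRAND advertised, as the '+' line does. */
uint32_t ecx_after_patch(void)
{
    return ecx_before_patch() | ECX_RDRAND;
}

/* What a well-behaved guest would test before executing rdrand. */
bool guest_sees_rdrand(uint32_t cpuid_1_ecx)
{
    return (cpuid_1_ecx & ECX_RDRAND) != 0;
}
```

Advertising the bit while always clearing the carry flag keeps both kinds of guest working: code that checks CPUID sees the feature, and code that retries or falls back on failure takes that path.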

andreiw commented 7 months ago

Well, after more poking around the driver I still can't tell what it was meant to accomplish with the rdrand usage, but I'm no Ghidra/efiseek whiz.

I'll check in a "fix" of sorts that should help, by implementing an rdrand insn that always fails.

andreiw commented 7 months ago

Another mechanism could be to simply ignore reads/writes to the bottom 64k, going on the theory that this isn't the first or the last bit of code that accidentally scribbles something around address 0. Something like that could be opt-in, but enabled by default for running x86 code.
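
As a rough illustration of what such a filter might decide, here is a self-contained sketch. Nothing here exists in MUA or Unicorn: `LOW_REGION_SIZE`, `access_in_low_region` and `should_ignore_access` are hypothetical names, and the rule that an access must lie entirely inside the bottom 64k to be ignored is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOW_REGION_SIZE 0x10000ULL /* bottom 64 KiB */

/* True when the whole access [address, address + size) lies in the
 * bottom 64 KiB. Written to avoid overflow in address + size. */
bool access_in_low_region(uint64_t address, uint64_t size)
{
    return size != 0 &&
           address < LOW_REGION_SIZE &&
           size <= LOW_REGION_SIZE - address;
}

/* Hypothetical policy: only swallow the access when the option is on. */
bool should_ignore_access(bool policy_enabled, uint64_t address, uint64_t size)
{
    return policy_enabled && access_in_low_region(address, size);
}
```

An emulator memory hook could call `should_ignore_access` and, when it returns true, drop the write (or return zeros for a read) instead of reporting a fault. An access that straddles the 64k boundary is deliberately not ignored here, so it would still fault.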

ghost commented 6 months ago

Thanks! I've verified it works now.