stefanberger / swtpm

Libtpms-based TPM emulator with socket, character device, and Linux CUSE interface.

Windows arm64 virtual machine? #493

Closed jeremyd2019 closed 1 year ago

jeremyd2019 commented 3 years ago

As you may have heard, Windows 11 is requiring the presence of TPM 2.0, and I'm trying to provide it one via qemu on aarch64. I followed the instructions from https://qemu-project.gitlab.io/qemu/specs/tpm.html.

mkdir /tmp/mytpm1
swtpm socket --tpmstate dir=/tmp/mytpm1 \
  --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock \
  --log level=20

  -chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock \
  -tpmdev emulator,id=tpm0,chardev=chrtpm \
  -device tpm-tis-device,tpmdev=tpm0 \

I saw #223 and updated to an EDK2 build that has TPM support, but Windows still shows that the TPM 2.0 device cannot start: "The I/O device is configured incorrectly or the configuration parameters to the driver are incorrect."

Am I missing something? I can see swtpm log output during early boot, but nothing after that.

stefanberger commented 2 years ago

I haven't seen any development on aarch64 QEMU on something related to what I mentioned above. I think this is what is needed:

It feels like Windows may end up forcing QEMU to implement a TPM interface on another bus, possibly SPI or I2C, unless Microsoft adapts their driver. Which TPM interface drivers other than TIS would work on a real aarch64 system? The following ones seem to be using ACPI in some form:

https://elixir.bootlin.com/linux/latest/source/drivers/char/tpm/tpm_tis_synquacer.c
https://elixir.bootlin.com/linux/latest/source/drivers/char/tpm/tpm_tis_i2c_cr50.c

willcohen commented 2 years ago

My knowledge of ACPI, BIOS, hardware emulation, and the TPM spec in general is rather low (full transparency: back when I spent too much time as a high schooler building computers, I remember my motherboard's Athlon processor slot and RAM DIMMs, and I used PCI slots for most of the cards and an ISA slot for the Gravis Ultrasound; ever since stuff started being hard-soldered, the details of hardware have gone over my head), but I'd like to do what I can to further this and eventually get a solution landed in QEMU. Am I correctly understanding the identified problem and proposed solution?

Without fully understanding what I'm getting myself into, it seems like the discussion of solutions above might include any of the following:

stefanberger commented 2 years ago

I assume that means that QEMU will need a new file, like tpm_tis_isa.c or perhaps modifications to tpm_tis_sysbus.c, that instead works on _spi or _i2c. Do I have this right?

My guess is that we should use one of these buses, but I don't know whether one is better than the other, or what their status is in QEMU. If EDK2 (UEFI) already supported ARM hardware with a hardware TPM 2, I would suggest following the bus used there, but there's no support in EDK2 for ARM...

gvaldezd commented 2 years ago

I assume that means that QEMU will need a new file, like tpm_tis_isa.c or perhaps modifications to tpm_tis_sysbus.c, that instead works on _spi or _i2c. Do I have this right?

My guess is that we should use one of these buses, but I don't know whether one is better than the other, or what their status is in QEMU. If EDK2 (UEFI) already supported ARM hardware with a hardware TPM 2, I would suggest following the bus used there, but there's no support in EDK2 for ARM...

Hi, I replaced the edk2-aarch64-code.fd file (path: /Applications/UTM.app/Contents/Resources/qemu) used by UTM (3.2.4) with a new EDK2 build (built in another Linux VM following the Edk2 quickstart instructions). Both VMs are installed on a Mac M1 (macOS 12.5 Monterey). With the Secure Boot option enabled, the UEFI shows a TPM device with different settings (those options are not shown with the original UTM file).

[Screenshot: EDK2_TPM]

However, although Windows 11 boots fine, it cannot start the TPM device at memory address 0xC000000.

I'm not an expert; I just wanted to share that the EDK2 source (tianocore/edk2) works with UTM.app on a Mac M1.

stefanberger commented 2 years ago

@gvaldezd Right, Linux in an ARM VM works with the existing drivers; it's Windows on ARM in a VM that doesn't work and may require a different bus. What I would be looking for now is EDK2 running on ARM hardware with TPM 2 support, to see what kind of bus it uses.

stefanberger commented 2 years ago

There's now an RFC patch for SPI bus support for TPM TIS: https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00401.html

willcohen commented 2 years ago

I can't immediately apply that RFC patch on top of the two RFC patches it builds upon submitted a week or two earlier without some of the hunks failing -- I may have to do a little cherry-picking to get them to apply on top of one another cleanly, and it may be that I'm currently applying on top of 7.0 and need to use HEAD. Once I get those wrinkles resolved I'll try to see if this works with Windows 11.

stefanberger commented 2 years ago

There's now an RFC patch for SPI bus support for TPM TIS: https://lists.nongnu.org/archive/html/qemu-devel/2022-08/msg00401.html

I applied these patches now to a branch here: https://github.com/stefanberger/qemu-tpm/tree/v7.1.0-tpm-aarch64

It builds fine but finding a command line for QEMU for getting this device to work is a challenge of its own. How does one get an SSI bus?

willcohen commented 2 years ago

If I'm reading this generic SPI thread (https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00793.html) correctly, I guess it's only been implemented for the aspeed machine types, and it'll need another round before it's generic enough for other machines (would we be targeting virt for a Windows VM?)

@chen-iris: do you think it's feasible to try to get Windows 11 booted on an aspeed QEMU machine (and therefore using your TPM RFC), or would we need to hold tight until the implementation is further worked out before testing and helping out?

stefanberger commented 1 year ago

@willcohen FYI, also for aspeed machine types: https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg02199.html

willcohen commented 1 year ago

Lovely. Will try this out!

osy commented 1 year ago

I've logged the writes from QEMU (secure boot enabled): swtpm_io.log

I also have a UTM log combining SWTPM, SPICE, and QMP: swtpm_utm.log

I'm not sure why the current theory is on the TPM interface. I think VMware/Parallels have TPM 2.0 emulation working, has anyone checked what interface they are using?

idarek commented 1 year ago

Could you explain what you want to check and how? I can do that in VMware.

osy commented 1 year ago

Okay so I made progress! But not in getting TPM 2.0 working on QEMU but on breaking TPM 2.0 in VMware! Okay I know that doesn't sound impressive but at least we know what the issue is.

First, let's look at the TPM2 ACPI table for a VMware Fusion VM:

[000h 0000   4]                    Signature : "TPM2"    [Trusted Platform Module hardware interface table]
[004h 0004   4]                 Table Length : 0000004C
[008h 0008   1]                     Revision : 03
[009h 0009   1]                     Checksum : F0
[00Ah 0010   6]                       Oem ID : "VMWARE"
[010h 0016   8]                 Oem Table ID : "VMW_TPM2"
[018h 0024   4]                 Oem Revision : 00000001
[01Ch 0028   4]              Asl Compiler ID : "VMW "
[020h 0032   4]        Asl Compiler Revision : 00000001

[024h 0036   4]                     Reserved : 00000000
[028h 0040   8]              Control Address : 00000000FFFFF040
[030h 0048   4]                 Start Method : 00000008

Raw Table Data: Length 76 (0x4C)

    0000: 54 50 4D 32 4C 00 00 00 03 F0 56 4D 57 41 52 45  // TPM2L.....VMWARE
    0010: 56 4D 57 5F 54 50 4D 32 01 00 00 00 56 4D 57 20  // VMW_TPM2....VMW 
    0020: 01 00 00 00 00 00 00 00 40 F0 FF FF 00 00 00 00  // ........@.......
    0030: 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
    0040: 00 00 01 00 00 00 FE FF 00 00 00 00              // ............

Compare that with QEMU:

[000h 0000   4]                    Signature : "TPM2"    [Trusted Platform Module hardware interface table]
[004h 0004   4]                 Table Length : 0000004C
[008h 0008   1]                     Revision : 04
[009h 0009   1]                     Checksum : 91
[00Ah 0010   6]                       Oem ID : "BOCHS "
[010h 0016   8]                 Oem Table ID : "BXPC    "
[018h 0024   4]                 Oem Revision : 00000001
[01Ch 0028   4]              Asl Compiler ID : "BXPC"
[020h 0032   4]        Asl Compiler Revision : 00000001

[024h 0036   2]               Platform Class : 0000
[026h 0038   2]                     Reserved : 0000
[028h 0040   8]              Control Address : 0000000000000000
[030h 0048   4]                 Start Method : 06 [Memory Mapped I/O]

[034h 0052  12]            Method Parameters : 00 00 00 00 00 00 00 00 00 00 00 00
[040h 0064   4]           Minimum Log Length : 00010000
[044h 0068   8]                  Log Address : 000000023C4C0000

Raw Table Data: Length 76 (0x4C)

    0000: 54 50 4D 32 4C 00 00 00 04 91 42 4F 43 48 53 20  // TPM2L.....BOCHS 
    0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
    0020: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
    0030: 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
    0040: 00 00 01 00 00 00 4C 3C 02 00 00 00              // ......L<....

The main difference is that VMware uses the CRB interface while QEMU ARM64 uses TIS (start method 8 = "Command Response Buffer Interface with the ACPI Start Method" and start method 6 = "Memory mapped I/O Interface (TIS 1.2+Cancel)").
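To make the comparison reproducible, here is a minimal Python sketch that reads the same fields out of the raw bytes of both tables. The byte strings are copied from the dumps above, and the two start-method names are the ones quoted in this comment; the function name `parse_tpm2` and the returned dictionary layout are made up for illustration and are not part of any tool. The sketch also checks the standard ACPI rule that all bytes of a table sum to zero modulo 256.

```python
import struct

# Raw table bytes copied from the two dumps in this comment.
VMWARE_TPM2 = bytes.fromhex("""
    54 50 4D 32 4C 00 00 00 03 F0 56 4D 57 41 52 45
    56 4D 57 5F 54 50 4D 32 01 00 00 00 56 4D 57 20
    01 00 00 00 00 00 00 00 40 F0 FF FF 00 00 00 00
    08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 01 00 00 00 FE FF 00 00 00 00
""")
QEMU_TPM2 = bytes.fromhex("""
    54 50 4D 32 4C 00 00 00 04 91 42 4F 43 48 53 20
    42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 01 00 00 00 4C 3C 02 00 00 00
""")

# Only the two start methods named in this thread are listed.
START_METHODS = {
    6: "Memory mapped I/O Interface (TIS 1.2+Cancel)",
    8: "Command Response Buffer Interface with the ACPI Start Method",
}


def parse_tpm2(table: bytes) -> dict:
    """Extract the fields compared above from a raw TPM2 ACPI table."""
    # Signature at 0x00, table length at 0x04, revision at 0x08.
    signature, length, revision = struct.unpack_from("<4sIB", table, 0)
    if signature != b"TPM2" or length != len(table):
        raise ValueError("not a complete TPM2 table")
    # Control address at 0x28 (8 bytes), start method at 0x30 (4 bytes).
    control_address, start_method = struct.unpack_from("<QI", table, 0x28)
    return {
        "revision": revision,
        "control_address": control_address,
        "start_method": start_method,
        "start_method_name": START_METHODS.get(start_method, "unknown"),
        # ACPI tables are valid when all bytes sum to 0 modulo 256.
        "checksum_ok": sum(table) % 256 == 0,
    }


for name, raw in (("VMware", VMWARE_TPM2), ("QEMU", QEMU_TPM2)):
    info = parse_tpm2(raw)
    print(name, info["start_method"], info["start_method_name"],
          hex(info["control_address"]))
```

The same function could be pointed at a table dumped from any other hypervisor to see which start method it advertises.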

Another observation is that Windows doesn't even attempt to communicate with the TPM 2.0 at all under QEMU. The logs I posted above all take place before winload.efi, and there's no I/O access to the TPM interface afterwards.

So how do you force VMware Fusion to not use CRB? Through some reverse engineering and guessing, I found that in the .vmx config file, you can add vtpm.crb.hw = "FALSE" to force VMware to use a TIS interface. Once that's done, the ACPI table shows start method 6 and Device Manager shows Code 10 with error 0xC0000182 (the same as QEMU).

So why don't we use CRB in QEMU? In fact the CRB device is already implemented in QEMU and with a little hackery with the base address we can force it to load on ARM64 VMs. It's even pretty straightforward to add a SysBus device variant for it.

Unfortunately, the luck ends here. We begin to see Windows make I/O access on the CRB device (progress!) but the VM crashes in the tpm.sys driver when it hits this line:

       1c0022abc 21 21 43 29     ldp        w1,w8,[x9, #0x18]

This is an attempted 64-bit access on the TPM MMIO interface. Currently, this causes the HVF backend to crash with assert(isv), because when an ARM64 LDP aborts, the hardware does NOT decode the faulting instruction for you. This means that you (the hypervisor) must manually decode and emulate the instruction to figure out what the intended effects are. Currently, QEMU does not have this capability, although I believe the KVM backend might support it (I do not have an ARM64 KVM system to test with at the moment).
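As a rough illustration of what "manually decode and emulate" involves, here is a small Python sketch that decodes the four instruction bytes shown above and splits the pair load into two 32-bit MMIO reads. It only handles the integer LDP/STP signed-offset form; the names `PairAccess`, `decode_ldp_stp_offset`, and `split_accesses` are hypothetical, and this is not how QEMU, HVF, or the workaround mentioned in this thread are implemented.

```python
from dataclasses import dataclass


@dataclass
class PairAccess:
    is_load: bool   # True for LDP, False for STP
    rt: int         # first transfer register number
    rt2: int        # second transfer register number
    rn: int         # base register number
    offset: int     # signed byte offset from the base register
    size: int       # bytes per register (4 for Wn, 8 for Xn)


def decode_ldp_stp_offset(insn: int) -> PairAccess:
    """Decode an integer LDP/STP in the signed-offset addressing form."""
    # Bits 29..23 are fixed to 101 0 010 for this form (V = 0).
    if (insn >> 23) & 0x7F != 0b1010010:
        raise ValueError("not an integer LDP/STP (signed offset)")
    opc = (insn >> 30) & 0x3
    if opc == 0b00:
        size = 4            # 32-bit registers
    elif opc == 0b10:
        size = 8            # 64-bit registers
    else:
        raise ValueError("unsupported opc for this sketch")
    imm7 = (insn >> 15) & 0x7F
    if imm7 & 0x40:         # sign-extend the 7-bit immediate
        imm7 -= 0x80
    return PairAccess(
        is_load=bool((insn >> 22) & 1),
        rt=insn & 0x1F,
        rt2=(insn >> 10) & 0x1F,
        rn=(insn >> 5) & 0x1F,
        offset=imm7 * size,  # the immediate is scaled by the register size
        size=size,
    )


def split_accesses(access: PairAccess, base: int) -> list:
    """Return (address, size, register) for each single-register access."""
    first = base + access.offset
    return [(first, access.size, access.rt),
            (first + access.size, access.size, access.rt2)]


# Instruction bytes from the disassembly line above, in memory order.
insn = int.from_bytes(bytes.fromhex("21 21 43 29"), "little")
decoded = decode_ldp_stp_offset(insn)
print(decoded)
```

With the base register value taken from the faulting vCPU state, `split_accesses` yields the two reads (at offsets 0x18 and 0x1c from x9) that a hypervisor would have to forward to the device model before writing the results back to w1 and w8.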

(P.S: Even after working around this crash, it's unknown if everything will "just work" or if there's other issues as well but this is currently blocking progress.)

osy commented 1 year ago

Okay, I got it working with some insane amount of "not shippable" code including a hard coded STP detection and patcher in HVF backend. However, it shows that CRB interface works. Hopefully, I'll have something on the UTM QEMU fork posted soon.

stefanberger commented 1 year ago

However, it shows that CRB interface works.

You mean the CRB interface works with the method as advertised by ACPI? Is the memory range (MMIO) the issue ?

osy commented 1 year ago

Yeah CRB as advertised by the ACPI table plus my QEMU CRB sysbus driver.

I don’t know what evidence there is for the MMIO range being an issue, but I’ve reverse engineered the startup code in tpm.sys and found no evidence of any checks on an allowed range or anything like that. If you want to confirm, perhaps change the hard coded address on an x86_64 QEMU VM and see if Windows produces the same error? My hunch says no.

The fact that VMware’s TIS device produces the same error on ARM64 Windows means that it’s probably a Microsoft issue (they would have the resources to fix it).

stefanberger commented 1 year ago

I don’t know what evidence there is for the MMIO range being an issue, but I’ve reverse engineered the startup code in

I don't have evidence about it nor am I an expert on aarch64 allowed MMIO ranges, but I can point you to a comment above regarding this: https://github.com/stefanberger/swtpm/issues/493#issuecomment-898851156

osy commented 1 year ago

Yeah I’ve read that comment but don’t understand the connection.

stefanberger commented 1 year ago

I am closing this issue now since it's clearly a QEMU issue.