stefanberger / swtpm

Libtpms-based TPM emulator with socket, character device, and Linux CUSE interface.
Other
563 stars 136 forks source link

Windows 11 Get-TpmSupportedFeature fails triggering failures in system log #852

Closed ncstate-daniel closed 2 months ago

ncstate-daniel commented 3 months ago

Describe the bug

Windows 11 is able to install just fine with swtpm backing it, and Get-TPM, tpm.msc, etc all show a happy healthy TPM chip, but when the SCCM agent tries to install (and retries over and over), we get a TPM error in the system log that repeats over and over as SCCM continues to try to access the virtual chip. Furthermore, Get-TpmSupportedFeature returns absolutely nothing, which I suspect is at the root of the problem, and each run of it generates an error.

Required: To Reproduce (without these steps your issue may be deleted)

Steps to reproduce the behavior/issue showing all commands on command line, needed XML or JSON (if necessary), etc.:

Note: This is going to be difficult unless you have an SCCM environment to test with. However if my suspicion is correct, we can simply test it by executing Get-TpmSupportedFeature

  1. Install a fresh Windows 11 machine under, in this case, ovirt under Rocky Linux 9 (including the expected os drivers for virtio and whatnot)
  2. Connect to your new Windows 11 machine and run Get-TpmSupportedFeature in powershell (it should return absolutely nothing, may even pause)
  3. Open Event Viewer and go to the System log (you should see a system error for TPM, attached details below)
  4. Repeat the powershell command multiple times and watch it generate new System logs

Expected behavior

It should return:

PS C:\Users\daniel.admin> get-tpmsupportedfeature
key attestation

Desktop (please complete the following information):

Versions of relevant components

Log files swtpmlog.txt Please note I see zero errors from the host OS perspective... just from within Windows itself.

Event Viewer log entry reads: The initialization of the Trusted Platform Module (TPM) failed. The TPM may be in failure mode. To allow diagnosis, contact the TPM manufacturer with the attached information. With the following details: swtpmwinevent.txt

Additional context As far as I can tell, SCCM tries to access the TPM's features, finds nothing, and croaks out during initialization. It then retries immediately, failing over and over and over. A temporary workaround I have is to install the OS with the TPM assigned, then once it's installed 'rip out' the TPM in ovirt's configs. After that everything works fine. (Windows doesn't actually NEED the TPM unless you are doing something like bitlocker)

ncstate-daniel commented 3 months ago

Updated the title to be more focused on the direct issue at hand

stefanberger commented 3 months ago

I will have a look at it when I get a chance. Had you tried previous Win11 versions?

ncstate-daniel commented 3 months ago

Howdy! I haven't tried previous win11 versions -- however I do have some new, semi bizarre, news. One of my coworkers installed Windows 11, using swtpm 0.7.3, under Ubuntu 24.04 and somehow it's working fine in that scenario. So tomorrow I will gatherer the component info from above for that OS and anything else we can discover. This may imply the problem lies in EL9 or in some configuration ovirt is using. (his setup is just libvirt/kvm based, doesn't involved ovirt)

ncstate-daniel commented 3 months ago

That definitely sounds promising! I won't be able to test upgrading libtpm and reactivating tpm on windows until tomorrow -- but we did get the component versions from Ubuntu 24.04:

swtpm: 0.7.3-0ubuntu5 libtpms: 0.9.3-0ubuntu4 openssl: 3.0.13-0ubuntu3.1 gnutls: 3.8.3-1.1ubuntu3.1 ovirt: N/A Host OS: Ubuntu 24.04

I'll try upgrading libtpms on Rocky 9 tomorrow to see if it makes a difference!

Is there anything in the localca configs that could have any likelihood of having an effect btw? Nothing stood out to me.

stefanberger commented 3 months ago

Especially compare the libtpms versions since this is the vTPM implementation. v0.9.1 is quite old and you seem to be using it. v0.9.6 is recent and may have some of the necessary bugfixes:

version 0.9.7 [not yet released]:
  - tpm2: Fix issue in CryptParameterEncryption() (TPM 2 errata v1.4)
  - tpm2: Sync fix in TPM2_PolicyAuthorize() with upstream
  - tpm2: Sync CryptParameterDecrypt implementation  with upstream
  - tpm2: Fix issue related to CryptGenerateKeyDes (TPM 2 errata v1.4)
  - tpm2: Check size of TPM2B_NAME buffer before reading 2 bytes from it

version 0.9.6:
  - tpm2: Check size of buffer before accessing it (CVE-2023-1017 & -1018)

version 0.9.5:
  - tpm2: Do not set RSA_FLAG_NO_BLINDING on RSA keys anymore
  - tpm2: Fix a potential overflow expression (coverity)
  - tpm2: Fix size check in CryptSecretDecrypt

version 0.9.4:
  - tpm: #undef printf in case it is #define'd (OSS-Fuzz)
  - tpm2: Check return code of BN_div()
  - tpm2: Initialize variables due to gcc complaint (s390x, false positive)
  - tpm12: Initialize variables due to gcc complaint (s390x, false positive)
  - build-sys: Fix configure script to support _FORTIFY_SOURCE=3

version 0.9.3:
  - build-sys: Add probing for -fstack-protector
  - tpm2: Do not call EVP_PKEY_CTX_set0_rsa_oaep_label() for label of size
    (OSSL 3)

version 0.9.2:
  - tpm2: When writing state initialize s_ContextSlotMask if not set
stefanberger commented 3 months ago

Is there anything in the localca configs that could have any likelihood of having an effect btw? Nothing stood out to me.

Unlikely I would say.

ncstate-daniel commented 3 months ago

Mostly just adding some notes here while I wait for a reinstall.

ovirt is running swtpm like this: /usr/bin/swtpm socket --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/20-mesmer-wn11.chass.nc-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/fb32dca3-4e05-43f6-9482-da650272ce9d/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/mesmer-wn11.chass.ncsu.edu-swtpm.log --terminate --tpm2

Simply updating libtpms to 0.9.6 did not immediately help. (I reenabled the TPM in ovirt and rebooted the machine and immediately the errors came back) However, I have now reset swtpm to the version that comes with the OS, 0.8.0, and made sure libtpms was still at 0.9.6, and performed a full recreation of the VM to make sure everything is fresh and it is in the process of reinstalling right now.

ncstate-daniel commented 3 months ago

On the Ubuntu 24.04 box that is working, the call is: /usr/bin/swtpm socket --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/8-skeksis-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/skeksis-swtpm.log --terminate --tpm2 I see no real difference there.

One difference is that one is storing it's state and such on gluster while the other is a simple direct file system (xfs). Not sure why that would matter but bringing it up.

Just to compare again -- swtpm configs in etc are--- one that works:

root@mystic:/home/foreman_re# cat /etc/swtpm*
statedir = /var/lib/swtpm-localca
signingkey = /var/lib/swtpm-localca/signkey.pem
issuercert = /var/lib/swtpm-localca/issuercert.pem
certserial = /var/lib/swtpm-localca/certserial
--platform-manufacturer Fedora
--platform-version 2.1
--platform-model QEMU
# Program invoked for creating certificates
create_certs_tool= /usr/bin/swtpm_localca
create_certs_tool_config = /etc/swtpm-localca.conf
create_certs_tool_options = /etc/swtpm-localca.options
# Comma-separated list (no spaces) of PCR banks to activate by default
active_pcr_banks = sha256

one that doesn't:

[root@hss53vm foreman_re]# cat /etc/swtpm*
statedir = /var/lib/swtpm-localca
signingkey = /var/lib/swtpm-localca/signkey.pem
issuercert = /var/lib/swtpm-localca/issuercert.pem
certserial = /var/lib/swtpm-localca/certserial
--platform-manufacturer Fedora
--platform-version 2.1
--platform-model QEMU
# Program invoked for creating certificates
create_certs_tool= /usr/bin/swtpm_localca
create_certs_tool_config = /etc/swtpm-localca.conf
create_certs_tool_options = /etc/swtpm-localca.options
# Comma-separated list (no spaces) of PCR banks to activate by default
active_pcr_banks = sha256

basically precisely the same.

ncstate-daniel commented 3 months ago

Another data point -- this also worked just fine under Ubuntu 22.04 with the following components: swtpm: 0.6.3-0ubuntu3.2 libtpms: 0.9.3-0ubuntu1.22.04.1 openssl: 3.0.2-0ubuntu1.15 gnutls: 3.7.3-4ubuntu1.5 ovirt: N/A Host OS: Ubuntu 22.02

Again I'm noticing the newer libtpms as you mentioned previously. =)

ncstate-daniel commented 3 months ago

Ok after rebuild of the VM, I'm still getting the same TPM error. I'm starting to wonder if Gluster is the problem somehow. I had found an issue in proxmox's issue tracker where swtpm was having problems when gluster was involved, but that appears to have something to do with how proxmox references their storage rather than anything to do with the storage itself. Any ideas of what else I could try? (and can you think of any reason gluster might cause a problem?)

ncstate-daniel commented 3 months ago

Well I just looked back at the command line under ovirt and -- none of those paths are in gluster so that shoots down that idea. =/

ncstate-daniel commented 3 months ago

Edit: duh -- this is because I connected to it remotely. Nothing truly new here.

Some new information included in the event log that I hadn't seen before:

The description for Event ID 27 from source TPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

30
405
1229081856
1398218784
542396493
0
0
1
538513443
1455670
0

The system cannot find the file specified
ncstate-daniel commented 3 months ago

Interesting. As root on the ub24 box, if I try to print states of a running tpm I get:

root@mystic:/home/foreman_re# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600
swtpm: SWTPM_NVRAM_Lock_Dir: Could not open lockfile: Permission denied

On the rl9 box, I don't get a perm denied:

[root@hss52vm x86_64]# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/fb32dca3-4e05-43f6-9482-da650272ce9d/tpm2,mode=0600
{ "type": "swtpm", "states": [] }
stefanberger commented 3 months ago

Ok after rebuild of the VM, I'm still getting the same TPM error.

What do you mean by 'rebuild of the VM'?

My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%).

What I would do at this point is build libtpms v0.9.6 (from git) and make sure that it gets installed in the right place and overwrites the old version of libtpms (with which Win11 had issues). Then try Win11 with this. If success, then build libtpms v0.9.3 (from git) and install it and start VM again. If it fails at this point it's something in libtpms.

I'm starting to wonder if Gluster is the problem somehow. I had found an issue in proxmox's issue tracker where swtpm was having problems when gluster was involved, but that appears to have something to do with how proxmox references their storage rather than anything to do with the storage itself. Any ideas of what else I could try? (and can you think of any reason gluster might cause a problem?)

I never used gluster. If you could eliminate this variable maybe that would help. So, if vTPM is working while the VM is in UEFI (there's a menu there where you can make changes to the TPM config like the choice of active PCR banks) then that would eliminate gluster IMO.

ncstate-daniel commented 3 months ago

I definitely am able to mess with it in the UEFI menu -- I tried that at one point in my desperate search for a solution. ;)

Rebuild the VM just means I'm deleting the VM completely and recreating it. This deletes the previous swtpm files and regenerates them all from scratch. I don't know that there's even any point to doing that -- but as you can probably tell I'm grasping at straws so trying everything. Plus I thought maybe there's a chance a different version of swtpm might build its files slightly differently.

I'll poke around with libtpms some more and see if I come across anything. I also just put selinux into permissive mode to see if that was causing a problem but no... it is not. =/

stefanberger commented 3 months ago

Interesting. As root on the ub24 box, if I try to print states of a running tpm I get:

root@mystic:/home/foreman_re# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600
swtpm: SWTPM_NVRAM_Lock_Dir: Could not open lockfile: Permission denied

It's quite possible that this is due to (incomplete) AppArmor profile on Ubuntu.

ncstate-daniel commented 3 months ago

Aha -- so it would seem: [609320.685646] audit: type=1400 audit(1717082960.702:4389): apparmor="DENIED" operation="open" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=53739 comm="swtpm" requested_mask="wc" denied_mask="wc" fsuid=0 ouid=131

TBH the syntax looks sane to me. That all said -- I'm not sure print-states works exactly the same with 0.7.3 as it idoes 0.8.x. If I put it in complain mode it looks like it's trying to take over the tpm instead of just looking at it:

[610152.692707] audit: type=1400 audit(1717083792.711:4427): apparmor="ALLOWED" operation="open" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=54072 comm="swtpm" requested_mask="wc" denied_mask="wc" fsuid=0 ouid=131
[610152.692723] audit: type=1400 audit(1717083792.711:4428): apparmor="ALLOWED" operation="truncate" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=54072 comm="swtpm" requested_mask="w" denied_mask="w" fsuid=0 ouid=131
[610152.692765] audit: type=1400 audit(1717083792.711:4429): apparmor="ALLOWED" operation="file_lock" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=54072 comm="swtpm" requested_mask="wk" denied_mask="wk" fsuid=0 ouid=131

It fails saying: swtpm: SWTPM_NVRAM_Lock_Dir: Could not lock access to lockfile: Resource temporarily unavailable

So at least it didn't -really- mess with it.

Anyway that was a bit of a tangent... Can you think of any good reason to downgrade the Rocky 9 box to 0.7.3 like Ubuntu has instead of sticking with 0.8.0? (or 0.8.2 which I temporarily updated to)

ncstate-daniel commented 3 months ago

Very random but -- every time I use wget on this host I'm getting:

ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-tabrmd.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpmrm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tcti-swtpm.c:614:Tss2_Tcti_Swtpm_Init() Cannot connect to swtpm TPM socket
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-swtpm.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-mssim.so.0
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:254:tctildr_get_default() No standard TCTI could be loaded
ERROR:tcti:src/tss2-tcti/tctildr.c:428:Tss2_TctiLdr_Initialize_Ex() Failed to instantiate TCTI
ERROR:fapi:src/tss2-fapi/api/Fapi_Initialize.c:227:Fapi_Initialize_Finish() Initializing TCTI. ErrorCode (0x000a000a)

(and then it proceeds to download what I asked for)

That's from the host itself, not the VM.

stefanberger commented 3 months ago

Very random but -- every time I use wget on this host I'm getting:

ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-tabrmd.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpmrm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tcti-swtpm.c:614:Tss2_Tcti_Swtpm_Init() Cannot connect to swtpm TPM socket
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-swtpm.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-mssim.so.0
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:254:tctildr_get_default() No standard TCTI could be loaded
ERROR:tcti:src/tss2-tcti/tctildr.c:428:Tss2_TctiLdr_Initialize_Ex() Failed to instantiate TCTI
ERROR:fapi:src/tss2-fapi/api/Fapi_Initialize.c:227:Fapi_Initialize_Finish() Initializing TCTI. ErrorCode (0x000a000a)

Iirc this is output from/due to the Intel TSS2 pkcs11 driver.

ncstate-daniel commented 3 months ago

So probably not related? Ok.

Is there any reason at all I should continue to rebuild the tpm data files when doing this testing or is there practically no way that would be a problem? Like if I just do the equiv of "stop swtpm, replace rpm, start swtpm" should that be enough to debug what i'm doing? it most certainly gets annoying waiting for windows 11 to reinstall =)

stefanberger commented 3 months ago

So probably not related? Ok.

Is there any reason at all I should continue to rebuild the tpm data files when doing this testing or is there practically no way that would be a problem? Like if I just do the equiv of "stop swtpm, replace rpm, start swtpm" should that be enough to debug what i'm doing? it most certainly gets annoying waiting for windows 11 to reinstall =)

Do not reinstall win11 just to test vTPM. You may need to figure out what the uuid is of your VM (virsh dumpxml ) and then remove the swtpm state files once you stopped the VM: rm -rf /var/lib/libvirt/swtpm/<uuid>. Once you restart the VM swtpm_localca will be called and create the initial state. You should see this in the log (/var/log/swtpm/libvirt/qemu/<vmname>-swtpm.log).

ncstate-daniel commented 3 months ago

You're not going to believe this but:

My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%).

It was the 2%! I upgraded openssl from 3.0.7 to fedora 37's 3.0.9 and the problem went away!

Specifically I grabbed: https://rpmfind.net/linux/fedora/linux/updates/37/Everything/x86_64/Packages/o/openssl-3.0.9-1.fc37.x86_64.rpm (and -libs of course)

Thanks so much for pointing me in the right direction! And I hope this little adventure helps you in future reports that might come in about it. =)

stefanberger commented 3 months ago

You're not going to believe this but:

My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%).

It was the 2%! I upgraded openssl from 3.0.7 to fedora 37's 3.0.9 and the problem went away!

Great! So we now know what the problem and solution is. If it's OpenSSL I would tell your distro. Ideally your distro would also update libtpms after the CVE fixes etc. that went in in the meantime.

ncstate-daniel commented 3 months ago

Done: https://bugs.rockylinux.org/view.php?id=6931 I don't know if they'd be willing to upgrade beyond what RHEL9 provides, so I may need to take it up to Red Hat instead, but figured I would start from here and see where things go.

stefanberger commented 3 months ago

Done: https://bugs.rockylinux.org/view.php?id=6931 I don't know if they'd be willing to upgrade beyond what RHEL9 provides, so I may need to take it up to Red Hat instead, but figured I would start from here and see where things go.

You're right, RHEL 9.4 uses openssl-3.0.7-27.el9.x86_64.rpm. Strange but I haven't heard anything about this so far. Maybe they applied some necessary patches 'hidden' in release version '-27'.

ncstate-daniel commented 3 months ago

It's possible not many people do both:

The only real reason I noticed it was because SCCM freaks out about it. =) I imagine if someone had enabled bitlocker they would see it as well but who knows!

ncstate-daniel commented 3 months ago

BTW Rocky Linux asked me to take it upstream so: https://issues.redhat.com/browse/RHEL-39899 (if you wanted to follow it or if anyone else is interested)

ncstate-daniel commented 3 months ago

It appears they have already fixed those CVEs in RHEL 9. As they put it: For CVEs, please always check the CVE pages: https://access.redhat.com/security/cve/CVE-2023-1017 and https://access.redhat.com/security/cve/CVE-2023-1018.

Both are marked as fixed on RHEL 9.

Anyway -- OpenSSL 3.2 is set to be in RHEL 9.5 -- I'm hoping to test that in the near future but I imagine that will also resolve the issue.

stefanberger commented 3 months ago

Interesting. As root on the ub24 box, if I try to print states of a running tpm I get:

root@mystic:/home/foreman_re# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600
swtpm: SWTPM_NVRAM_Lock_Dir: Could not open lockfile: Permission denied

For this to work the following rule is needed by the Ubuntu-maintained AppArmor profile:

# Last Modified: Sat Jun  8 09:28:29 2024
include <tunables/global>

# vim:syntax=apparmor
# AppArmor policy for swtpm
# Author: Lena Voytek <lena.voytek@canonical.com>

[...]
  # allow root to run --print-states
  /var/lib/libvirt/swtpm/** wk,
[...]
}

Though this also only works once the running swtpm has terminated and released the .lock file (this has been fixed in 0.8).

ncstate-daniel commented 2 months ago

Hi @stefanberger ! Turns out there's a far similar solution. In Red Hats words (after a lot of debugging back and forth):

Can you try with 3.0.7 and 3.2.2 with the SHA1 crypto-policy module? I'm guessing swtpm may be requiring a signature with SHA-1, which does not work by default on RHEL 9 and up, but does work by default in the Fedora builds.

Run

update-crypto-policies --set DEFAULT:SHA1 to switch to that.

After making that change. even openssl 3.0.7 works just fine.

stefanberger commented 2 months ago

Hi @stefanberger ! Turns out there's a far similar solution. In Red Hats words (after a lot of debugging back and forth):

Can you try with 3.0.7 and 3.2.2 with the SHA1 crypto-policy module? I'm guessing swtpm may be requiring a signature with SHA-1, which does not work by default on RHEL 9 and up, but does work by default in the Fedora builds. Run update-crypto-policies --set DEFAULT:SHA1 to switch to that.

After making that change. even openssl 3.0.7 works just fine.

I was not aware of this.

Is the sha-1 bank enabled for your VM? You could check by entering the TPM 2 UEFI config menu and look at the active PCR banks. If it is enabled you probably should disable it (even though swtpm_setup would NOT have enabled it since it only enables sha256 bank by default) and then revert this change on your host and try again.

ncstate-daniel commented 2 months ago

Screenshot 2024-06-19 110155

And from swtpm_setup.conf:

# Program invoked for creating certificates
create_certs_tool= /usr/bin/swtpm_localca
create_certs_tool_config = /etc/swtpm-localca.conf
create_certs_tool_options = /etc/swtpm-localca.options
# Comma-separated list (no spaces) of PCR banks to activate by default
active_pcr_banks = sha256

Sure doesn't LOOK like sha1 is enabled to me. How odd. (unless of course I'm misreading this)

stefanberger commented 2 months ago

Hm. I don't know what this is. If SHA 1 was being used for TPM2 selftests, which it isn't, the TPM 2 would be in failure mode if it didn't work and you wouldn't have said "Get-TPM, tpm.msc, etc all show a happy healthy TPM chip". So I don't know what is trying to get a signature with SHA-1 here and fails. Maybe Windows is trying it and the TPM doesn't do it?

ncstate-daniel commented 2 months ago

Yeah it's strictly the Get-TpmSupportedFeature call that triggers the error -- what that's asking for I do not know but I'd guess that this is Window's fault at some level. ;) but at least there's a workaround. I DO need to make sure setting that isn't going to interfere with our ability to join our active directory domain, but I doubt it will. I wonder if there's a way to see specifically what is being requested when get-tpmsupportedfeature is run.

stefanberger commented 2 months ago

Since libvirt won't provide logging level parameters to swtpm you have to hook gdb onto the swtpm process and run the following command before you start the command that's causing the failure. I hope you have control over the Win tool that's actually causing the issue so that the request and response become visible in the log:

$ sudo gdb pid 1321582
[...]
(gdb) call (int)log_set_level(10)
$1 = 0
(gdb) c

Now swtpm writes all TPM commands and responses into /var/log/swtpm/libvirt/qemu/<vm name>-swtpm.log.

You can then run call (int)log_set_level(0) to turn off the logging again.

PS: you may have to install debuginfo packages: sudo dnf debuginfo-install swtpm libtpms.

ncstate-daniel commented 2 months ago

swtpm_log_win11_tpm.txt

Here is the log of what happened when I ran: Get-TpmSupportedFeature

stefanberger commented 2 months ago

Looking for a failure on the responses I found this command/response pair here that indicates a failure (it continued afterwards without failures):

 SWTPM_IO_Read: length 225
 80 02 00 00 00 E1 00 00 01 4A 80 00 00 02 80 00
 00 02 00 00 00 49 02 00 00 00 00 20 47 B6 8C 4E
 73 6D 5B EB 8E FD C8 2F 28 CF FA 90 B3 62 F9 E9
 55 5A 8D F2 5D 64 E7 25 8A 82 A9 C7 00 00 20 54
 1D 28 84 3A 59 71 B3 DC 9C E9 DC B4 44 2B 0F 09
 A9 38 DD 80 7E B9 AD 0D 06 98 E7 22 75 F7 81 00
 14 3C 86 1B 8F BA 1E 7D 9E B3 63 00 66 E8 A4 EB
 6E C3 03 D5 7B 00 20 02 B3 D6 FC 4F 8B 2C A5 E5
 EF BE 23 88 B4 9E 85 1A 7A 69 7E 07 20 FF 73 89
 BD 7A 6E 8E 3A AC AD 00 10 80 21 40 00 00 01 00
 40 C2 42 6B 94 F4 7C FB DB A7 13 CD 9A 56 5E 5C
 F1 78 0E 0D 86 FC 94 5C 1B 8A F2 B3 DF AC 0F 9F
 27 10 41 16 13 92 14 A2 6B 2F 0F BF 0D B8 78 76
 FB 7F 04 38 17 93 1C CA 6E 26 A7 CB 59 82 B7 49
 D1
 SWTPM_IO_Write: length 10
 80 01 00 00 00 0A 00 00 01 01

The command that failed is TPM_CC_CertifyCreation with failure 0x101:

$ tssreturncode 0x101
TPM_RC_FAILURE - commands not being accepted because of a TPM failure
 80 02 00 00 00 E1 00 00 01 4A                      header: CC = TPM_CC_CertifyCreation
 80 00 00 02                                        handle of previously loaded key
 80 00 00 02                                        handle of previously loaded key

 00 00 00 49                                        authSize 

 02 00 00 00                                        sessionHandle

 00 20 47 B6 8C 4E                                  32-byte TPM2B buffer -- nonce?
 73 6D 5B EB 8E FD C8 2F 28 CF FA 90 B3 62 F9 E9
 55 5A 8D F2 5D 64 E7 25 8A 82 A9 C7 

 00                                                 sessionAttributes

 00 20 54                                           32-byte TPM2B buffer -- hmac?
 1D 28 84 3A 59 71 B3 DC 9C E9 DC B4 44 2B 0F 09
 A9 38 DD 80 7E B9 AD 0D 06 98 E7 22 75 F7 81

 00                                                 20-byte TPM2B_DATA qualifyingHash
 14 3C 86 1B 8F BA 1E 7D 9E B3 63 00 66 E8 A4 EB
 6E C3 03 D5 7B 

 00 20 02 B3 D6 FC 4F 8B 2C A5 E5                   32-byte TPM2B_DIGEST creationHash
 EF BE 23 88 B4 9E 85 1A 7A 69 7E 07 20 FF 73 89
 BD 7A 6E 8E 3A AC AD 

 00 10                                              TPM_ALG_NULL       -- inScheme

 80 21                                              TPM_ST_CREATION    -- start of creationTicket

 40 00 00 01                                        TPM_RH_OWNER (hierarchy)
 00                                                 64-byte TPM2B_DIGEST
 40 C2 42 6B 94 F4 7C FB DB A7 13 CD 9A 56 5E 5C
 F1 78 0E 0D 86 FC 94 5C 1B 8A F2 B3 DF AC 0F 9F
 27 10 41 16 13 92 14 A2 6B 2F 0F BF 0D B8 78 76
 FB 7F 04 38 17 93 1C CA 6E 26 A7 CB 59 82 B7 49
 D1

It's not clear to me why it is failing. You could try to set a break point on TPM2_CertifyCreation and single-step through it.

ncstate-daniel commented 2 months ago

It's been .. probably over a decade since I did any serious debugging with gdb so I'm a tad rusty. So I'm attaching here what I saw in case this means anything to you. After the last line it continued on without intervention. swtpm_log_trace.txt

stefanberger commented 2 months ago

Thanks for the log but there's nothing in there that would indicate a failure let alone the reason for the failure...

ncstate-daniel commented 2 months ago

Hrm that's unfortunate. Well -- I mean from my perspective I'm "happy enough" with the solution. I'm happy to help you debug it more if you really want to, since I have an active way to test it, but if you'd rather let it be I'm ok with that too since I have a viable workaround. What do you think?

stefanberger commented 2 months ago

Of course I would be curious what is causing the failure but you would have to step through the code, possibly multiple times, with an eye on the value returned from the function or the functions it calls, especially the ones further down the call stack doing the RSA or ECDSA signature.

[I would first set a breakpoint in this function again and then set breakpoints after function calls and at the/each return statement. Once you are in this function you can use list to show the function and break <line number> to set a breakpoint at a specific line. And then display $ret or print $ret could show you what the value of ret is, unless it has been optimized out. ]

Though for now we know that enabling SHA1 signing for OpenSSL on the host resolves the issue. It seems to be related to RSA/ECDSA signing with SHA1 that OpenSSL refuses to do due to the policy (sha1 signing not being allowed anymore for some time now) - so in a way we already know where it's coming from. It would be unfortunate that Microsoft hasn't updated the tool and switched it to SHA256 signing.

ncstate-daniel commented 2 months ago

Ok -- well probably tomorrow I'll dive into that a bit more and see what I can figure out!

stefanberger commented 2 months ago

Ok -- well probably tomorrow I'll dive into that a bit more and see what I can figure out!

Once you have entered TPM2_CertifyCreation you could set breakpoints at CryptRsaSign and CryptEccSign and issue continue and it will probably then stop next in either one of them and you would single step through those with an eye on return values...

ncstate-daniel commented 2 months ago

Hey -- I haven't made much of any headway poking around in gdb so far. One interesting thing that occurred, however, is that Windows started giving me slightly more interesting information. I'm not real sure what triggered it -- but it says that the TPM failed to execute a command, for the following two ordinals: 378 374

Not sure if that's helpful in any way -- and for all i know that's just the result of gdb holding up the response while i poke around. But I wanted to share it in case there's any aha type of moment.

The only slightly interesting but probably not interesting at all thing I've come across so far is that hashAlg is set to 13.

For some reason gdb seems to get to a point where, despite me 'step'ing forward, it suddenly lurches forward to continue. Seemingly out of nowhere.

ncstate-daniel commented 2 months ago

Oh and neither of these break points ever caught: CryptRsaSign and CryptEccSign That said: (gdb) break CryptRsaSign Function "CryptRsaSign" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 2 (CryptRsaSign) pending. (gdb) break CryptEccSign Function "CryptEccSign" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 3 (CryptEccSign) pending.

stefanberger commented 2 months ago

The TPM commands may time out while you single step through them and the application oR driver/OS may report this.

ncstate-daniel commented 2 months ago

Would it be helpful in any way for me to capture the same log level 10 output from a -successful- run with SHA1 enabled and upload that?

stefanberger commented 2 months ago

Would it be helpful in any way for me to capture the same log level 10 output from a -successful- run with SHA1 enabled and upload that?

Sure, you could do that. I would not expect any failures in such a log.

stefanberger commented 2 months ago

CryptRsaSign and CrytptEccSign should be correct. Otherwise maybe add a breakpoint in CryptSign also.

grep -r -E "(CryptRsaSign|CryptEccSign)" src/tpm2/crypto/openssl/
src/tpm2/crypto/openssl/consttime.txt:CryptEccSignature.c:
src/tpm2/crypto/openssl/CryptEccSignature.c:/*            $Id: CryptEccSignature.c 1658 2021-01-22 23:14:01Z kgoldman $      */
src/tpm2/crypto/openssl/CryptEccSignature.c:/* 10.2.12 CryptEccSignature.c */
src/tpm2/crypto/openssl/CryptEccSignature.c:#include "CryptEccSignature_fp.h"
src/tpm2/crypto/openssl/CryptEccSignature.c:/* 10.2.12.3.6 CryptEccSign() */
src/tpm2/crypto/openssl/CryptEccSignature.c:CryptEccSign(
src/tpm2/crypto/openssl/CryptRsa.c:/* 10.2.17.4.17 CryptRsaSign() */
src/tpm2/crypto/openssl/CryptRsa.c:CryptRsaSign(
src/tpm2/crypto/openssl/CryptRsa.c:CryptRsaSign(
ncstate-daniel commented 2 months ago

One thing I'm noticing right off the bat is that it's FAR longer.

swtpm_log_win11_working_tpm.txt