stefanberger / swtpm

Libtpms-based TPM emulator with socket, character device, and Linux CUSE interface.

Remove dependency on Python? #437

Closed andreabolognani closed 3 years ago

andreabolognani commented 3 years ago

Describe the bug

swtpm-tools drags in a big list of dependencies, notably Python and a number of Python modules.

To Reproduce

Steps to reproduce the behavior:

  1. [host]$ docker pull fedora:34
  2. [host]$ docker run --rm -it fedora:34
  3. [container]$ dnf update --refresh -y
  4. [container]$ dnf install swtpm-tools

Expected behavior

swtpm-tools drags in significantly fewer dependencies, and specifically no Python :)

Additional context

This was originally brought up in the context of KubeVirt, more specifically https://github.com/kubevirt/kubevirt/pull/5588#issuecomment-834159876 and follow-up comments.

KubeVirt is trying to minimize the size of its container image, and so far that has involved reducing the number of libvirt components that are included; however, the libvirt QEMU driver needs swtpm-tools to work, and since swtpm_setup is written in Python that causes the interpreter to be dragged in as well.

Note that KubeVirt's container images are built from Fedora RPMs, but do not use the standard Fedora container images as a base: instead, the RPMs are unpacked by bazeldnf on top of a FROM scratch container, and since dnf is not included, the only package causing Python to be added to the image is indeed swtpm.

I notice that most of swtpm is written in C, with only a few parts implemented in Python. Do you think it would be feasible to remove the dependency on Python by rewriting those few parts in C? Or perhaps another compiled programming language such as Rust?

Alternatively, do you think it would be viable to rewrite libvirt's swtpm support so that it calls into libswtpm directly instead of invoking the command line tools, or would that result in too much duplication? I know you're the one who's implemented swtpm support in libvirt in the first place, so you probably have an opinion on the topic :)

Thank you in advance for your help!

# dnf install swtpm-tools
Last metadata expiration check: 0:03:33 ago on Fri May  7 12:10:34 2021.
Dependencies resolved.
============================================================================
 Package                       Version                                 Size
============================================================================
Installing:
 swtpm-tools                   0.5.2-2.20201226gite59c0c1.fc34        120 k
Installing dependencies:
 acl                           2.3.1-1.fc34                            71 k
 checkpolicy                   3.2-1.fc34                             344 k
 cryptsetup-libs               2.3.5-2.fc34                           479 k
 dbus                          1:1.12.20-3.fc34                       8.0 k
 dbus-broker                   28-3.fc34                              174 k
 dbus-common                   1:1.12.20-3.fc34                        15 k
 device-mapper                 1.02.175-1.fc34                        144 k
 device-mapper-libs            1.02.175-1.fc34                        179 k
 diffutils                     3.7-8.fc34                             391 k
 gnutls-dane                   3.7.1-2.fc34                            32 k
 gnutls-utils                  3.7.1-2.fc34                           483 k
 iptables-legacy-libs          1.8.7-7.fc34                            40 k
 kmod-libs                     28-2.fc34                               64 k
 libargon2                     20171227-6.fc34                         29 k
 libevent                      2.1.12-3.fc34                          268 k
 libibverbs                    34.0-3.fc34                            339 k
 libnl3                        3.5.0-6.fc34                           324 k
 libpcap                       14:1.10.0-1.fc34                       176 k
 libseccomp                    2.5.0-4.fc34                            71 k
 libselinux-utils              3.2-1.fc34                             160 k
 libtpms                       0.8.2-0.20210301git729fc6a4ca.fc34.1   392 k
 policycoreutils               3.2-1.fc34                             206 k
 policycoreutils-python-utils  3.2-1.fc34                              71 k
 protobuf-c                    1.3.3-7.fc34                            35 k
 python3-audit                 3.0.1-2.fc34                            86 k
 python3-cffi                  1.14.5-1.fc34                          244 k
 python3-cryptography          3.4.6-1.fc34                           1.4 M
 python3-libselinux            3.2-1.fc34                             187 k
 python3-libsemanage           3.2-1.fc34                              83 k
 python3-ply                   3.11-11.fc34                           103 k
 python3-policycoreutils       3.2-1.fc34                             2.0 M
 python3-pycparser             2.20-3.fc34                            126 k
 python3-setools               4.4.0-1.fc34                           554 k
 python3-six                   1.15.0-5.fc34                           36 k
 rpm-plugin-selinux            4.16.1.3-1.fc34                         22 k
 selinux-policy                34.5-1.fc34                             83 k
 selinux-policy-targeted       34.5-1.fc34                            6.3 M
 swtpm                         0.5.2-2.20201226gite59c0c1.fc34         41 k
 swtpm-libs                    0.5.2-2.20201226gite59c0c1.fc34         45 k
 systemd                       248-2.fc34                             4.4 M
 systemd-pam                   248-2.fc34                             322 k
 systemd-rpm-macros            248-2.fc34                              28 k
 trousers                      0.3.15-2.fc34                          146 k
 trousers-lib                  0.3.15-2.fc34                          170 k
 unbound-libs                  1.13.1-1.fc34                          528 k
 xkeyboard-config              2.32-3.fc34                            789 k
Installing weak dependencies:
 libxkbcommon                  1.3.0-1.fc34                           144 k
 qrencode-libs                 4.0.2-7.fc34                            60 k
 systemd-networkd              248-2.fc34                             480 k

Transaction Summary
============================================================================
Install  50 Packages

Total download size: 23 M
Installed size: 75 M
Is this ok [y/N]: 
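
To put a rough number on the Python share of the transaction above, the python3-* rows can be totalled. This is only a sketch: the rows are copied by hand from the listing, the sizes are download sizes rather than installed sizes, and treating "M" as 1024 k is an assumption about how dnf rounds its output.

```python
# python3-* rows copied by hand from the dnf listing above.
LISTING = """\
python3-audit                 3.0.1-2.fc34                            86 k
python3-cffi                  1.14.5-1.fc34                          244 k
python3-cryptography          3.4.6-1.fc34                           1.4 M
python3-libselinux            3.2-1.fc34                             187 k
python3-libsemanage           3.2-1.fc34                              83 k
python3-ply                   3.11-11.fc34                           103 k
python3-policycoreutils       3.2-1.fc34                             2.0 M
python3-pycparser             2.20-3.fc34                            126 k
python3-setools               4.4.0-1.fc34                           554 k
python3-six                   1.15.0-5.fc34                           36 k
"""

# Assumption: dnf's "M" suffix means 1024 k.
UNIT_IN_K = {"k": 1, "M": 1024}


def parse_sizes(listing):
    """Map package name -> download size in k for each row of the listing."""
    sizes = {}
    for line in listing.splitlines():
        name, _version, amount, unit = line.split()
        sizes[name] = float(amount) * UNIT_IN_K[unit]
    return sizes


sizes = parse_sizes(LISTING)
total_k = sum(sizes.values())
print(f"{len(sizes)} python3-* packages, about {total_k / 1024:.1f} M of the 23 M download")
```

The interpreter itself does not appear in the listing, presumably because the fedora:34 image already ships it for dnf, so a FROM scratch image would pay for it on top of these modules.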
stefanberger commented 3 years ago

I notice that most of swtpm is written in C, with only a few parts implemented in Python. Do you think it would be feasible to remove the dependency on Python by rewriting those few parts in C? Or perhaps another compiled programming language such as Rust?

I am open to PRs for this... but definitely prefer to keep it the way it is.

Alternatively, do you think it would be viable to rewrite libvirt's swtpm support so that it calls into libswtpm directly instead of invoking the command line tools, or would that result in too much duplication? I know you're the one who's implemented swtpm support in libvirt in the first place, so you probably have an opinion on the topic :)

I am not sure what it would do when it calls into libswtpm. Calling out and starting command line tools is quite fine in my opinion. Not everything has to be linked together into one big large executable.

stefanberger commented 3 years ago

If someone decides to send a PR regarding this, I think it would have to be coordinated. Any interaction of swtpm_setup with the swtpm would have to be done in such a way that no other tool can talk to the swtpm at the same time as swtpm_setup does. This is currently solved with Unix sockets. TCP socket communication is not an option. I am not sure whether the existing TSS stacks support this type of communication.

stefanberger commented 3 years ago

Btw, swtpm-tools has the following dependencies:

Requires:       swtpm = %{version}-%{release}
Requires:       trousers >= 0.3.9 bash gnutls-utils python3 python3-cryptography

Trousers may go away at some point. What you are showing above is also pulling in swtpm and swtpm-libs and their dependencies, libseccomp for example.

Here are the dependencies of python3-cryptography. There are not all that many direct ones, notably python3-six and python3-cffi.

Requires

    libc.so.6()(64bit)
    libc.so.6(GLIBC_2.14)(64bit)
    libc.so.6(GLIBC_2.2.5)(64bit)
    libc.so.6(GLIBC_2.4)(64bit)
    libcrypto.so.1.1()(64bit)
    libcrypto.so.1.1(OPENSSL_1_1_0)(64bit)
    libcrypto.so.1.1(OPENSSL_1_1_0j)(64bit)
    libcrypto.so.1.1(OPENSSL_1_1_1)(64bit)
    libssl.so.1.1()(64bit)
    libssl.so.1.1(OPENSSL_1_1_0)(64bit)
    libssl.so.1.1(OPENSSL_1_1_1)(64bit)
    openssl-libs
    python(abi) = 3.9
    python3-cffi >= 1.7
    python3-six >= 1.4.1
    python3.9dist(cffi) >= 1.12
    rpmlib(CompressedFileNames) <= 3.0.4-1
    rpmlib(FileDigests) <= 4.6.0-1
    rpmlib(PartialHardlinkSets) <= 4.0.4-1
    rpmlib(PayloadFilesHavePrefix) <= 4.0-1
    rpmlib(PayloadIsZstd) <= 5.4.18-1
    rtld(GNU_HASH) 

https://rpmfind.net/linux/RPM/fedora/34/x86_64/p/python3-cryptography-3.4.6-1.fc34.x86_64.html

andreabolognani commented 3 years ago

Let me come right out and say that I have basically no understanding of what swtpm does, beyond the broadest of strokes :) Hopefully you're going to keep that in mind and be patient with me :)

I agree that calling external commands in general is a perfectly fine approach - that's what Unix is all about! The issue in this case is simply that one of these commands happens to be implemented in Python, and so calling it requires having a Python interpreter available: while you can usually rely on that being part of the base OS on RHEL-based distros, that's not necessarily the case in the context of e.g. Debian, Alpine or - and this is the one that's more relevant to my interests - KubeVirt's virt-launcher image.

In all those cases, installing swtpm-tools means also installing the Python interpreter and a bunch of Python libraries. I understand the direct dependencies are quite reasonable, but in practice the end result is (most likely a worse version of) what you see above.

My query about using libswtpm directly from the libvirt QEMU driver was based on the assumption that swtpm_setup is basically doing that already - providing a nice user-facing wrapper on top of libswtpm. Was this assumption entirely off-base? If so, apologies for even mentioning this as a potentially viable approach!

stefanberger commented 3 years ago

My query about using libswtpm directly from the libvirt QEMU driver was based on the assumption that swtpm_setup is basically doing that already - providing a nice user-facing wrapper on top of libswtpm. Was this assumption entirely off-base? If so, apologies for even mentioning this as a potentially viable approach!

What is libswtpm for you? These here are private libraries of swtpm. We also have libtpms, which provides the TPM functionality.

/usr/lib64/swtpm/libswtpm_libtpms.so.0
/usr/lib64/swtpm/libswtpm_libtpms.so.0.0.0

I really don't know what you mean by calling libswtpm. swtpm_setup is a Python tool that performs manufacturing steps (like creating an endorsement key and writing certificates into its NVRAM) on a swtpm instance by talking to it directly via Unix sockets and sending a sequence of TPM commands to it following parameters provided by the user. It should run once per instance, but it doesn't have to run if one can live without certificates inside the swtpm and whatever else it can set up for you.
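
As a rough illustration of what "sending a sequence of TPM commands over a Unix socket" looks like on the wire, the sketch below builds a single TPM 2.0 TPM2_Startup(TPM_SU_CLEAR) command and exchanges it over a socket pair. It is not swtpm_setup's code: the helper names are made up for this example, and the other end of the socket is a stand-in that returns a canned success response, not a real swtpm.

```python
import socket
import struct

# Constants from the TPM 2.0 specification.
TPM_ST_NO_SESSIONS = 0x8001
TPM_CC_STARTUP = 0x00000144
TPM_SU_CLEAR = 0x0000
TPM_RC_SUCCESS = 0x00000000


def build_startup_command():
    """Hypothetical helper: header (tag, total size, command code) + startup type."""
    body = struct.pack(">H", TPM_SU_CLEAR)
    size = 10 + len(body)  # the header is always 10 bytes
    return struct.pack(">HII", TPM_ST_NO_SESSIONS, size, TPM_CC_STARTUP) + body


def parse_response_header(data):
    """Hypothetical helper: split a response into tag, total size and return code."""
    return struct.unpack(">HII", data[:10])


# Stand-in for a swtpm listening on a Unix socket (assumption: no real swtpm here).
client, fake_tpm = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

client.sendall(build_startup_command())
request = fake_tpm.recv(4096)

# The stand-in answers with a bare success header.
fake_tpm.sendall(struct.pack(">HII", TPM_ST_NO_SESSIONS, 10, TPM_RC_SUCCESS))
tag, size, rc = parse_response_header(client.recv(4096))

client.close()
fake_tpm.close()
print(f"sent {request.hex()}, got return code {rc:#x}")
```

A real manufacturing run would follow this with many more commands (key creation, NVRAM writes), and a connected Unix stream socket is private to its two endpoints, which fits the exclusivity requirement mentioned earlier in the thread.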

The target of swtpm was integration into QEMU and libvirt, which themselves pull in tons of packages. So I am not sure what environment you are providing via Alpine or the like, and what else is running there.

andreabolognani commented 3 years ago

What is libswtpm for you? These here are private libraries of swtpm. We also have libtpms, which provides the TPM functionality.

/usr/lib64/swtpm/libswtpm_libtpms.so.0
/usr/lib64/swtpm/libswtpm_libtpms.so.0.0.0

I really don't know what you mean with calling libswtpm.

My assumption was that the main functionality was implemented in a libswtpm shared library, and that swtpm and swtpm_setup were simply convenient user-facing interfaces for the library, in the same way virsh is a reference client for the public API of libvirt. I see now that this assumption was mistaken, and that the library is basically just an implementation detail and can't be used on its own. Apologies for the confusion.

swtpm_setup is a Python tool that performs manufacturing steps (like creating an endorsement key and writing certificates into its NVRAM) on a swtpm instance by talking to it directly via Unix sockets and sending a sequence of TPM commands to it following parameters provided by the user. It should run once per instance, but it doesn't have to run if one can live without certificates inside the swtpm and whatever else it can set up for you.

AFAICT libvirt runs swtpm_setup unless the swtpm storage already exists, which I assume is not going to be the case for most VMs (at least until their second boot). So for libvirt users swtpm_setup is a hard dependency - something that's reflected in libvirt's spec file.
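
The "run swtpm_setup only when the storage does not exist yet" behaviour can be sketched roughly as follows. This is not libvirt's implementation: the function names and the "directory exists and is non-empty" test are assumptions made for illustration, and the swtpm_setup flags shown should be checked against the man page of the installed version. The command is only constructed here, never executed.

```python
import os
import tempfile


def storage_is_initialized(state_dir):
    """Hypothetical check: an existing, non-empty directory counts as initialized."""
    return os.path.isdir(state_dir) and bool(os.listdir(state_dir))


def setup_command(state_dir):
    """Build (but do not run) a swtpm_setup invocation; flags are assumptions."""
    return ["swtpm_setup", "--tpm2", "--tpm-state", state_dir]


def plan_first_boot(state_dir):
    """Return the command to run before first boot, or None if nothing is needed."""
    if storage_is_initialized(state_dir):
        return None
    return setup_command(state_dir)


with tempfile.TemporaryDirectory() as state_dir:
    first = plan_first_boot(state_dir)  # empty directory: setup is needed
    # Simulate state left behind by an earlier run (placeholder file name).
    open(os.path.join(state_dir, "state-placeholder"), "w").close()
    second = plan_first_boot(state_dir)  # state present: nothing to do

print(first is not None, second is None)
```

Under this model the tool is needed exactly once per VM, which is why it ends up as a hard package dependency even though it runs so rarely.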

The target of swtpm was integration into QEMU and libvirt, which themselves pull in tons of packages. So I am not sure what environment you are providing via Alpine or the like, and what else is running there.

It is true that both libvirt and QEMU have a long list of dependencies, but Python is not among them.

In the case of KubeVirt specifically, the environment is built from Fedora RPMs but stripping out everything that's not strictly required to run libvirt VMs: before libvirt 7.0.0, the spec file (incorrectly) didn't specify a dependency on swtpm-tools, and so Python was not included in the environment, but now that this mistake has been fixed the list of dependencies has grown to include the runtime for that language too. There's an ongoing effort to minimize the disk footprint for the container images used by KubeVirt, and adding Python into the mix goes against that effort.

It would be fantastic if we could find a way to avoid the dependency on Python while still retaining swtpm functionality: rewriting the Python part in C is sort of the obvious solution, but I understand it's also one that would require a lot of work. Do you think there is another way we can keep the size of the container image down?

Thanks!

stefanberger commented 3 years ago

It would be fantastic if we could find a way to avoid the dependency on Python while still retaining swtpm functionality: rewriting the Python part in C is sort of the obvious solution, but I understand it's also one that would require a lot of work. Do you think there is another way we can keep the size of the container image down?

I do not know of another way to keep the container size down. It's also not just one tool to rewrite but two. Both were only rewritten from bash scripts last year. The other one is in the samples directory.

rmohr commented 3 years ago

It would be fantastic if we could find a way to avoid the dependency on Python while still retaining swtpm functionality: rewriting the Python part in C is sort of the obvious solution, but I understand it's also one that would require a lot of work. Do you think there is another way we can keep the size of the container image down?

What I wonder (I am not so familiar with swtpm either) is what the security implications of swtpm_setup are. For background, the container where we run libvirt/qemu tries to have as few privileges as possible. Another component outside of the container performs security-sensitive operations on the container with qemu. @stefanberger does swtpm_setup do critical stuff which only root or a dedicated privileged account should have access to?

I am just asking to clarify whether a rewrite would make sense for KubeVirt. If this is security sensitive, we must not execute swtpm_setup in the qemu container, and then we may as well just exclude the package, in which case a rewrite may not solve our use case.

stefanberger commented 3 years ago

swtpm_setup doesn't necessarily need root privileges.

stefanberger commented 3 years ago

There's now some work going on over here regarding this. It will take time and help would be appreciated. I intend to translate the code from Python to C as close to 1:1 as possible. I am force-updating as I extend the code.

stefanberger commented 3 years ago

There's no more Python runtime dependency after the recent merges.

rmohr commented 3 years ago

@stefanberger thanks a lot. That was fast.

:+1:

andreabolognani commented 3 years ago

Indeed! Absolutely impressive.

Thanks a lot from my side too, for helping the KubeVirt project in a very measurable way :)

stefanberger commented 3 years ago

Great.