Open cgwalters opened 1 year ago
100%, that is a better design choice overall, but won't that be some time away for most products? IMO it is possible that some systems might want to choose a more conservative approach when it comes to security and completely shut down no matter what other processes might be running: I mean, the system is compromised already... And then, before the kernel reboot/shutdown, something could be logged (maybe attach some form of kernel notifier) so that persistent storage (RPMB?) can be updated to flag the situation during reboot. I am thinking that perhaps a router would fit in that sort of product, but I don't know for sure.
I think you will have a hard time selling it upstream.
Yes, I fully agree as well, but I think it is the sort of patch worth carrying out-of-tree.
Well, that is up to whoever wants to carry it. I'm not very interested in that kind of thing though.
Hey, I'm wondering what's the current state of file verification? It's a bit hard to process all relevant threads as a project outsider.
In particular I'm trying to figure out whether IMA is working or not (#3240), or is it supposed to be replaced with composefs?
Most engineering effort is on composefs/erofs/fs-verity right now
One note: if using an old systemd (i.e., 250 (250.5+)) with systemd-boot and ostree+composefs, you might need this systemd patch to find the boot/EFI partition:
Subject: [PATCH] gpt: composefs: block device on sysroot
rootfs corresponds to the composefs overlay: use sysroot instead.
Signed-off-by: Jorge Ramirez-Ortiz <jorge@foundries.io>
---
src/gpt-auto-generator/gpt-auto-generator.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gpt-auto-generator/gpt-auto-generator.c b/src/gpt-auto-generator/gpt-auto-generator.c
index 64ca9bb2f9..a7c564ca5c 100644
--- a/src/gpt-auto-generator/gpt-auto-generator.c
+++ b/src/gpt-auto-generator/gpt-auto-generator.c
@@ -774,7 +774,7 @@ static int add_mounts(void) {
* here. */
r = readlink_malloc("/run/systemd/volatile-root", &p);
if (r == -ENOENT) { /* volatile-root not found */
- r = get_block_device_harder("/", &devno);
+ r = get_block_device_harder("/sysroot", &devno);
if (r == -EUCLEAN)
return btrfs_log_dev_root(LOG_ERR, r, "root file system");
if (r < 0)
--
2.34.1
otherwise /boot will not be mounted.
@ldts Hmm, yes this relates to https://github.com/containers/composefs/issues/280 as well as https://github.com/ostreedev/ostree/issues/3193
The gpt generator in systemd 250 with systemd-boot looks for the block device (expecting it to contain the boot partition to mount besides the rootfs partition) using "/", which with composefs is actually the overlay. So switching it to /sysroot seems a better choice when using ostree. I hit this the other day, which is why I thought it would be worth sharing here.
Without it, the system would still boot, but ostree admin status would fail to find /boot info.
Well, that is up to whoever wants to carry it. I'm not very interested in that kind of thing though.
what about adding debugfs/sysfs counters on those errors?
I'll propose something upstream (unless someone beats me to it)
@cgwalters are there any performance tests being run comparing ostree vs ostree+composefs+fsverity?
I am simply running fio (https://fio.readthedocs.io/en/latest/fio_doc.html):
fio --name=test --readonly --filename=/usr/bin/openssl --size=1M
on:
root@intel-corei7-64:/var/rootdirs/home/fio# findmnt /
TARGET SOURCE FSTYPE OPTIONS
/ /dev/disk/by-label/otaroot[/ostree/deploy/lmp/deploy/a79146c3c72ad3cf8a652a711c0d705f68a49e81eb2b0db52c079d5ce3577751.0] ext4 rw,relatime
resulting in:
Run status group 0 (all jobs):
READ: bw=229MiB/s (240MB/s), 229MiB/s-229MiB/s (240MB/s-240MB/s), io=936KiB (958kB), run=4-4msec
and on
root@intel-corei7-64:/var/rootdirs/home/fio# findmnt /
TARGET SOURCE FSTYPE OPTIONS
/ none overlay ro,relatime,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on,verity=require
resulting in
Run status group 0 (all jobs):
READ: bw=41.5MiB/s (43.6MB/s), 41.5MiB/s-41.5MiB/s (43.6MB/s-43.6MB/s), io=936KiB (958kB), run=22-22msec
I still need to check what this test is doing but I was wondering if there are any performance tests that I could use before we start deploying to embedded devices.
[Apologies for the removal of previous threads; I am just learning about the tool, so I was commenting in case anyone could steer/chip in.]
With something like this (10 minutes random read buffered workload) I persistently measure ~4% performance degradation on CFS reads. Does this seem correct?
#!/bin/bash
ext4="/sysroot/ostree/deploy/lmp/deploy/34da1439c2e5a9f6d57b5b48b827763f9d3d48b9f6cecd093669f61aa99b803c.0/usr/bin/"
cfsf="/usr/bin"
echo "CFS: "
cfsf_val=$(sudo fio --opendir="$cfsf" --direct=0 --rw=randread --bs=4k --ioengine=sync --iodepth=256 --runtime=600 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly | grep READ)
echo " ==> $cfsf_val"
echo ""
echo "EXT: "
ext4_val=$(sudo fio --opendir="$ext4" --direct=0 --rw=randread --bs=4k --ioengine=sync --iodepth=256 --runtime=600 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly | grep READ)
echo " ==> $ext4_val"
CFS:
==> READ: bw=359MiB/s (376MB/s), 359MiB/s-359MiB/s (376MB/s-376MB/s), io=210GiB (226GB), run=600001-600001msec
EXT:
==> READ: bw=371MiB/s (389MB/s), 371MiB/s-371MiB/s (389MB/s-389MB/s), io=218GiB (234GB), run=600001-600001msec
@ldts I haven't measured or anything but a 4% degradation seems reasonable with fs-verity on as it has to verify bytes as it reads.
What would be even more interesting would be, ext4 vs composefs with fs-verity on vs composefs with fs-verity off.
I expect some Desktop users would prefer fs-verity off for example so they can make some local atomic changes and maybe care less about local signatures (I dunno up for debate) and maybe don't care about fs-verity.
But in IoT or Automotive or somewhere like that it would make more sense to have fs-verity on.
With fs-verity off, I would expect composefs to be faster than ext4 as it is erofs backed, so it would be interesting to see that.
Maybe these things belong here also:
@ericcurtin ok, I'll measure fs-verity off as well. Makes sense, thanks for the info!
With fs-verity off, I would expect composefs to be faster than ext4 as it is erofs backed, so it would be interesting to see that.
Note that EROFS can have some impacts on metadata access only since ostree keeps data in the underlay filesystem. If your fio workload mainly measures full-data rand/seq read access it will have minor impacts tho.
Sorry, I misspoke earlier about the 4% loss (my bad, I wasn't measuring the right filesystem since fs-verity was enabled everywhere).
I am still doing some benchmarking (just qemu x86_64 based), but this is what I see with randomized reads using a buffered synchronous API for the tests on a full system install:
kernel:
Linux intel-corei7-64 6.6.25-lmp-standard #1 SMP PREEMPT_DYNAMIC Thu Apr 4 18:23:07 UTC 2024 x86_64 GNU/Linux
The command uses fio 3.30:
fio
--opendir=/usr/bin # directory to use for reads
--direct=0/1 # buffered/direct file access (0/1): test both
--rw=randread
--bs=4k
--ioengine=sync/libaio # sync/asynch api : test both
--iodepth=256 # number of i/o units in flight against the file
--runtime=3600 # run for an hour
--numjobs=4 # four threads
--time_based
--group_reporting # report the group instead of individual threads
--name=iops-test-job
--readonly # do not write
--aux-path=~/ # use the home directory for temp files
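For repeatable runs, the same options can also be kept in a fio job file instead of a long command line. This is a sketch I assembled from the flags above (the file name iops.fio is just an example; note --direct and --ioengine each take one value per run):

```ini
# iops.fio - job file equivalent of the command line above
[iops-test-job]
opendir=/usr/bin
# buffered file access; set direct=1 for the direct-I/O run
direct=0
rw=randread
bs=4k
# synchronous API; use ioengine=libaio for the async run
ioengine=sync
iodepth=256
# run for an hour
runtime=3600
numjobs=4
time_based
group_reporting
readonly
```

Run it with `fio iops.fio`.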
Test:
iops-test-job: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=sync, iodepth=256
1) composefs + signed image + full integrity on rootfs: read bandwidth ~300MB/sec
TARGET SOURCE FSTYPE OPTIONS
/ overlay overlay ro,relatime,lowerdir=/run/ostree/.private/cfsroot-lower::/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on,verity=require
2) composefs (no fs-verity): ~720MB/s
TARGET SOURCE FSTYPE OPTIONS
/ overlay overlay ro,relatime,lowerdir=/run/ostree/.private/cfsroot-lower::/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
3) no composefs: ~670MB/sec
TARGET SOURCE FSTYPE OPTIONS
/ /dev/disk/by-label/otaroot[/ostree/deploy/lmp/deploy/d8d86dea56476c576f743097ce3a9dbee1927116893596725b39ac4bf5b17bdb.0] ext4 rw,relatime
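To put figures like these side by side, the relative difference between two bandwidth numbers can be computed with a small helper (a sketch; `rel_diff` is just a name I'm using here, and the sample values are the approximate MB/s figures above):

```shell
#!/bin/sh
# rel_diff A B: percentage difference of A relative to baseline B
rel_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (a - b) / b * 100 }'
}

rel_diff 720 670   # composefs (no fs-verity) vs plain ext4
rel_diff 300 670   # composefs + full fs-verity vs plain ext4
```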
I will stop here and comment further once all the benchmarking is done (I felt it was best if I corrected my earlier comment)
Um, while testing I noticed that I can't enable image signatures without also enabling fs-verity on all the files in the rootfs. Is this expected?
@ldts makes sense to me, fs-verity is what checks the signatures. Doesn't seem very useful to have signatures without fs-verity.
Aren't we talking about two different things? The fs-verity kernel layer just checks file measurements if enabled (and supported) in the filesystem. However, verifying an async ed25519 signature is a different, unrelated thing. I am wondering why they are linked together.
In my use case (userspace signature validation), the public key is not loaded in the kernel keyring but just used by ostree-prepare-root to validate the composefs image signature. Other than that, I don't see why fs-verity needs to depend on it.
To me it seems like a bug against userspace signature validation: an assumption that the kernel keyring must contain the key.
Said differently: why is a composefs image signature without fs-verity rootfs integrity not supported? Is it intentional or an implementation issue?
The current release only supports composefs image signature with fs-verity rootfs integrity&authentication
I see what you mean now, yeah I guess both could be supported individually.
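For what it's worth, at the tool level the signature and the per-file Merkle tree are indeed separate steps. A hedged sketch using fsverity-utils (the file names and the DRYRUN wrapper are placeholders of mine, not part of ostree's interface; by default the script only prints the commands):

```shell
#!/bin/sh
# DRYRUN=1 (default) prints each command instead of executing it;
# set DRYRUN=0 on a verity-enabled filesystem to actually run them.
run() { if [ "${DRYRUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi; }

IMG=${IMG:-image.cfs}

# Build time: create a detached signature over the file's fs-verity digest.
run fsverity sign "$IMG" "$IMG.sig" --key=cfs.key --cert=cfs.crt

# Enabling fs-verity (building the Merkle tree) is a separate step; the
# signature is optional, and the kernel verifies it against the .fs-verity
# keyring when fs.verity.require_signatures is set.
run fsverity enable "$IMG" --signature="$IMG.sig"

# Prints the sha256 fs-verity digest of the file.
run fsverity measure "$IMG"
```

Whether ostree should accept the signature-only half of this split is exactly the open question here.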
Also, full filesystem integrity/authentication is amazing, no doubt about it. But perhaps many embedded devices won't be able to afford it; the performance drop in read-bandwidth testing can be too noticeable. So (as an extension/feature?) maybe fs-verity could be allowed on some of the deploy folders instead of being required on all of them? Which is what I was trying to test when I noticed it wouldn't work.
[I haven't tested this patch yet; maybe it makes a good enough difference] https://patches.linaro.org/project/linux-crypto/patch/20240507002343.239552-7-ebiggers@kernel.org/
Incidentally, on imx8mp we are seeing a 13% improvement in read-bandwidth performance tests by using ostree with CFS (without fs-verity) over ext4. So really neat.
I feel I am polluting this thread - maybe I should open a performance evaluation issue?
composefs/ostree (and beyond)
Background
A key design goal of ostree at its creation was to not require any new functionality in the Linux kernel. The baseline mechanisms of hard links and read-only bind mounts suffice to manage views of read-only filesystem trees.
However, for Docker and then podman, overlayfs was created to more efficiently support copy-on-write semantics. Also crucially, overlayfs is a layered filesystem; it can work with any underlying (modern) Linux filesystem as a backend. More recently, composefs was created, which builds on overlayfs with more integrity features. This tracking issue is for the integration of composefs and ostree.
System integrity
ostree does not provide significant support for truly immutable system state; a simple mount -o remount,rw /usr will allow direct persistent modification of the underlying files. There is ostree fsck, but this is inefficient and manual, and further still today does not cover the checked-out deployment roots (so e.g. newly added binaries in the deployment root aren't found).
Accidental damage protection
It is important to ostree to support "user owns machine" scenarios, where the user is root on their own computer and must have the ability to make persistent changes.
But it's still useful to have stronger protection against accidental damage. Due to the way composefs works using fs-verity, a simple mount -o remount,rw can no longer silently modify files. First, the mounted composefs is always read-only; there is no write support in composefs. Access to the distinct underlying persistent root filesystem can be more strongly separated and isolated.
Support for "sealed" systems
It's however also desirable to support a scenario where an organization wants to produce computing devices that are "sealed" to run only code produced (or signed) by that organization. These devices should not support persistent unsigned code.
ostree does not have strong support for this model today, and composefs should fix it.
Phase 0: Basic integration (experimental)
In this phase, we will land an outstanding pull request which adds basic integration that enables booting a system using composefs as a root filesystem. In this phase, a composefs image is dynamically created on the client using the ostree metadata.
This has already led us to multiple systems integration issues. So far, all tractable.
A good milestone to mark completion of this phase is landing a CI configuration to ostree which builds and deploys a system using composefs, and verifies it can be upgraded.
In this phase, there is no direct claimed support for "sealed" systems (i.e. files are not necessarily signed).
Phase 1: Basic rootfs sealing (experimental)
In this phase, support for signatures covering the composefs is added. A key question to determine is when the composefs file format is stable. Because the PR up until this point defaults to "re-synthesizing" the composefs on the client, the client must reproduce exactly what was generated server side and signed.
Phase 2: Secure Boot chaining (experimental)
This phase will document how to create a complete system using Secure Boot which chains to a root filesystem signature using composefs.
This may also depend on https://github.com/ostreedev/ostree/issues/2753 and https://github.com/ostreedev/ostree/issues/1951
Here is a sketch for how we can support trusted boot using composefs and fs-verity signatures.
During build:
- Generate a keypair; the public key is installed into the tree (e.g. /etc/pki/fsverity/cfs.pub).
- Pass --install /etc/pki/fsverity/cfs.pub to dracut, which will copy the public key into the initrd.
- Use an ostree=latest kernel argument, because at this point we don't know the final deployment id. See also discussion in https://github.com/ostreedev/ostree/pull/2844
- Compute the composefs digest (ostree commit) and generate a composefs image file based on the rootdir digest. We sign this file with the private key and store the signature as extra metadata in the commit object.
During install:
- Write the BLS config with the final ostree=... arg.
During boot:
- Resolve the deployment directory (e.g. /ostree/deploy/fedora-coreos/deploy/443ae0cd86a7dd4c6f5486a2283471b3c8f76fc5dcc4766cf935faa24a9e3d34.0). (Note at this point that we can't trust either the BLS file or the deploy dir.)
- Mount the composefs image with the LCFS_MOUNT_FLAGS_REQUIRE_SIGNATURE flag. This ensures that the file to be mounted has a signature, and thus can only be read if the matching public key is loaded in the keyring.
Beyond
At this point, we should have gained significant experience with the system. We will determine when to mark this as officially stabilized after this.
Phase 3: "Native composefs"
Instead of "ostree using composefs", this proposes to flip things around, such that more code lives underneath the "composefs" project. A simple strawman proposal here is that we have the equivalent of ostree-prepare-root.service actually be composefs-prepare-root.service and live in github.com/containers/composefs.
Related issues:
Phase 4: Unified container and host systems
This phase builds on the native composefs for hosts and ensures that containers (e.g. podman) share backing storage with the host system and as much code as possible.