Open achilleas-k opened 3 months ago
Aside but an important one for me: It'd be really, really, really, really nice if we can share code with Anaconda too. All of these things need to be supported there too.
For toplevel mountpoints, the expectation is that the directory is part of the container image; we should simply fail if a target mountpoint does not exist in the container image.
I'm told we have users that would like to create mountpoints on /. Can we do this? If we flip the immutable flag, create a directory for the mountpoint under /, and flip it back on, will that be enough or will there be unwanted side effects?
The immutable flag isn't used with composefs, there's currently no "hack" to mutate the rootfs at runtime (without making it globally writable). See also https://gitlab.com/fedora/bootc/tracker/-/issues/26 which is tracking some support for that. But I don't think we want that for disk image defaults, again I think it's basically that the directory should be owned in the container.
Do we want to support a scenario where a base image contains data destined for a custom mountpoint?
It has never been supported to "split" ostree/bootc content across multiple filesystems; I think if you try you'll get EXDEV when ostree tries to hardlink. The only thing that will work is to have /var or a subdirectory thereof, and especially doing all of /var today is a bit tricky to handle unfortunately. I think what we could probably do is add bootc install to-filesystem --var=<mount spec> or so.
Tools like dpkg/rpm/etc support splitting their content in this way, but they only have one copy of content. As ostree/bootc wants to support multiple versions (and are based around a shared/deduplicating backing store) it's a lot harder. https://github.com/containers/composefs/issues/125 touched on some of this, but basically not going to happen anytime soon.
So what we should do (to combine these two things) is:
- Require the mountpoint exist in the container
- Error if it's not empty
I think this simplifies the initial implementation enough to make it work quickly. It does sort of tie the base image to the configuration (mountpoint in base image + bib build config), but that's probably fine.
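The two rules above could be checked against an unpacked copy of the container image roughly as follows. This is only a sketch: the function name check_mountpoint and the idea of pointing it at an unpacked tree are assumptions for illustration, not what bib or osbuild actually do.

```python
import os
import tempfile


def check_mountpoint(image_root: str, mountpoint: str) -> None:
    """Raise ValueError unless `mountpoint` exists in the tree and is empty.

    `image_root` is assumed to be a directory holding the unpacked
    container image (hypothetical input for this sketch).
    """
    target = os.path.join(image_root, mountpoint.lstrip("/"))
    # Rule 1: the mountpoint must already be a directory in the image.
    if not os.path.isdir(target):
        raise ValueError(f"mountpoint {mountpoint} does not exist in the container image")
    # Rule 2: the directory must be empty, otherwise its content would be shadowed.
    if os.listdir(target):
        raise ValueError(f"mountpoint {mountpoint} is not empty in the container image")


# Demo against a throwaway tree standing in for the unpacked image.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    os.makedirs(os.path.join(root, "opt/myapplication/log"))
    open(os.path.join(root, "opt/myapplication/log/build-time"), "w").close()

    results = {}
    for mp in ("/data", "/opt/myapplication/log", "/srv"):
        try:
            check_mountpoint(root, mp)
            results[mp] = "ok"
        except ValueError as err:
            results[mp] = str(err)
```

Here /data passes, /opt/myapplication/log fails the emptiness rule, and /srv fails the existence rule.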
Does it make sense though to relax the first rule for some paths? IIUC, there's no harm in creating a /var/data directory after running bootc install and mounting a separate filesystem there without /var/data existing in the base image, is there?
It'd be really, really, really, really nice if we can share code with Anaconda too.
I'm sorry if I'm failing to see something obvious but this keeps coming up and I'm still not clear what code we could share. Anaconda operates in a very different environment than osbuild. Also, osbuild stages are (usually) thin wrappers around system utilities. We're currently talking about adding 2-3 stages that essentially do:
mkdir <mountpoint> # or maybe we won't do this
sfdisk <device> <long sequence of partitioning commands>
and then write a line in fstab for mounting the filesystem to the mountpoint.
What is there to share? Are we talking about importing python modules shared with anaconda for shelling out to binaries in a consistent way?
The fstab stage is a 50-line Python script.
The disk partitioning is very different in the two cases. While the osbuild sfdisk stage is quite large, most of that is transforming the partition table description to an sfdisk script to run against the disk, because we need a precomputed description of the partition table before we start building.
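To make the "precomputed description to sfdisk script" step concrete, here is a minimal sketch. It is not the osbuild stage itself: the Partition class, its field names, and the example layout are all made up for illustration. It only relies on the sfdisk script format (a "label:" header followed by one comma-separated line per partition).

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str   # GPT partition name
    size: str   # size with a unit suffix understood by sfdisk, e.g. "512MiB"
    ptype: str  # sfdisk type shortcut ("L" = Linux, "U" = EFI System) or a type GUID


def sfdisk_script(partitions: list[Partition]) -> str:
    """Render a partition table description as an sfdisk script."""
    lines = ["label: gpt", ""]
    for part in partitions:
        lines.append(f'size={part.size}, type={part.ptype}, name="{part.name}"')
    return "\n".join(lines) + "\n"


# Hypothetical layout with one extra partition for a custom mountpoint.
layout = [
    Partition("efi", "512MiB", "U"),
    Partition("root", "8GiB", "L"),
    Partition("data", "2GiB", "L"),
]
script = sfdisk_script(layout)
print(script)
```

The resulting text is what would be fed to sfdisk on stdin for the target device; start offsets are left out so sfdisk picks them.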
It does sort of tie the base image to the configuration (mountpoint in base image + bib build config), but that's probably fine.
Yeah; this is one reason why I was arguing to support embedding partitioning information in the image itself (it's also what systemd-repart is aiming for, though we have use cases beyond what that tool does).
Does it make sense though to relax the first rule for some paths? IIUC, there's no harm in creating a /var/data directory and mounting a separate filesystem there without /var/data existing in the base image, is there?
Yep, subdirectories of /var are totally fine. Though note that the default for .mount units is to create the directory - so all that osbuild (or anaconda) need to do here is set up the desired filesystem. (Also of note actually there's also e.g. x-systemd.makefs, so for these types of use cases, it can even suffice to just reserve the block device space at disk image generation time; IMO this can even be a best practice because doing things that way better supports a "factory reset" that blows away these external filesystems too)
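As an illustration of the x-systemd.makefs idea, an fstab entry for a reserved but unformatted partition could be generated like this. It is a sketch under assumptions: the helper fstab_line, the PARTLABEL=data source, and the ext4 choice are hypothetical. The source is a partition label rather than a filesystem label because the filesystem does not exist yet at image build time.

```python
def fstab_line(source, mountpoint, fstype, options=("defaults",), makefs=False):
    """Format one fstab entry; optionally ask systemd to create the filesystem."""
    opts = list(options)
    if makefs:
        # systemd creates the filesystem on first boot if the device has none.
        opts.append("x-systemd.makefs")
    return f"{source} {mountpoint} {fstype} {','.join(opts)} 0 0"


# Hypothetical partition named "data", mounted under /var.
line = fstab_line("PARTLABEL=data", "/var/data", "ext4", makefs=True)
print(line)
```

systemd's fstab generator turns such an entry into a .mount unit at boot, so the image build only needs to reserve the partition.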
It does sort of tie the base image to the configuration (mountpoint in base image + bib build config), but that's probably fine.
Yeah; this is one reason why I was arguing to support embedding partitioning information in the image itself (it's also what systemd-repart is aiming for, though we have use cases beyond what that tool does).
We should get back on this. The idea was good, but I think we kept getting lost in some details. Partitioning descriptions and configurations aren't simple (they can be, but people quickly want to do more when you give them a little), so I don't think we should start adding config keys to a file arbitrarily without thinking about what it might look like when it grows.
Does it make sense though to relax the first rule for some paths? IIUC, there's no harm in creating a /var/data directory and mounting a separate filesystem there without /var/data existing in the base image, is there?
Yep, subdirectories of /var are totally fine. Though note that the default for .mount units is to create the directory - so all that osbuild (or anaconda) need to do here is set up the desired filesystem. (Also of note actually there's also e.g. x-systemd.makefs, so for these types of use cases, it can even suffice to just reserve the block device space at disk image generation time; IMO this can even be a best practice because doing things that way better supports a "factory reset" that blows away these external filesystems too)
Are we thinking about standardising on something like this? This seems similar to the user creation issues, where we would love for useradd (or some wrapper) to do all the bits that are required for the bootc world.
Are we considering doing something that:
.mount unit for creating the mountpoint and mounting the filesystem at boot, possibly using some predictable attribute (label??).
Opening this issue to track support for custom mountpoints.
@mvo5 described the issue with custom mountpoints in 06e1b2a67abea54425ea4d36cb64a5d2d988af1e.
Short version:
bootc needs an empty root tree to install to when running bootc install to-filesystem. With our current pipelines, when we build an image, we format the disk with all the partitions, mount every mountpoint to its location under a root tree, and call bootc install to put its files in the fully mounted root tree, which will be non-empty if it contains directories for custom mountpoints.
I tested the idea in the commit message and it works as expected, with some caveats:
/ (the deployed root, not the physical disk root), since after bootc install, it is marked immutable.
Important note: Some of my tests were "simulated", meaning I scripted or manually intervened to do what osbuild would be doing without actually using a stage, but the behaviour should be the same.
For example, I'm considering the following scenario:
With our proposed solution, bootc will create a filesystem that contains /opt/myapplication/log/build-time, but on boot the path /opt/myapplication/log will be shadowed by the new mountpoint. If there's no way to support this scenario, we should probably inspect the image (which we already do when preparing the manifest) and error out.
Questions (cc @cgwalters):
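- I'm told we have users that would like to create mountpoints on /. Can we do this? If we flip the immutable flag, create a directory for the mountpoint under /, and flip it back on, will that be enough or will there be unwanted side effects?
- Do we want to support a scenario where a base image contains data destined for a custom mountpoint, e.g. /data, and the disk image is meant to have a separate partition for it? Is this even possible?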