mcattamoredhat opened this issue 10 months ago
cc: @say-paul @runcom @7flying
@mcattamoredhat what upgrade was done on the new deployment (ostree: 0)?
Seems the custom LVM mounts /foo and /foo/bar failed to mount.
Jan 09 07:54:40 localhost systemd[1]: Reached target Preparation for Local File Systems.
Jan 09 07:54:40 localhost systemd[1]: foo.mount: Failed to check directory /foo: No such file or directory
Jan 09 07:54:40 localhost systemd[1]: Mounting /foo...
Jan 09 07:54:40 localhost systemd[1]: var.mount: Directory /var to mount over is not empty, mounting anyway.
Jan 09 07:54:40 localhost systemd[1]: Mounting /var...
Jan 09 07:54:40 localhost mount[734]: mount: /foo: mount point does not exist.
Jan 09 07:54:40 localhost systemd[1]: Starting Rule-based Manager for Device Events and Files...
Jan 09 07:54:40 localhost systemd[1]: foo.mount: Mount process exited, code=exited, status=32/n/a
Jan 09 07:54:40 localhost systemd[1]: foo.mount: Failed with result 'exit-code'.
Jan 09 07:54:40 localhost systemd[1]: Failed to mount /foo.
Jan 09 07:54:40 localhost systemd[1]: Dependency failed for /foo/bar.
Jan 09 07:54:40 localhost systemd[1]: Dependency failed for Local File Systems.
Jan 09 07:54:40 localhost systemd[1]: Dependency failed for Mark the need to relabel after reboot.
Jan 09 07:54:40 localhost systemd[1]: selinux-autorelabel-mark.service: Job selinux-autorelabel-mark.service/start failed with result 'dependency'.
Jan 09 07:54:40 localhost systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Jan 09 07:54:40 localhost systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Jan 09 07:54:40 localhost systemd[1]: foo-bar.mount: Job foo-bar.mount/start failed with result 'dependency'.
The workaround mentioned in 337 works. Adding the following customization to the upgrade blueprint:
[[customizations.files]]
path = "/etc/systemd/system/remount-lvm.service"
data = "[Unit]\nDescription=remount lvm\nDefaultDependencies=no\n[Service]\nType=oneshot\nRemainAfterExit=yes\nExecStartPre=chattr -i /\nExecStart=mkdir -p /foo/bar\nExecStopPost=chattr +i /\n[Install]\nWantedBy=remote-fs.target\n"
[customizations.services]
enabled = ["remount-lvm.service"]
It needs to be embedded inside osbuild-composer to ensure the LVs are mounted correctly.
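For readability, the data string above decodes to this unit file:
[Unit]
Description=remount lvm
DefaultDependencies=no
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=chattr -i /
ExecStart=mkdir -p /foo/bar
ExecStopPost=chattr +i /
[Install]
WantedBy=remote-fs.target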
I wonder if we should add it transparently ourselves when filesystem customizations are needed so that the user doesn't need to remember that the service is needed.
I bet what's going on here is the osbuild pipeline is only making these directories on top of the deployed disk image (i.e. the equivalent of anaconda %post) - and actually, because we're not using https://github.com/ostreedev/ostree/pull/3094, I bet we're losing the immutable bit on the deployment root /, which would have otherwise stopped this incorrect behavior.
The osbuild pipelines need to change to create this directory as part of the ostree commit instead.
Okay, we have enabled filesystem customization for deployments (raw image, ISO) only. Image Builder actually consumes the base commit (which does not have any data about the LVM) to build the deployments with the LVM data. So, if osbuild's edge-commit and edge-container work the way @cgwalters suggested, then it would be a matter of enabling fs customization for commits also. cc @achilleas-k The caveat to that is: it loses the benefit of having the same base image for various applications requiring different LVMs. Also, I suspect it will bring complexity in terms of deployment and upgrade. @nullr0ute @runcom @7flying
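For context, a filesystem customization in a deployment blueprint looks roughly like this (values are illustrative, and depending on the composer version the size key may be minsize or size):
[[customizations.filesystem]]
mountpoint = "/foo/bar"
minsize = 2147483648  # 2 GiB, illustrative value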
If I understand this problem correctly this is about creating the mount points in the filesystem for custom mounts (eg /data)?
@nullr0ute it fails on upgrade: the directories get deleted and need to be created explicitly to mount the volume, otherwise the system does not boot and drops into the emergency shell.
So based on the :+1: whether it's an LVM partition (or NFS or some other storage) is irrelevant here. The user has to know where they want to mount things, and the mount point should be created as part of the ostree stage and not as part of raw/iso etc., otherwise, as @cgwalters says, it's going to be a problem.
How we fix that I don't know, maybe we need to interpret the blueprint for both stages, maybe we need a "custom mount points" blueprint to ensure the directories are created (including the correct permissions).
Any other solution, such as custom systemd services, is just working around the problem and will no doubt cause other issues.
So there are two things that's biting me:
- Will the deployment (raw, iso) blueprint need the file-system customization again if that's already applied in the base commit itself?
- Does the file-system customization in the upgrade commit need to match that of the base commit?
I guess it can be tested out, but if @achilleas-k or @cgwalters you are already aware of the process, please share your thoughts.
So there are two things that's biting me:
- Will the deployment (raw, iso) blueprint need the file-system customization again if that's already applied in the base commit itself?
I wouldn't think so in this case.
- Does the file-system customization in the upgrade commit need to match that of the base commit?
That is a good question
I guess it can be tested out but if @achilleas-k or @cgwalters you are already aware of the process, please share your thoughts.
Either way, we need to test this to decide how we are going to tackle the rest.
So I tested it by bypassing the "fs customization not allowed" check for commit and container, but it didn't work out with the initial deployment nor with the upgrade commit.
There are a few more things that I need to test by tweaking the filesystem creation to see if I can make it work.
I bet what's going on here is the osbuild pipeline is only making these directories on top of the deployed disk image (i.e. the equivalent of anaconda %post) - and actually, because we're not using https://github.com/ostreedev/ostree/pull/3094, I bet we're losing the immutable bit on the deployment root /, which would have otherwise stopped this incorrect behavior.
Even if we apply them to the commit pipelines, I am a bit unsure whether it would be of much use, and a user may want different mount points from the same base commit.
How we fix that I don't know, maybe we need to interpret the blueprint for both stages, maybe we need a "custom mount points" blueprint to ensure the directories are created (including the correct permissions).
It won't be possible to keep both stages in sync, which will render the system unbootable.
Possible solution :thinking: modify the osbuild deployment pipeline to layer the fs customization as a new commit, then build the raw/ISO.
So I tested it by bypassing the "fs customization not allowed" check for commit and container, but it didn't work out with the initial deployment nor with the upgrade commit.
Overriding the customization checker to pass these through doesn't help because the customizations have no effect on ostree commits. IB doesn't do anything with that customization when it's building a commit.
Which brings me to
Then it would be a matter of enabling fs.customization for commits also. cc @achilleas-k
What would that do? Is it just a matter of creating the directories? The user would have to know that they will have to add the partitions/mountpoints to the image blueprint as well, so we'd have to document this at least. Also the same fs customizations would have to be added to any upgrade blueprints for the same reason. Not an issue, just making sure I understand all the implications.
Does running post-copy on the initial deployment solve the issue?
I haven't dug into this deeply but AIUI (from previous conversations) osbuild has an architecture where it wants to create a full final filesystem tree, then copy that to disk, and then it's at that point we'd run post-copy.
This means that we'd still have the problem that we allow uncontrolled mutation of the toplevel filesystem root for disk images.
The most robust way to fix that would be to try to better honor ostree's rules in pipeline builds. Specifically when generating disk images, in the default configuration we want to ensure that only /etc and /var are writable. Source of truth for things should be the ostree commit (future: container image).
This means that we'd still have the problem that we allow uncontrolled mutation of the toplevel filesystem root for disk images.
Do we want to disallow this? Adding partitions and mountpoints to the root of ostree-based disk images was enabled somewhat recently as a feature requirement for edge. If this is in conflict with ostree's rules, or if it's "more correct" to move that configuration to the commit/container and block filesystem customizations on disk builds, then we should do that.
My main question is: What does a filesystem customization look like when applied to a commit? Is it just about creating directories? We've talked about some form of metadata describing a partition table for containers, but afaik there's nothing like that for the base ostree case that we're using now for R4E and Fedora IoT.
Do we want to disallow this? Adding partitions and mountpoints to the root of ostree-based disk images was enabled somewhat recently as a feature requirement for edge. If this is in conflict with ostree's rules
The rule basically is "the commit should be source of truth", with local state in /etc and /var.
Now except, we just added a giant ability to relax this in https://github.com/ostreedev/ostree/pull/3114 ...but that still has the semantic that content placed there is dropped on upgrades.
or if it's "more correct" to move that configuration to the commit/container and block filesystem customizations on disk builds, then we should do that.
I wouldn't say block all filesystem customizations on disk builds. Today for example, we (should) support configuring e.g. subdirectories of /var like /var/home just in a disk image, without requiring changes to the commit/image.
A simple way to look at this: for cases like Fedora CoreOS where the commit/image is shipped from upstream, we still allow choosing the backing filesystem type and creating sub-mounts of /var.
The problem case comes more with toplevel mounts.
I should also clarify that we enabled root.transient in current centos-bootc, which again basically avoids this issue, as it allows uncontrolled mutation of / by default, so custom systemd units which mount other devices in a toplevel mount will generally Just Work.
Since this is about toplevel mounts, I'd today disallow them in disk image builds, unless root.transient is enabled. The subtlety in all this really wants us to use the same technology at build time that we do at upgrade time. (And aligning the container/runtime state was the rationale behind root.transient.)
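For reference, a sketch of what enabling that looks like, assuming the ostree prepare-root configuration format in recent releases (the path and key names are worth double-checking against the ostree documentation):
# /usr/lib/ostree/prepare-root.conf (assumed location, baked into the commit)
[root]
# allow uncontrolled writes to /; changes there do not persist across reboots/upgrades
transient = true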
Does running post-copy on the initial deployment solve the issue?
No, I talked to alexlarsson regarding this, but it does not help our cause.
So even if we add a directory in the commit, it will be lost on the upgrade if the upgrade blueprint doesn't have the same fs/directory customization.
So unless we have a fix for 337, I think we can:
1. auto-add the above service in the blueprint directly when the fs customization is added; the user can see this using composer-cli blueprints show blueprint.toml, or
2. use something like greenboot to create the directories.
I think that we have already internally discussed with the team that greenboot has a defined use case and that it mustn't deviate from that role.
agreed, greenboot's scope doesn't include things like this and shouldn't be leveraged
Thoughts regarding option 1?
File and service customizations are not allowed in deployment stages, so the effective solution will be to embed a generic service (a template unit file that creates the directories for the filesystem mountpoints) in the commit.
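A rough sketch of what such a template unit could look like (the unit name, the specifier usage, and the ordering are assumptions here and would need verification; the chattr handling mirrors the workaround above):
# create-mountpoint@.service (hypothetical name); the instance is the systemd-escaped
# mount point, e.g. create-mountpoint@foo-bar.service creates /foo/bar before foo-bar.mount
[Unit]
Description=Create mount point %f
DefaultDependencies=no
Before=%i.mount local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
# the ostree deployment root carries the immutable bit, so lift it while creating the directory
ExecStartPre=chattr -i /
ExecStart=mkdir -p %f
ExecStartPost=chattr +i /

[Install]
WantedBy=local-fs.target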
I don't consider adding a service file fragile at all, so I don't mind taking that path, but I would add it in the deployment. Also, as I said above, we wouldn't be adding it with the blueprint (the user shouldn't be required to know that the service file is needed), but rather detect that the user is requesting filesystem customizations in the deployment blueprint and add the required service file for those to work internally in a transparent way.
My main question is: What does a filesystem customization look like when applied to a commit? Is it just about creating directories?
Yes. Except...there's also the approach of moving the partition creation to firstboot, and injecting logic to do that into the commit. This is the approach taken by Ignition and systemd-repart. In general I prefer that approach because it keeps the disk image simpler - or stated another way, it more strongly enforces the decoupling of the commit (container image) and disk image.
If there's a way to enforce that the corresponding toplevel directories exist in the commit when specifying disk customizations, that seems like a decent fix. In a container flow that's basically just podman run <input> test -d /mntpoint; with ostree commits as input there's ostree ls etc. that work on a fetched commit.
but rather detect that the user is requesting filesystem customizations in the deployment blueprint and add the required service file for those to work internally in a transparent way.
The transparent way, I imagine, is to magically make it appear in the blueprint show output when the user adds the fs customization.
OR it can be added to the logs when creating the image - somewhat translucent.
...there's also the approach of moving the partition creation to firstboot,
That's the idea, to be implemented by a unit file that creates the mountpoint.
but rather detect that the user is requesting filesystem customizations in the deployment blueprint and add the required service file for those to work internally in a transparent way.
The transparent way, I imagine, is to magically make it appear in the blueprint show output when the user adds the fs customization. OR it can be added to the logs when creating the image - somewhat translucent.
No, just adding it internally, programmatically; if we show a blueprint that is different from what the user has requested, that is going to cause problems too. When we see in the blueprint that filesystem customizations are added, we also include the service file internally.
@achilleas-k @ondrejbudai just FYI if you have any inputs/recommendations given the way we build commit/disk in osbuild and given the above suggestions?
So the approach we decided on is:
When the user adds an fs customization in the deployment blueprint, osbuild will add and enable a unit file, something like this, in the background:
The workaround mentioned in 337 works. Adding the following customization to the upgrade blueprint:
[[customizations.files]]
path = "/etc/systemd/system/remount-lvm.service"
data = "[Unit]\nDescription=remount lvm\nDefaultDependencies=no\n[Service]\nType=oneshot\nRemainAfterExit=yes\nExecStartPre=chattr -i /\nExecStart=mkdir -p /foo/bar\nExecStopPost=chattr +i /\n[Install]\nWantedBy=remote-fs.target\n"
[customizations.services]
enabled = ["remount-lvm.service"]
It needs to be embedded inside osbuild-composer to ensure the LVs are mounted correctly.
The question is how we enable it.
I would say doing it "properly" (with an install section) is preferable. I agree, it means it can be disabled and break the system, but it also means it can be disabled the right way if necessary (like if the user decides they don't want/need the partition anymore). We would have to document this of course but I'd rather it behaved like any other unit.
On the topic of creating the unit itself: I would be in favour of making an osbuild stage for this. A very specific stage that creates this service file for a set of mountpoints. But we can do it with file customizations in this repository first, for development, testing, and to get it out quickly, and move it down to osbuild later.
there's also the approach of moving the partition creation to firstboot, and injecting logic to do that into the commit. This is the approach taken by Ignition and systemd-repart. In general I prefer that approach because it keeps the disk image simpler - or stated another way, it more strongly enforces the decoupling of the commit (container image) and disk image.
I think we should do this. On the composer level it would mean enabling filesystem customizations for ostree commit types. But we would have to consider what to do with the same customizations for deployment types (disks). Do we keep them? Do we handle conflicts? Or should we deprecate them and inform users that they should be putting their partitioning info in the commit?
Or should we deprecate them and inform users that they should be putting their partitioning info in the commit?
This, to me, is really counterintuitive :/ I understand it could be the way to go, but I kind of expect partitioning decisions to be made in deployment, right? Anyway, I don't dislike it completely; if this is something the osbuild team is in favor of doing we can adapt, I think. cc @mrguitar
I understand it could be the way to go, but I kind of expect partitioning decisions to be made in deployment right?
Don't get me wrong, I'm more on this side too. It makes more sense to me as well that given a base ostree commit or container, you can deploy it in any number of scenarios with different partitioning layouts.
I'm trying to reconcile the user experience with the technology decisions we're making here (so I too would love to hear what Ben has to say). I admit I'm not completely aware of what expectations we've created for users and how we can change that (or even if we should). But I do think we should be thinking ahead about what it means when we make these changes.
So, more concretely: if the decision here is "partitioning decisions should be made at ostree base image creation time", what does that mean for the existing image building process of building the ostree commit (step 1), building a disk image (raw/ISO) from that commit (step 2), and building upgrade commits on top of it (step 3)?
Questions:
- Is partitioning at step 2 still supported?
- Will it be something like "Choose mountpoints in step 1, but create partition sizes and filesystems in step 2"?
- How do we reconcile those two steps to make sure builds don't fail when the two configurations are incompatible?
- Should the mountpoints be preserved in the step 3 build configuration? Do we need to figure out a mechanism that preserves mountpoints in step 3 given the original commit from step 1 as a "parent" (again, like we do with uids and gids)?
I could probably come up with more qs but that's a good starter for now I think :laughing:
The thing is: We can't change user partitions across upgrades by default no matter what. So remember, even if partitioning is specified in a disk image today, in-place upgrades won't get any changes made there.
This architecture is pretty clear with e.g. Anaconda and kickstart - Anaconda is only used once, and not thereafter. Ignition is also designed to run just once partly for this reason, but Ignition also does support "initialize idempotently" - i.e. if the partition already exists in the expected format, it is reused (and hence data is preserved).
The x-systemd.makefs option has existed for a really long time in systemd .mount units and has a similar semantic. What only exists much more recently is systemd-repart, which pairs with that to handle partition creation dynamically. Now, systemd-repart is its own universe. In theory, both could be used today without osbuild doing anything, by just having the user inject the configuration to do so into the filesystem tree.
I don't have a strong opinion about this; if we were to take a dependency on repart it'd need some analysis. It could also be done by injecting custom systemd units to create the partitions, and mount units using x-systemd.makefs. I don't have a really strong opinion here myself.
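As a rough illustration of the repart side (a sketch only: systemd-repart manages GPT partitions rather than LVM logical volumes, so it would not cover the LVM case directly, and the keys should be checked against the repart.d documentation), a drop-in baked into the commit could look like:
# /usr/lib/repart.d/50-foo.conf (hypothetical drop-in name)
[Partition]
# create and format this partition on first boot if it does not already exist
Type=linux-generic
Label=foo
Format=xfs
SizeMinBytes=1G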
Is partitioning at step 2 still supported?
It will probably have to continue to be if it was before, right?
Will it be something like "Choose mountpoints in step 1, but create partition sizes and filesystems in step2"?
No, all that state would best be computed on firstboot (as is done with ignition and repart).
How do we reconcile those two steps to make sure builds don't fail when the two configurations are incompatible?
In the end, moving the logic to firstboot necessarily implies some difficulty in trying to make that a "build" time check. Honestly I would focus more on making it convenient for people to have an edit-compile-debug cycle (where "debug" means booting for real).
Should the mountpoints be preserved in the step 3 build configuration? Do we need to figure out a mechanism that preserves mountpoints in step 3 given the original commit from step 1 as a "parent" (again, like we do with uids and gids)?
You're asking what happens if a user specifies a mountpoint in a blueprint for creating a commit at one point, and then removes it later? I'd expect the mountpoint to drop out, yes.
Having the filesystem customization in the commit comes with difficulties:
you can deploy it in any number of scenarios with different partitioning layouts.
won't be possible.
I don't have a strong opinion about this; if we were to take a dependency on repart it'd need some analysis. It could also be done by injecting custom systemd units to create the partitions, and mount units using x-systemd.makefs. I don't have a really strong opinion here myself.
I would prefer repart or x-systemd.makefs over a custom systemd unit; they are probably more tried and tested. repart's implementation seems to be straightforward, but the analysis will be to see whether the other mountpoints remain intact. Modifying /etc/fstab can be done too, but I am not sure about all the params - need some more learning for me.
We have been testing locally "RHEL 9: Filesystem customizations for edge-raw-image" #255 on RHEL-9.4 with:
osbuild-104-1.20240108gitc62e555.el9.noarch
osbuild-composer-98-1.20240108git0169b7b.el9.x86_64
weldr-client-35.9-1.el9.x86_64
With blueprint:
After booting the image both in BIOS and UEFI mode, we see the corresponding logical volumes created.
Nevertheless, the system is not able to boot the deployment ostree:0 after upgrade.
On the other hand, the system deployment ostree:1 is able to boot with no problem; as a matter of fact, it is possible to see the custom LVs.
Due to the virsh console not showing any useful output, it has been difficult to track the root cause of the issue. Could you please help check this issue when you have time?