openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Requesting guide for (semi-properly) manually administering multiple boot environments #5809

Closed: evan-king closed this issue 7 years ago.

evan-king commented 7 years ago

I've followed this repo's guide for installing Ubuntu 16.04 on root ZFS, and I have also successfully cloned the root fs and booted into it. However, the details I came up with on my own for switching boot environments were not exactly coherent, reliable, or ultimately correct, though they sort of got the job done.

For example, /boot/grub/grub.cfg is part of the root image, so there isn't one boot menu that can be maintained for all boot environments, and splitting it into a separate filesystem rendered the system unbootable. I don't adequately understand the stages of booting with regard to filesystem setup. There appears to be a point at which the filesystem in play is the first one (in alphabetical order) designated for / that doesn't have mounting disabled, and a later step which relies on the details of the grub.cfg in whichever filesystem was used for the first pass. Further, I did not figure out how to mount and modify a root filesystem without being booted into it, so part of setting up a new BE is using the GRUB console to correct the filesystem reference on the first (failing) boot.
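On the question of mounting a root without booting into it: as far as I know, a root dataset with a non-legacy mountpoint (e.g. `mountpoint=/`, `canmount=noauto`, as in the guide) can be mounted at an arbitrary directory with `mount -t zfs -o zfsutil`, which is also how the ZFS initramfs mounts the root. The sketch below only builds those commands; the `rpool/ROOT/<name>` layout, the `/mnt/be/...` target, and the helper names are assumptions for illustration, not an existing tool.

```python
import subprocess


def mount_be_commands(dataset: str, target: str) -> list[list[str]]:
    # Hypothetical helper. mount.zfs normally refuses datasets whose
    # mountpoint property is not "legacy"; -o zfsutil bypasses that check,
    # so the BE root can be mounted somewhere other than /.
    return [
        ["mkdir", "-p", target],
        ["mount", "-t", "zfs", "-o", "zfsutil", dataset, target],
    ]


def umount_be_commands(target: str) -> list[list[str]]:
    # Hypothetical helper: undo the mount once edits are done.
    return [["umount", target]]


def run(cmds: list[list[str]]) -> None:
    # Needs root; check=True stops at the first failing command.
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print (rather than run) the commands for an assumed BE name.
    for cmd in mount_be_commands("rpool/ROOT/ubuntu-2", "/mnt/be/ubuntu-2"):
        print(" ".join(cmd))
```

Note that this leaves the dataset's own `mountpoint`/`canmount` properties untouched, and child datasets of the BE (if any) are not mounted.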

Even if I didn't worry about maintaining a single GRUB menu for all BEs, just having any one menu aware of all BEs would require being able to mount other roots and adding an update-grub script similar to 10_linux but considerably more complex, and 10_linux is already quite daunting. Several unknowns are involved, any one of which is troubling at my level of expertise:

The long-term solution is obviously getting our own beadm, but in the meantime, would it be feasible to provide a manageable set of appropriate steps for manually administering multiple boot environments?

Minimally, we need:

And ideally, we'd also like:

Plus maybe some bells and whistles:

I could probably provide some support for this request with documentation patches, but I lack sufficient understanding of even the high-level picture, let alone what would constitute following best practices and forward compatibility with a beadm port. Some links to resources that would let me self-educate on these specific issues, or on what exactly beadm does, would also be welcome. It's unlikely I'd have sufficient resources to come full circle on this if I have to spend a lot of up-front time on research, but it would still be helpful to me at least (and perhaps to others who might be inspired to contribute to either short- or long-term solutions).

Cheers.

madwizard commented 7 years ago

I'd be interested in helping out with this. I've been using ZFS on / for some time now, although I use a separate /boot partition to keep things simpler.

behlendorf commented 7 years ago

@rlaager may also be interested in this.

rlaager commented 7 years ago

This is definitely something I want, assuming we are talking about multiple clones of the installed image (not, for example, booting different distros). I think someone did a port of beadm to Linux, but I'm not sure of its status or how it works.

Right now (ZFS or not), the GRUB menu has entries for multiple kernels, but one root filesystem. The goal is to support multiple root filesystems, right? And we presumably need to keep supporting multiple kernels. Are the kernels independent of the root filesystems, or are only some combinations acceptable? I think it's the latter, as the root filesystem contains kernel modules, for example.
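Since the modules live in the root filesystem, one plausible test for "acceptable combinations" is that a kernel image in a BE's /boot has a matching /lib/modules/<version> in the same BE. The sketch below assumes the Debian/Ubuntu naming (`/boot/vmlinuz-<version>`) and a BE already mounted at `be_root`; the function name is hypothetical, and it sorts versions as plain strings, not with a real version comparison like 10_linux uses.

```python
from pathlib import Path


def bootable_kernels(be_root: str) -> list[str]:
    """Return kernel versions in be_root/boot that have modules in the same BE."""
    root = Path(be_root)
    versions = []
    for image in (root / "boot").glob("vmlinuz-*"):
        version = image.name[len("vmlinuz-"):]
        # A kernel loads its modules from the root fs it boots, so only
        # pair it with this BE if /lib/modules/<version> exists here.
        if (root / "lib" / "modules" / version).is_dir():
            versions.append(version)
    # Plain string sort (assumption); not a true version ordering.
    return sorted(versions)
```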

I wonder if it would be reasonable to separate /boot/grub into its own dataset. Assuming that works, /boot still lives inside the root filesystem, and thus we keep the association between kernels and root filesystems without having to do complicated tracking. Then you're basically just building entries that look like:

```
linux /ROOT/ubuntu-2@/boot/vmlinuz-4.4.0-63-generic root=ZFS=rpool/ROOT/ubuntu-2 ro
linux /ROOT/ubuntu-2@/boot/vmlinuz-4.4.0-62-generic root=ZFS=rpool/ROOT/ubuntu-2 ro
linux /ROOT/ubuntu-1@/boot/vmlinuz-4.4.0-62-generic root=ZFS=rpool/ROOT/ubuntu-1 ro
linux /ROOT/ubuntu-1@/boot/vmlinuz-4.4.0-59-generic root=ZFS=rpool/ROOT/ubuntu-1 ro
```
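Entries of this shape could be generated mechanically from (BE, kernel version) pairs, reading the kernel through the BE's own dataset path (`/ROOT/<be>@/boot/...`) and pointing `root=ZFS=` at that same dataset. A minimal sketch, assuming the pool is `rpool`, BEs live under `rpool/ROOT`, and initrds follow Ubuntu's `initrd.img-<version>` naming; the menuentry titles and function names are made up, and a real grub.cfg would also need the `insmod zfs`/`search` preamble that this omits.

```python
def grub_entry(pool: str, be: str, version: str) -> str:
    # "/ROOT/<be>@/boot" is GRUB's ZFS path form: dataset (relative to the
    # pool), "@" with no snapshot name, then a path inside the dataset.
    prefix = f"/ROOT/{be}@/boot"
    return "\n".join([
        f"menuentry '{be}: Linux {version}' {{",
        f"\tlinux {prefix}/vmlinuz-{version} root=ZFS={pool}/ROOT/{be} ro",
        f"\tinitrd {prefix}/initrd.img-{version}",
        "}",
    ])


def grub_menu(pool: str, kernels_by_be: dict[str, list[str]]) -> str:
    # One entry per (BE, kernel) pair, in the order the caller supplies.
    return "\n".join(
        grub_entry(pool, be, version)
        for be, versions in kernels_by_be.items()
        for version in versions
    )


if __name__ == "__main__":
    print(grub_menu("rpool", {
        "ubuntu-2": ["4.4.0-63-generic", "4.4.0-62-generic"],
        "ubuntu-1": ["4.4.0-62-generic", "4.4.0-59-generic"],
    }))
```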

How you build those is another question. The current situation is that 10_linux builds entries for the currently mounted root filesystem. If we left that alone, at least for the moment, one option would be to add a 10_linux_zfs script that gets a list of BEs, mounts them to temporary locations, sees what kernels they have, and builds entries for them. I think we'd want to honor /etc/default/grub from inside the BE as well.
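The first step of such a 10_linux_zfs script, getting the list of BEs, could be as simple as listing the direct children of the BE container dataset. `zfs list -H -o name -t filesystem -d 1 rpool/ROOT` prints one name per line with no header, including `rpool/ROOT` itself; the `rpool/ROOT` layout is the guide's, and the function names below are hypothetical.

```python
import subprocess


def parse_be_list(output: str, parent: str) -> list[str]:
    names = [line.strip() for line in output.splitlines() if line.strip()]
    # -d 1 already limits depth; this just drops the parent dataset itself.
    return [name for name in names if name.startswith(parent + "/")]


def list_boot_environments(parent: str = "rpool/ROOT") -> list[str]:
    out = subprocess.run(
        ["zfs", "list", "-H", "-o", "name", "-t", "filesystem", "-d", "1", parent],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_be_list(out, parent)
```

Each returned name would then be mounted at a temporary directory, inspected for kernels, and turned into menu entries, as described above.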

Seeing how complicated that code is would probably shape whether it should be merged into 10_linux.

evan-king commented 7 years ago

What @rlaager asks and suggests sounds exactly right.*

Splitting /boot/grub into a separate dataset rather than all of /boot makes a lot of sense to me, though I'm not sure what steps are needed to enable either that or splitting /boot entirely. I've now tried both unsuccessfully, though @madwizard's comment indicates the latter is possible at least.

In the former case, I was left at a GRUB rescue prompt (as opposed to the initramfs shell in other cases, where I was able to figure out what to do). The error message indicated that GRUB couldn't find files that should have been in /boot/grub, and I don't know where to go from there.

It should also be possible to hose the current BE, including /boot, and have the next boot (possibly after cycling through a failed boot) end up in some other working environment. Perhaps that will require an additional chainloader? This degree of failure recovery, at least, is of secondary concern.


* For it to be useful, we also need special handling of the default/current/next BE. The way kernel choices and fallback are handled likely covers much of it, but when cloning roots there isn't the same simple relationship where first working option = best choice. Instead, there's a need to be able to say "reboot into [named] environment (from now on)."
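GRUB already distinguishes "next boot only" from "from now on" through its environment block: `grub-reboot` stores a one-time `next_entry`, `grub-set-default` stores a persistent `saved_entry`, and both rely on `GRUB_DEFAULT=saved`. The toy model below (not GRUB code) shows how those two choices resolve when each BE is a menu entry; the entry names and the fallback to the first entry are assumptions of the model.

```python
def choose_entry(env: dict[str, str], entries: list[str]) -> str:
    # One-shot choice wins and is consumed, like GRUB clearing next_entry.
    next_entry = env.pop("next_entry", "")
    if next_entry in entries:
        return next_entry
    # Otherwise use the persistent default.
    saved = env.get("saved_entry", "")
    if saved in entries:
        return saved
    # Model assumption: fall back to the first menu entry.
    return entries[0]
```

In this model, "reboot into ubuntu-2 once" sets `next_entry`, while "from now on" sets `saved_entry`; a BE that fails its one-shot trial simply isn't selected again on the following boot.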

evan-king commented 7 years ago

I've started some independent work to replace/augment the grub-mkconfig script with a new version. It is heavily refactored - actually built from scratch because I found the original script quite objectionable in structure and utterly daunting to modify at my skill level.

As such it's unlikely that my work will be of great interest to members of the zfsonlinux project or downstream packages. However if anyone is interested they're welcome to review or follow my progress at https://github.com/evan-king/grub2-zfs-be.

Licensing is currently unspecified but will be updated shortly. I'll happily set the licensing however is needed or would be most appropriate (current intent is to just slap on GPLv3 to match the script it's replacing).

madwizard commented 7 years ago

@evan-king I mostly followed the ZFS on root howto and added encryptfs underneath. http://completelyfake.eu/2017/zfsonroot.html This is my howto. /boot is on ext4. I think the steps for a separate /boot are not distribution-specific.

evan-king commented 7 years ago

Thanks for the extra info.

My work actually depends on leaving /boot in the rootfs and only splitting out/sharing /boot/grub, as described by @rlaager (which I later succeeded in setting up by repeating some steps from the guide). As far as I can tell, this approach is incompatible with currently achievable mechanisms for encrypting the rootfs, because it loses independent per-root maintenance of kernel booting support.

Until the need for an ext4 boot partition is eliminated, I'd rather not attempt including support for encrypted rootfs. And to be honest, I see little value in encrypting the rootfs when it needn't ever contain sensitive data. If my grub-mkconfig script rewrite generates more interest/participation than I anticipate, I'll certainly reconsider.

But as alluded to above, it's a considerable deviation both from the original script and from what seems to be the status-quo approach to system scripting, with heavy use of small pseudo-pure functions, environment kept at arm's length or encapsulated in non-pure functions, avoidance of shared state, and an almost hostile attitude toward optimization or inlining logic. I'd fully understand if others found my approach too opinionated or drastic a change to collaborate productively on it - it's just what I need to be able to tackle the problem myself.

ghost commented 6 years ago

> This is definitely something I want, assuming we are talking about multiple clones of the installed image (not, for example, booting different distros)

Why not boot different distros?