zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

zfs-initramfs: Support GRUB's native root=ZFS= #140

Closed rlaager closed 8 years ago

rlaager commented 9 years ago

Upstream GRUB sets root=ZFS=rpool/path/to/rootfs by default. By supporting that syntax (while retaining the existing support for backwards compatibility), we can work out-of-the-box.
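To illustrate the syntax in question, here is a minimal sketch of how an initramfs script might pull the dataset name out of the kernel command line. The function name `zfs_root_from_cmdline` is hypothetical and is not taken from the pull request; the actual patch may parse the option differently.

```shell
#!/bin/sh
# Hypothetical helper (not from the pull request): print the dataset named by
# the first root=ZFS=... parameter in a kernel command line string.
# Prints nothing if no such parameter is present.
zfs_root_from_cmdline() {
    # Intentionally unquoted so the string is split on whitespace.
    for arg in $1; do
        case "$arg" in
            root=ZFS=*)
                # Strip the "root=ZFS=" prefix, leaving pool/path/to/rootfs.
                printf '%s\n' "${arg#root=ZFS=}"
                return 0
                ;;
        esac
    done
}

# In a real initramfs the argument would typically be "$(cat /proc/cmdline)".
zfs_root_from_cmdline "BOOT_IMAGE=/vmlinuz ro root=ZFS=rpool/ROOT/ubuntu quiet"
# prints rpool/ROOT/ubuntu
```

Only the first match is used, which is an assumption made here for simplicity.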

rlaager commented 9 years ago

One thing to keep in mind here is that this change, as written, results in the bootfs property on the pool being completely ignored. I personally think that's appropriate. I do not in any way want to choose an option from GRUB and have something else boot because that's what my bootfs property was set to. However, this does mean that any eventual beadm-type support has to integrate with GRUB. We can't just twiddle the bootfs property.

If you're against this approach, it's trivial to change this to keep honoring the bootfs property, and only use bootfs from the root=ZFS= option if bootfs is unset.
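The two policies described above can be compared side by side in a small sketch. The function `choose_rootfs` and the policy names `ignore-bootfs` and `prefer-bootfs` are hypothetical labels invented for this illustration; the sketch also assumes that an unset bootfs property is reported as `-`, as `zpool get` does.

```shell
#!/bin/sh
# Hypothetical helper (names are illustrative, not from the pull request).
#   $1 = dataset taken from root=ZFS= (may be empty)
#   $2 = value of the pool's bootfs property ("-" when unset)
#   $3 = policy: "ignore-bootfs" or "prefer-bootfs"
choose_rootfs() {
    cmdline_ds=$1
    bootfs=$2
    policy=$3

    # Treat the "-" placeholder as an unset property.
    if [ "$bootfs" = "-" ]; then
        bootfs=""
    fi

    case "$policy" in
        # The change as written: whatever GRUB passed is what boots.
        ignore-bootfs) printf '%s\n' "$cmdline_ds" ;;
        # The alternative: honor bootfs, fall back to root=ZFS= if unset.
        prefer-bootfs) printf '%s\n' "${bootfs:-$cmdline_ds}" ;;
        *) return 1 ;;
    esac
}

# In an initramfs, $2 might come from: zpool get -H -o value bootfs rpool
choose_rootfs rpool/ROOT/new rpool/ROOT/old ignore-bootfs   # prints rpool/ROOT/new
choose_rootfs rpool/ROOT/new rpool/ROOT/old prefer-bootfs   # prints rpool/ROOT/old
choose_rootfs rpool/ROOT/new -              prefer-bootfs   # prints rpool/ROOT/new
```

The second call shows the surprise the first policy avoids: the entry selected in GRUB names one dataset, but a different one boots.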

Now that someone has ported beadm to Linux, that's actually the next thing I'm hoping to work on. I played around with it earlier today, and at first glance, it seems like the best answer is for zfs-linux to ship a new /etc/grub.d/15_linux_zfs or similar. This would loop over all the datasets under $rpool/ROOT and should probably behave in one of these ways:

A) Create a single GRUB entry per dataset. This entry would boot using the /vmlinuz and /initrd.img symlinks. That is, we'd support booting the latest kernel from each alternate boot environment. This keeps the list from getting too large (N datasets * M kernels). It's also something that can be implemented without mounting the alternate boot environments.

B) Just like A, but actually mount them (read-only, to a temporary location) if they're not already mounted. This would allow us to show the actual kernel version (and the Ubuntu version, for that matter). We could also avoid using the symlinks.

Both of these approaches basically enforce that the "active" BE is defined as the one from which update-grub is run. This means we'd be strongly encouraging the strategy of updating the current system in place and only offering rollback options. In contrast, Solaris supports the idea of cloning the current system, updating the clone, and then rebooting into it. That's a really cool concept, and here's a possible approach to support that:

C) Also ship a 05_linux_zfs, which is just like 15_linux_zfs, but outputs something if and only if the active-on-boot BE is not the same as the currently booted BE. It would output the configuration for the active-on-boot BE so it's higher in the list than the currently booted BE. Then, 10_linux outputs for the currently booted BE, and 15_linux_zfs would output for everything else (skipping the active-on-boot one in this case).
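The division of labor in option C can be sketched as a small decision function. Everything here is hypothetical: `script_for_be` is an illustrative name, the dataset names are made up, and how the "active-on-boot" BE would be recorded is not specified in this thread, so it is simply passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: decide which generator script would emit a given BE
# under option C.
#   $1 = dataset being considered
#   $2 = currently booted BE
#   $3 = BE marked active for the next boot
script_for_be() {
    ds=$1
    booted=$2
    active=$3

    if [ "$ds" = "$booted" ]; then
        # The running system is always handled by the stock script.
        echo 10_linux
    elif [ "$ds" = "$active" ]; then
        # Only reached when active != booted, so 05_linux_zfs stays silent
        # whenever the two are the same BE.
        echo 05_linux_zfs
    else
        echo 15_linux_zfs
    fi
}

# Example with made-up datasets: booted from "a", with "b" active on next boot.
# A real script might obtain the list from `zfs list` on $rpool/ROOT instead.
for ds in rpool/ROOT/a rpool/ROOT/b rpool/ROOT/c; do
    printf '%s %s\n' "$(script_for_be "$ds" rpool/ROOT/a rpool/ROOT/b)" "$ds"
done
```

Because GRUB runs the generator scripts in lexical order, the 05 entry lands above the 10 entry, which is what puts the active-on-boot BE first in the menu.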

The biggest problem I see is that without support from various daemon packages, it's going to be limited in usefulness in practice.

dajhorn commented 9 years ago

@rlaager, thanks for doing this.

Things like legacy bootfs handling can be dropped because congruence with Solaris is no longer a design goal.

I'm not currently resourced to test or support things like the beadm port, so if you want to carry GRUB and related components, then perhaps we can get these things back into the next release series.

FransUrbo commented 9 years ago

Also have a look at https://github.com/zfsonlinux/pkg-zfs/tree/snapshot/debian/wheezy/0.6.3-35-4c7b7e-wheezy/scripts/zfs-initramfs for a much more advanced version (an almost complete rewrite) of the initrd scripts.

rlaager commented 9 years ago

I'm not sure that we need a single script that supports every scenario on every distro. Here's my stab at a rewrite for Ubuntu (ignore the spurious debian/zfsutils.zfs.default.orig), which applies on top of the change from this pull request: https://github.com/zfsonlinux/pkg-zfs/commit/50f30514c35d2ba2d915e0b327932b260eecce09

ryao commented 8 years ago

If I recall correctly, the root=ZFS=$DATASET syntax is what either @dajhorn or @rlaager and I agreed to adopt as something of a standard back in 2012. I would like to revisit it because the root=ZFS=$DATASET syntax does not properly handle pool namespace collisions. Ideally, I would like the boot loader to identify the pool by passing its GUID and vdev information (using by-path syntax) to the initramfs through the kernel command line, so that we can handle spa namespace collisions completely unambiguously. I have yet to make time to work out a syntax that makes sense and implement the hooks required for it. When this is done, I think the old syntax should remain around for backward compatibility.

Also, I believe that it is possible to put the required logic into the kernel by allowing us to detect the rootfs mount via current->fs == NULL, but I still need to verify this is NULL at the rootfs mount with kgdb. This would hypothetically allow us to do pool import inside the kernel using this information without an initramfs when copy-builtin is used to build a kernel with ZFS support. It would also simplify initramfs design in the normal case where one is loading modules by allowing us to omit ZFS' userland binaries from the initramfs. All we would need to do is use the standard mount command.

This is something of a tangent, but it is the direction that my thinking on the boot process has taken as it has evolved.

ryao commented 8 years ago

Just to make it clear, the 14 lines from merging this will not make the transition to a solution that handles namespace collisions any harder. It would be best to merge it now for the out-of-the-box support benefit and implement a solution to the namespace collision problem later, after someone has fleshed it out.

rlaager commented 8 years ago

Ubuntu is handling packaging themselves.