zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

Move udev initrd hooks into zfsutils to mount zdev-based pools at boot #39

Closed: pdf closed this issue 11 years ago

pdf commented 12 years ago

Ref zfsonlinux/zfs#811 for discussion

This patch attempts to solve the problem of zdev-based pools not mounting at boot, demonstrated below:

Set up a new pool using zdev.conf

# cat /etc/zfs/zdev.conf

# echo 'disk1 pci-0000:00:07.0-virtio-pci-virtio4' >> /etc/zfs/zdev.conf
# echo 'disk2 pci-0000:00:08.0-virtio-pci-virtio5' >> /etc/zfs/zdev.conf
# zpool create -f tank mirror /dev/disk/by-path/pci-0000:00:07.0-virtio-pci-virtio4 /dev/disk/by-path/pci-0000:00:08.0-virtio-pci-virtio5
# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        tank                                     ONLINE       0     0     0
          mirror-0                               ONLINE       0     0     0
            pci-0000:00:07.0-virtio-pci-virtio4  ONLINE       0     0     0
            pci-0000:00:08.0-virtio-pci-virtio5  ONLINE       0     0     0

errors: No known data errors
# zpool export tank
# rm /etc/zfs/zpool.cache
rm: cannot remove `/etc/zfs/zpool.cache': No such file or directory
# zpool import -d /dev/disk/zpool tank
# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0

errors: No known data errors
# reboot

Post reboot

# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0

errors: No known data errors
# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    93K   952M    30K  /tank
# mount |grep zfs
# zfs mount -a
# mount |grep zfs
tank on /tank type zfs (rw,xattr)
#

So, this happens because the pool is not available when mountall is doing its work, since udev has not yet populated the zdev device nodes. The only caveat to the proposed approach is that the initrd needs to be regenerated whenever zdev.conf is updated, but since the alternative is no support at all, that seems a fair trade.
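
For example, on Ubuntu with initramfs-tools, the rebuild after editing zdev.conf would be something like:

# update-initramfs -u -k all

(The -k all flag regenerates the initrd for every installed kernel; plain -u updates only the one for the running kernel.)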

dajhorn commented 12 years ago

I will need confirmation of this bug on real hardware before considering this kind of change. KVM and Xen are known to be problematic for ZFS. Some configurations, for example, do things like issuing plug events for INT13 drives after POST.

Your problem might resolve itself if you upgrade the virtualization environment. VMware, VBox, and libvirt users have all had similar problems that were fixed by VM upgrades or configuration tweaks.

dajhorn commented 12 years ago

A clarification for the lurkers: "zdev" used here means the /etc/zfs/zdev.conf file, not the /dev/zfs node or a zvol.

pdf commented 12 years ago

I started looking at this because I'm trying to use zdev.conf on real hardware. The udev stuff in mountall seems pretty racey, I couldn't seem to find an easy win there. Are you actually expecting udev to have configured the zdev nodes by the time you try to read the filesystems? Without the helpers in initrd that seems optimistic. Maybe it's doable in mountall by activating some timers, but how do you know what you're waiting for? You'd have to incorporate interrogation of the zpool and it's cache to work out what devices you're waiting on from udev, and eventually give up if you determine the pool can't be imported for whatever reason. Things start to get much more complicated. I think this is moderately low impact - we're still waiting until mountall for the actual work to be done, but we set up the prerequisite backing device links in initrd so they're there when needed at mountall.
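
To make that concrete, here's a minimal sketch of an initramfs-tools hook along those lines. The rule and helper paths (60-zpool.rules, zpool_id) are assumptions about the ZoL packaging of the time, not the exact contents of the patch:

#!/bin/sh
# Sketch: ship zdev.conf plus the udev rule and helper into the initrd so
# the /dev/disk/zpool symlinks exist before mountall runs.
PREREQ="udev"
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

. /usr/share/initramfs-tools/hook-functions

# The zdev configuration that names the backing devices.
mkdir -p "${DESTDIR}/etc/zfs"
[ -f /etc/zfs/zdev.conf ] && cp -p /etc/zfs/zdev.conf "${DESTDIR}/etc/zfs/"

# The udev rule and helper that turn zdev.conf entries into symlinks.
mkdir -p "${DESTDIR}/lib/udev/rules.d"
[ -f /lib/udev/rules.d/60-zpool.rules ] && \
        cp -p /lib/udev/rules.d/60-zpool.rules "${DESTDIR}/lib/udev/rules.d/"
[ -x /lib/udev/zpool_id ] && copy_exec /lib/udev/zpool_id /lib/udev
exit 0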

Are you saying that people actually have zdev.conf working reliably without this, just using mountall? I can't see how that can work.

The KVM scenario above was just for testing, so I don't have to bounce boxes that are in use (and take considerably longer to boot).

dajhorn commented 12 years ago

I started looking at this because I'm trying to use zdev.conf on real hardware

Describe your equipment and post a dmesg.

Are you actually expecting udev to have configured the zdev nodes by the time you try to read the filesystems?

No, but you could have a partial solution in your mountall patch. Moving parse_zfs_list() is incorrect because it breaks fstab integration and error handling, but calling udev_catchup() first could improve behavior because it duplicates some important udev behavior. (Checking into this is now on my todo list.)

Are you saying that people actually have zdev.conf working reliably without this, just using mountall?

Everybody that matters, adheres to best practices, or follows my advice is using /dev/disk/by-id. Doing anything else on Debian or Ubuntu is currently unsupported.
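
For example (the device IDs below are placeholders), creating the pool with stable by-id names:

# zpool create tank mirror /dev/disk/by-id/ata-MODELA_SERIAL1 /dev/disk/by-id/ata-MODELA_SERIAL2

or re-importing an existing pool so the cache records those names:

# zpool export tank
# zpool import -d /dev/disk/by-id tank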

Changing the initrd for a basic ZoL installation requires caution. I will try it, do some testing, and think about whether I want to support it.

pdf commented 12 years ago

Describe your equipment and post a dmesg.

As for the equipment: zdev.conf doesn't work at boot on any equipment right now, right? I have a few systems here, and I can describe them if that's going to be useful, but they all work with by-path or by-id.

No, but you could have a partial solution in your mountall patch. Moving parse_zfs_list() is incorrect because it breaks fstab integration and error handling, but calling udev_catchup() before could improve behavior because it duplicates some important udev behavior. (Checking into this is now on my todo list.)

Yeah, skipping mount_policy was not so smart; I was concerned about affecting behaviour for standard filesystems, though I suppose the worst you're likely to do is introduce a delay. Still, I was getting inconsistent results even after doing udev_catchup, parse_zfs_list, and parse_mountinfo. I'll have another look, though.

Everybody that matters, adheres to best practices, or follows my advice is using /dev/disk/by-id. Doing anything else on Debian or Ubuntu is currently unsupported.

Documentation on the ZoL site describes using zdev.conf as a best practice and omits any caveats, so that should be updated if it's not the case.

Changing the initrd for a basic ZoL installation requires caution. I will try it, do some testing, and think about whether I want to support it.

Thanks for your time.

pdf commented 12 years ago

I looked at mountall with a bit more effort, by the way. It gets pretty messy pretty quickly, and I'm even more convinced now that the initrd is the right place to set up zdev.conf. Do you have any new thoughts on this?

dajhorn commented 12 years ago

I intend to try your proposal in the Quantal beta builds. If it works, then it will be a candidate for backporting to the LTS builds.

Looking into this problem has me thinking about this thread:

https://groups.google.com/a/zfsonlinux.org/d/topic/zfs-devel/JwxlxSI1eDg/discussion

The comments about libblkid are pertinent to the basic /dev/* versus /dev/disk/*/* problem. Working in the Ubuntu init stack makes improving the ZFS import logic seem like a good idea.

dajhorn commented 12 years ago

@pdf: The first commit in this branch is now in the daily Quantal builds.

I amended the commit by backing out the rename operations and naming the new hook "zdev" instead. Moving a file between packages requires some "Breaks: + Replaces:" logic that makes the control file permanently messy. In fact, the branch as given would break during upgrades because it didn't include an updated control file.
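
For the curious, the usual pattern for moving a file between packages looks something like this in debian/control (version numbers illustrative):

Package: zfsutils
Breaks: zfs-initramfs (<< 0.6.0.71-1)
Replaces: zfs-initramfs (<< 0.6.0.71-1)

Those stanzas then have to stay around indefinitely so that upgrades from any old version keep working, which is the mess being avoided here.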

I will add the second optimizing commit before Ubuntu 12.10 is released; I need to update my test harness for it.

pdf commented 12 years ago

Sounds fair, I'll free up some time to do some testing based on this in the next few days.

pdf commented 12 years ago

Okay, non-root volumes are working fine; I'll do some testing with root on ZFS shortly, though I'm not using that on any of my boxes. Can I ask if you're considering including this on Precise once it's vetted?

dajhorn commented 12 years ago

Can I ask if you're considering including this on Precise once it's vetted?

Yes.

pdf commented 12 years ago

Great. Unfortunately, it doesn't look like the grub PPA supports Quantal yet, so I can't test the package changes with root on ZFS until that's up, unless I build the packages myself.

pdf commented 12 years ago

Got it tested using the Precise build of grub2 (which saved me building a Quantal box just to compile it); all looks good to me: ZFS on root still works, and so do zdev-based pools.