nedbass opened this issue 11 years ago
Here is a simplified test case. It does not matter if the datasets are empty or not.
root@precise-vm:~# truncate -s 128m /tmp/tank
root@precise-vm:~# zpool create tank /tmp/tank
root@precise-vm:~# zfs create tank/fish
root@precise-vm:~# zfs umount -a
root@precise-vm:~# zfs create -o mountpoint=/tank/fish tank/frog
root@precise-vm:~# zfs umount -a
root@precise-vm:~# mountall --verbose
/ is local
/proc is virtual
/sys is virtual
/sys/fs/fuse/connections is virtual
/sys/kernel/debug is virtual
/sys/kernel/security is virtual
/dev is virtual
/dev/pts is virtual
/tank/fish is virtual
/tank/fish is virtual
/var/lib/lightdm/.gvfs is local
mounting event sent for /tmp
mounting event sent for /tank
mounting event sent for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
mounted event handled for /
mount / [3993] exited normally
mount /proc [4008] exited normally
mount /sys [4014] exited normally
mount /sys/fs/fuse/connections [4018] exited normally
mount /sys/kernel/debug [4019] exited normally
mount /sys/kernel/security [4020] exited normally
mount /dev [4022] exited normally
mount /dev/pts [4025] exited normally
mount /run [4026] exited normally
mount /run/lock [4028] exited normally
mount /run/shm [4031] exited normally
/bin/sh: 1: gvfs-fuse-daemon: not found
mountall: mount /var/lib/lightdm/.gvfs [4034] terminated with status 127
local 1/3 remote 0/0 virtual 0/13 swap 0/1
mounted event handled for /proc
local 1/3 remote 0/0 virtual 1/13 swap 0/1
mounted event handled for /sys
local 1/3 remote 0/0 virtual 2/13 swap 0/1
mounted event handled for /sys/fs/fuse/connections
local 1/3 remote 0/0 virtual 3/13 swap 0/1
mounted event handled for /sys/kernel/debug
local 1/3 remote 0/0 virtual 4/13 swap 0/1
mounted event handled for /sys/kernel/security
local 1/3 remote 0/0 virtual 5/13 swap 0/1
mounted event handled for /dev/pts
local 1/3 remote 0/0 virtual 6/13 swap 0/1
mounted event handled for /run/lock
local 1/3 remote 0/0 virtual 7/13 swap 0/1
mounted event handled for /run/shm
local 1/3 remote 0/0 virtual 8/13 swap 0/1
mounted event handled for /var/lib/lightdm/.gvfs
local 2/3 remote 0/0 virtual 8/13 swap 0/1
mounted event handled for /dev
local 2/3 remote 0/0 virtual 9/13 swap 0/1
mounting event handled for /tmp
mounting event handled for /tank
mounting /tank
mount /tank [4048] exited normally
mounting event handled for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
activating /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
swapon: /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683: swapon failed: Device or resource busy
mountall: swapon /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683 [4109] terminated with status 255
mountall: Problem activating swap: /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
mounted event handled for /run
local 2/3 remote 0/0 virtual 10/13 swap 0/1
mounted event handled for /tmp
local 3/3 remote 0/0 virtual 10/13 swap 0/1
mounted event handled for /tank
local 3/3 remote 0/0 virtual 11/13 swap 0/1
mounted event handled for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
swap finished
local 3/3 remote 0/0 virtual 11/13 swap 1/1
mounting event sent for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
mounting event handled for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
[hangs]
@nedbass, I can reproduce this, but I noticed that the second zfs umount -a call fails silently. If I manually unmount all ZFS datasets and run mountall again, then I get a slightly different failure:
try_mount: UUID=a5566bef-a07d-4720-a045-ec2d935cfbcc waiting for device
try_mount: /tank/fish waiting for /tank/fish
try_mount: /tank/fish waiting for /tank/fish
mounted event handled for /tmp
local 4/4 remote 0/0 virtual 14/22 swap 0/1
fsck_update: updating check priorities
plymouth_connect: Failed to connect to Plymouth: Connection refused
try_mount: UUID=a5566bef-a07d-4720-a045-ec2d935cfbcc waiting for device
try_mount: /tank/fish waiting for /tank/fish
try_mount: /tank/fish waiting for /tank/fish
The bug is caused by try_mounts() getting an incorrect dep->mounted value, which seems to be caused by mount_policy using the /tank/fish directory as the parent for both datasets instead of just /tank.
This patch avoids the bug for me by eliminating duplicates from the mounts list. parse_fstab() does something similar.
diff --git a/src/mountall.c b/src/mountall.c
index a23f407..e97d97a 100644
--- a/src/mountall.c
+++ b/src/mountall.c
@@ -783,6 +783,7 @@ parse_zfs_list (void)
 	char *saveptr;
 	char *zfs_name, *zfs_mountpoint, *zfs_canmount, *zfs_optatime, *zfs_optronly;
 	nih_local char *zfs_mntoptions = NULL;
+	Mount * mnt;

 	/* If the line didn't fit, then enlarge the buffer and retry. */
 	while ((! strchr (buf, '\n')) && (! feof (zfs_stream))) {
@@ -824,7 +825,14 @@ parse_zfs_list (void)
 			NIH_MUST (nih_strcat (&zfs_mntoptions, NULL, ",noatime"));
 		}

-		new_mount (zfs_mountpoint, zfs_name, FALSE, "zfs", zfs_mntoptions);
+		mnt = find_mount (zfs_mountpoint);
+		if (mnt) {
+			update_mount (mnt, zfs_name, FALSE, "zfs",
+				      zfs_mntoptions);
+		} else {
+			new_mount (zfs_mountpoint, zfs_name, FALSE, "zfs",
+				   zfs_mntoptions);
+		}
 	}

 	zfs_result = pclose (zfs_stream);
@nedbass, perhaps you found an upstream bug here. With ZFS disabled, I get an incomplete result from mountall if this is in the /etc/fstab file:
/var/tmp/alfa /mnt/charlie ext2 loop 0 0
/var/tmp/bravo /mnt/charlie ext2 loop 0 0
But a regular mount -a behaves as expected.
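For reference, a sketch of the corresponding setup (the image sizes and the mke2fs invocation are assumptions; only the two fstab lines above are from the report):
# truncate -s 16m /var/tmp/alfa /var/tmp/bravo
# mke2fs -F /var/tmp/alfa
# mke2fs -F /var/tmp/bravo
# mkdir -p /mnt/charlie
# mount -a        # mounts both entries; the second mount covers the first
# mountall        # yields an incomplete result with the duplicate entries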
Strange, I performed an equivalent test and didn't have a problem. Maybe we tested different versions. Based on my understanding of the bug, this code in parse_fstab should prevent the bug for duplicate mount points in the fstab file. It ensures only one of the entries exists in the mounts list when mount_policy is called. Do you know if the version you tested contains that logic?
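To illustrate the idea, here is a self-contained sketch of that dedup behavior (hypothetical code, not mountall's actual parse_fstab(); it mirrors the find_mount/update_mount pattern from the parse_zfs_list() patch above):

/*
 * Sketch: a repeated mountpoint updates the existing record instead of
 * appending a duplicate, so the policy pass sees one record per
 * mountpoint.  Names and types here are illustrative, not mountall's.
 */
#include <stdio.h>
#include <string.h>

#define MAX_MOUNTS 16

typedef struct {
	char mountpoint[64];
	char device[64];
} Mount;

static Mount mounts[MAX_MOUNTS];
static int n_mounts;

static Mount *
find_mount (const char *mountpoint)
{
	for (int i = 0; i < n_mounts; i++)
		if (! strcmp (mounts[i].mountpoint, mountpoint))
			return &mounts[i];
	return NULL;
}

static void
add_or_update_mount (const char *mountpoint, const char *device)
{
	Mount *mnt = find_mount (mountpoint);

	if (! mnt) {
		mnt = &mounts[n_mounts++];
		snprintf (mnt->mountpoint, sizeof mnt->mountpoint, "%s", mountpoint);
	}
	snprintf (mnt->device, sizeof mnt->device, "%s", device);
}

int
main (void)
{
	/* Two fstab entries sharing a mountpoint: the later entry replaces
	 * the earlier one in the list.  (With plain mount -a, both are
	 * mounted and the second covers the first.) */
	add_or_update_mount ("/mnt/charlie", "/var/tmp/alfa");
	add_or_update_mount ("/mnt/charlie", "/var/tmp/bravo");

	for (int i = 0; i < n_mounts; i++)
		printf ("%s on %s\n", mounts[i].device, mounts[i].mountpoint);
	return 0;
}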
Strange, I performed an equivalent test and didn't have a problem.
Lexical order and fstab order seem to affect the result.
Do you know if the version you tested contains that logic?
I tried the current releases for Precise and Quantal.
I get mountall behavior that is the same as mount -a and zfs mount -a with this patch:
--- a/src/mountall.c
+++ b/src/mountall.c
@@ -146,6 +146,7 @@
 };

 #define MOUNT_NAME(_mnt) (strcmp ((_mnt)->type, "swap")	 \
+			  && strcmp ((_mnt)->type, "zfs")	 \
 			  && strcmp ((_mnt)->mountpoint, "none") \
 			  ? (_mnt)->mountpoint : (_mnt)->device)
@@ -1131,6 +1132,17 @@
 	nih_assert (root != NULL);
 	nih_assert (path != NULL);

+	/*
+	 * A path cannot be a root of itself.
+	 *
+	 * Without this test, `mountall` does not preserve `mount -a` behavior for
+	 * fstab entries that share a mount point, and hangs on ZFS datasets and
+	 * other virtual filesystems that are not backed by a real block device.
+	 */
+
+	if (! strcmp (root, path))
+		return FALSE;
+
 	len = strlen (root);
 	if ((! strncmp (path, root, len))
 	    && ((path[len] == '\0')
It is unclear to me what the original intention of that (path[len] == '\0') test was, but it seems to be there specifically to allow a path to be its own parent. With your proposed change, it may as well be removed.
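To make that concrete, here is a standalone sketch of the prefix test (a hypothetical reconstruction: the clause following (path[len] == '\0') is truncated in the diff above, so the '/' check is an assumption):

#include <stdio.h>
#include <string.h>

/* Sketch of the parent test around line 1131 of mountall.c.  Compile
 * with -DWITH_GUARD to include the proposed fix. */
static int
is_parent (const char *root, const char *path)
{
	size_t len;

#ifdef WITH_GUARD
	/* The proposed fix: a path cannot be a root of itself. */
	if (! strcmp (root, path))
		return 0;
#endif

	len = strlen (root);
	return ((! strncmp (path, root, len))
		&& ((path[len] == '\0') || (path[len] == '/')));
}

int
main (void)
{
	/* Without the guard this prints 1: each of two mounts sharing
	 * /tank/fish is treated as the parent of the other, creating a
	 * dependency that try_mounts() can never satisfy. */
	printf ("%d\n", is_parent ("/tank/fish", "/tank/fish"));
	printf ("%d\n", is_parent ("/tank", "/tank/fish"));	/* real parent: 1 */
	return 0;
}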
@nedbass, I used the concise alternative that you suggested. After checking the code history and pondering the issue, I can't think of a circumstance where the existing mountall behavior is beneficial or necessary. Regardless, I do think that bug compatibility with util-linux and other platforms is worthwhile.
This is also a complete solution for the hang that @mk01 found earlier, so I dropped the solves-dependencies-problem-endless-loop.patch from the queue.
@dajhorn, when do you plan to put the package into production? I put mountall on hold for updates once it was able to mount my machines; now would be the time to release it again.
Also, I was away for a while and haven't had time to follow the GRUB 1.99 story, which is also held back from updates. Is it safe to auto-update? (I upgraded the pools to version 5000, but kept /boot with no compression, as we discussed last time.)
@mk01, today I will promote the mountall packages from the daily PPA to the stable PPA.
ZoL root systems that want to upgrade from Precise to Raring should wait until several weeks after the Raring release.
@dajhorn, I upgraded today. The endless loop came back, so I went back to 2.36.4-zfs1.
@mk01, okay, unfortunate, but thanks for the update.
FYI, my bench configuration had the form:
# zfs list -o name,mountpoint
NAME                 MOUNTPOINT
tank/alfa            /tank/alfa
tank/blue            /tank/green
tank/bravo           /tank/green/bravo
tank/charlie         /charlie
tank/charlie/brown   /charlie/brown
tank/charlie/cooper  /charlie/cooper
tank/delta1          /tank/delta
tank/delta2          /tank/delta
tank/delta3          /tank/delta
tank/yellow          /tank/green
With ext4 and tmpfs mounts on and under /charlie through the fstab.

I tested for:
fstab:
proc /proc proc nodev,noexec,nosuid 0 0
z4t/var/log /var/log zfs rw,noatime 0 0
z4t/var/cache /var/cache zfs rw,noatime 0 0
z4t/var/lib /var/lib zfs rw,noatime 0 0
/usr/lib/x86_64-linux-gnu/openssl-1.0.0/engines /var/lib/named/usr/lib/x86_64-linux-gnu/openssl-1.0.0/engines none wait,bind 0 0
/home/xbmc /exportnfs/xbmc none nobootwait,bind 0 0
/z4t/media /exportnfs/media none nobootwait,bind 0 0
/z4t/Public /exportnfs/Public none nobootwait,bind 0 0
/z4t/Public /mnt/Public none nobootwait,bind 0 0
/home/traktor /exportnfs/traktor none nobootwait,bind,ro 0 0
/home/usr_data /exportnfs/usr_data none nobootwait,bind,ro 0 0
/home/nfs_root /exportnfs/nfs_root none nobootwait,bind 0 0
/home/xbian_distro /exportnfs/xbian_distro none nobootwait,bind 0 0
/z4t/src /exportnfs/src none nobootwait,bind 0 0
zfs list (columns: name, mountpoint, canmount):
z4t /z4t on
z4t/Public /z4t/Public on
z4t/ROOT /z4t/ROOT on
z4t/ROOT/boot /boot noauto
z4t/ROOT/ubuntu-UEFI / noauto
z4t/b /z4t/b off
z4t/b/ivka57 /z4t/b/ivka57 off
z4t/b/ivka57/Users /zfs-pool-01 off
z4t/b/ivka57/Users/Library /zfs-pool-01/Library off
z4t/b/ivka57/Users/fast /zfs-pool-01/fast off
z4t/b/ivka57/Users/fast/traktor /home/traktor on
z4t/b/ivka57/Users/homesync /zfs-pool-01/homesync on
z4t/b/ivka57/Users/homesync/matuskral /zfs-pool-01/homesync/matuskral off
z4t/b/ivka57/para /z4t/b/ivka57/para off
z4t/b/media /z4t/b/media off
z4t/b/media/zssd /z4t/b/media/zssd off
z4t/b/media/zssd/ROOT /z4t/b/media/zssd/ROOT off
z4t/b/media/zssd/ROOT/boot /z4t/b/media/zssd/ROOT/boot off
z4t/b/media/zssd/ROOT/ubuntu-UEFI /z4t/b/media/zssd/ROOT/ubuntu-UEFI off
z4t/b/sntmsvtm /z4t/b/sntmsvtm off
z4t/b/sntmsvtm/usr_data /home/usr_data/ on
z4t/b/sntmsvtm/usr_data/Library /home/usr_data//Library off
z4t/b/sntmsvtm/usr_data/Library/CachingData /home/usr_data//Library/CachingData off
z4t/b/sntmsvtm/usr_data/test /home/usr_data//test off
z4t/b/sntmsvtt /z4t/b/sntmsvtt off
z4t/b/sntmsvtt/zfs /z4t/b/sntmsvtt/zfs off
z4t/b/sntmsvtt/zfs/ROOT /z4t/b/sntmsvtt/zfs/ROOT off
z4t/b/sntmsvtt/zfs/ROOT/ubuntu-1 /z4t/b/sntmsvtt/zfs/ROOT/ubuntu-1 off
z4t/exportnfs /exportnfs on
z4t/home /home on
z4t/home/nfs_root /home/nfs_root on
z4t/home/xbian_distro /home/xbian_distro on
z4t/home/xbmc /home/xbmc on
z4t/media /z4t/media on
z4t/mysqldb /z4t/mysqldb on
z4t/squid /z4t/squid on
z4t/src /z4t/src on
z4t/test /z4t/test on
z4t/test/pokus /pokus on
z4t/test_zfs /z4t/test_zfs on
z4t/var /z4t/var on
z4t/var/cache legacy on
z4t/var/lib legacy on
z4t/var/log legacy on
z4t/vm /z4t/vm on
z4t/vm/vm /z4t/vm/vm on
z4t/vm/xbian_alpha5 /z4t/vm/xbian_alpha5 on
z4t/zvol /z4t/zvol on
zssd /zssd on
zssd/ROOT /zssd/ROOT on
zssd/ROOT/boot /boot on
zssd/ROOT/ubuntu-UEFI / on
@mk01, I created a skeleton of this filesystem layout in a virtual machine and tried to reproduce the failure. I noticed this:
# zfs list -Ho mountpoint | sort | uniq -d
/
/boot
legacy
Can you isolate the doubly-mounted root filesystem on real hardware? This could be the corner case.
(I didn't think to check for / earlier because older Ubuntu releases would core themselves if the user touched it.)
@dajhorn, those doubled / and /boot entries were (at that time) an old boot and rootfs, but set to "noauto". The actual problematic records are the bind mounts into /exports.
I accidentally got my workstation into a state where the boot would hang after the "Running init-bottom...done" message. The cause turned out to be two datasets that had identical mountpoint properties. This seemed to prevent mountall from mounting either dataset, and it would never emit the virtual-filesystems event, blocking udev and other services. If I start a shell on tty2 and mount one of the filesystems manually, mountall completes and the boot continues normally.
I need to verify this, but I think the first dataset with the duplicate mountpoint to show up in zfs list must be non-empty to reproduce this issue. However, as another test, I created a dataset with a non-empty underlying mountpoint, but this did not cause a boot problem. So the duplicate mountpoint properties seem to cause an issue beyond just the mountpoint being non-empty. For example, this reproduces the issue:
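(A minimal sketch of such a configuration, following the shape of the simplified test case at the top of this issue; the dataset names and the copied file are illustrative.)
# truncate -s 128m /tmp/tank
# zpool create tank /tmp/tank
# zfs create tank/fish
# cp /etc/hostname /tank/fish/        # the first duplicate in zfs list order is non-empty
# zfs umount -a
# zfs create -o mountpoint=/tank/fish tank/frog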
This does not reproduce it:
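(Again a sketch with illustrative names: a single dataset whose underlying mountpoint directory already contains a file, with no duplicate mountpoint properties.)
# zfs create tank/toad
# zfs umount tank/toad
# touch /tank/toad/stale              # underlying directory is non-empty before mounting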
This reproducer is admittedly a configuration error (I introduced it by receiving a replication stream into a test pool), but I thought we should evaluate the impact and how it could be handled better. I could collect more debug data, but I wanted to start by just reporting what happened, since @dajhorn may have some insight about it.