nedbass opened this issue 11 years ago
Here is a simplified test case. It does not matter if the datasets are empty or not.
root@precise-vm:~# truncate -s 128m /tmp/tank
root@precise-vm:~# zpool create tank /tmp/tank
root@precise-vm:~# zfs create tank/fish
root@precise-vm:~# zfs umount -a
root@precise-vm:~# zfs create -o mountpoint=/tank/fish tank/frog
root@precise-vm:~# zfs umount -a
root@precise-vm:~# mountall --verbose
/ is local
/proc is virtual
/sys is virtual
/sys/fs/fuse/connections is virtual
/sys/kernel/debug is virtual
/sys/kernel/security is virtual
/dev is virtual
/dev/pts is virtual
/tank/fish is virtual
/tank/fish is virtual
/var/lib/lightdm/.gvfs is local
mounting event sent for /tmp
mounting event sent for /tank
mounting event sent for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
mounted event handled for /
mount / [3993] exited normally
mount /proc [4008] exited normally
mount /sys [4014] exited normally
mount /sys/fs/fuse/connections [4018] exited normally
mount /sys/kernel/debug [4019] exited normally
mount /sys/kernel/security [4020] exited normally
mount /dev [4022] exited normally
mount /dev/pts [4025] exited normally
mount /run [4026] exited normally
mount /run/lock [4028] exited normally
mount /run/shm [4031] exited normally
/bin/sh: 1: gvfs-fuse-daemon: not found
mountall: mount /var/lib/lightdm/.gvfs [4034] terminated with status 127
local 1/3 remote 0/0 virtual 0/13 swap 0/1
mounted event handled for /proc
local 1/3 remote 0/0 virtual 1/13 swap 0/1
mounted event handled for /sys
local 1/3 remote 0/0 virtual 2/13 swap 0/1
mounted event handled for /sys/fs/fuse/connections
local 1/3 remote 0/0 virtual 3/13 swap 0/1
mounted event handled for /sys/kernel/debug
local 1/3 remote 0/0 virtual 4/13 swap 0/1
mounted event handled for /sys/kernel/security
local 1/3 remote 0/0 virtual 5/13 swap 0/1
mounted event handled for /dev/pts
local 1/3 remote 0/0 virtual 6/13 swap 0/1
mounted event handled for /run/lock
local 1/3 remote 0/0 virtual 7/13 swap 0/1
mounted event handled for /run/shm
local 1/3 remote 0/0 virtual 8/13 swap 0/1
mounted event handled for /var/lib/lightdm/.gvfs
local 2/3 remote 0/0 virtual 8/13 swap 0/1
mounted event handled for /dev
local 2/3 remote 0/0 virtual 9/13 swap 0/1
mounting event handled for /tmp
mounting event handled for /tank
mounting /tank
mount /tank [4048] exited normally
mounting event handled for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
activating /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
swapon: /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683: swapon failed: Device or resource busy
mountall: swapon /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683 [4109] terminated with status 255
mountall: Problem activating swap: /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
mounted event handled for /run
local 2/3 remote 0/0 virtual 10/13 swap 0/1
mounted event handled for /tmp
local 3/3 remote 0/0 virtual 10/13 swap 0/1
mounted event handled for /tank
local 3/3 remote 0/0 virtual 11/13 swap 0/1
mounted event handled for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
swap finished
local 3/3 remote 0/0 virtual 11/13 swap 1/1
mounting event sent for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
mounting event handled for swap /dev/disk/by-uuid/34efd22b-4736-46e0-a2f1-60e693317683
[hangs]
@nedbass, I can reproduce this, but I noticed that the second zfs umount -a call fails silently. If I manually unmount all ZFS datasets and run mountall again, then I get a slightly different failure:
try_mount: UUID=a5566bef-a07d-4720-a045-ec2d935cfbcc waiting for device
try_mount: /tank/fish waiting for /tank/fish
try_mount: /tank/fish waiting for /tank/fish
mounted event handled for /tmp
local 4/4 remote 0/0 virtual 14/22 swap 0/1
fsck_update: updating check priorities
plymouth_connect: Failed to connect to Plymouth: Connection refused
try_mount: UUID=a5566bef-a07d-4720-a045-ec2d935cfbcc waiting for device
try_mount: /tank/fish waiting for /tank/fish
try_mount: /tank/fish waiting for /tank/fish
The bug is caused by try_mounts() getting an incorrect dep->mounted value, which seems to be caused by mount_policy using the /tank/fish directory as the parent for both datasets instead of just /tank.
This patch avoids the bug for me by eliminating duplicates from the mounts list. parse_fstab() does something similar.
diff --git a/src/mountall.c b/src/mountall.c
index a23f407..e97d97a 100644
--- a/src/mountall.c
+++ b/src/mountall.c
@@ -783,6 +783,7 @@ parse_zfs_list (void)
 	char *saveptr;
 	char *zfs_name, *zfs_mountpoint, *zfs_canmount, *zfs_optatime, *zfs_optronly;
 	nih_local char *zfs_mntoptions = NULL;
+	Mount * mnt;

 	/* If the line didn't fit, then enlarge the buffer and retry. */
 	while ((! strchr (buf, '\n')) && (! feof (zfs_stream))) {
@@ -824,7 +825,14 @@ parse_zfs_list (void)
 			NIH_MUST (nih_strcat (&zfs_mntoptions, NULL, ",noatime"));
 		}

-		new_mount (zfs_mountpoint, zfs_name, FALSE, "zfs", zfs_mntoptions);
+		mnt = find_mount (zfs_mountpoint);
+		if (mnt) {
+			update_mount (mnt, zfs_name, FALSE, "zfs",
+				      zfs_mntoptions);
+		} else {
+			new_mount (zfs_mountpoint, zfs_name, FALSE, "zfs",
+				   zfs_mntoptions);
+		}
 	}

 	zfs_result = pclose (zfs_stream);
@nedbass, perhaps you found an upstream bug here. With ZFS disabled, I get an incomplete result from mountall if this is in the /etc/fstab file:
/var/tmp/alfa /mnt/charlie ext2 loop 0 0
/var/tmp/bravo /mnt/charlie ext2 loop 0 0
But a regular mount -a behaves as expected.
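For reference, a sketch of the corresponding setup (the image sizes and the mke2fs invocation are assumptions; only the two fstab lines above are from the report):
# truncate -s 16m /var/tmp/alfa /var/tmp/bravo
# mke2fs -F /var/tmp/alfa
# mke2fs -F /var/tmp/bravo
# mkdir -p /mnt/charlie
# mount -a        # mounts both entries; the second mount covers the first
# mountall        # yields an incomplete result with the duplicate entries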
Strange, I performed an equivalent test and didn't have a problem. Maybe we tested different versions. Based on my understanding of the bug, this code in parse_fstab should prevent the bug for duplicate mount points in the fstab file. It ensures only one of the entries exists in the mounts list when mount_policy is called. Do you know if the version you tested contains that logic?
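To illustrate the idea, here is a self-contained sketch of that dedup behavior (hypothetical code, not mountall's actual parse_fstab(); it mirrors the find_mount/update_mount pattern from the parse_zfs_list() patch above):

/*
 * Sketch: a repeated mountpoint updates the existing record instead of
 * appending a duplicate, so the policy pass sees one record per
 * mountpoint.  Names and types here are illustrative, not mountall's.
 */
#include <stdio.h>
#include <string.h>

#define MAX_MOUNTS 16

typedef struct {
	char mountpoint[64];
	char device[64];
} Mount;

static Mount mounts[MAX_MOUNTS];
static int n_mounts;

static Mount *
find_mount (const char *mountpoint)
{
	for (int i = 0; i < n_mounts; i++)
		if (! strcmp (mounts[i].mountpoint, mountpoint))
			return &mounts[i];
	return NULL;
}

static void
add_or_update_mount (const char *mountpoint, const char *device)
{
	Mount *mnt = find_mount (mountpoint);

	if (! mnt) {
		mnt = &mounts[n_mounts++];
		snprintf (mnt->mountpoint, sizeof mnt->mountpoint, "%s", mountpoint);
	}
	snprintf (mnt->device, sizeof mnt->device, "%s", device);
}

int
main (void)
{
	/* Two fstab entries sharing a mountpoint: the later entry replaces
	 * the earlier one in the list.  (With plain mount -a, both are
	 * mounted and the second covers the first.) */
	add_or_update_mount ("/mnt/charlie", "/var/tmp/alfa");
	add_or_update_mount ("/mnt/charlie", "/var/tmp/bravo");

	for (int i = 0; i < n_mounts; i++)
		printf ("%s on %s\n", mounts[i].device, mounts[i].mountpoint);
	return 0;
}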
Strange, I performed an equivalent test and didn't have a problem.
Lexical order and fstab order seem to affect the result.
Do you know if the version you tested contains that logic?
I tried the current releases for Precise and Quantal.
I get mountall behavior that is the same as mount -a and zfs mount -a with this patch:
--- a/src/mountall.c
+++ b/src/mountall.c
@@ -146,6 +146,7 @@
 };

 #define MOUNT_NAME(_mnt) (strcmp ((_mnt)->type, "swap")	 \
+			  && strcmp ((_mnt)->type, "zfs")	 \
 			  && strcmp ((_mnt)->mountpoint, "none") \
 			  ? (_mnt)->mountpoint : (_mnt)->device)
@@ -1131,6 +1132,17 @@
 	nih_assert (root != NULL);
 	nih_assert (path != NULL);

+	/*
+	 * A path cannot be a root of itself.
+	 *
+	 * Without this test, `mountall` does not preserve `mount -a` behavior for
+	 * fstab entries that share a mount point, and hangs on ZFS datasets and
+	 * other virtual filesystems that are not backed by a real block device.
+	 */
+
+	if (! strcmp (root, path))
+		return FALSE;
+
 	len = strlen (root);
 	if ((! strncmp (path, root, len))
 	    && ((path[len] == '\0')
It is unclear to me what the original intention of that (path[len] == '\0') test was, but it seems to be there specifically to allow a path to be its own parent. With your proposed change, it may as well be removed.
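To make that concrete, here is a standalone sketch of the prefix test (a hypothetical reconstruction: the clause following (path[len] == '\0') is truncated in the diff above, so the '/' check is an assumption):

#include <stdio.h>
#include <string.h>

/* Sketch of the parent test around line 1131 of mountall.c.  Compile
 * with -DWITH_GUARD to include the proposed fix. */
static int
is_parent (const char *root, const char *path)
{
	size_t len;

#ifdef WITH_GUARD
	/* The proposed fix: a path cannot be a root of itself. */
	if (! strcmp (root, path))
		return 0;
#endif

	len = strlen (root);
	return ((! strncmp (path, root, len))
		&& ((path[len] == '\0') || (path[len] == '/')));
}

int
main (void)
{
	/* Without the guard this prints 1: each of two mounts sharing
	 * /tank/fish is treated as the parent of the other, creating a
	 * dependency that try_mounts() can never satisfy. */
	printf ("%d\n", is_parent ("/tank/fish", "/tank/fish"));
	printf ("%d\n", is_parent ("/tank", "/tank/fish"));	/* real parent: 1 */
	return 0;
}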
@nedbass, I used the concise alternative that you suggested. After checking the code history and pondering the issue, I can't think of a circumstance where the existing mountall behavior is beneficial or necessary. Regardless, I do think that bug compatibility with util-linux and other platforms is worthwhile.
This is also a complete solution for the hang that @mk01 found earlier, so I dropped the solves-dependencies-problem-endless-loop.patch from the queue.
@dajhorn, when do you plan to put the package into production? I put mountall on hold for updates once it was able to mount my machines; now would be the time to release it again.
Also, I was away for a while and haven't had time to follow the GRUB 1.99 story, which is also held back from updates. Is it safe to auto-update? (I upgraded the pools to version 5000, but kept /boot with no compression, as we discussed last time.)
@mk01, today I will promote the mountall packages from the daily PPA to the stable PPA.
ZoL root systems that want to upgrade from Precise to Raring should wait until several weeks after the Raring release.
@dajhorn, I upgraded today. The endless loop came back, so I went back to 2.36.4-zfs1.
@mk01, okay, unfortunate, but thanks for the update.
FYI, my bench configuration had the form:
# zfs list -o name,mountpoint
NAME                 MOUNTPOINT
tank/alfa            /tank/alfa
tank/blue            /tank/green
tank/bravo           /tank/green/bravo
tank/charlie         /charlie
tank/charlie/brown   /charlie/brown
tank/charlie/cooper  /charlie/cooper
tank/delta1          /tank/delta
tank/delta2          /tank/delta
tank/delta3          /tank/delta
tank/yellow          /tank/green
With ext4 and tmpfs mounts on and under /charlie through the fstab.

I tested for:
fstab:
proc /proc proc nodev,noexec,nosuid 0 0
z4t/var/log /var/log zfs rw,noatime 0 0
z4t/var/cache /var/cache zfs rw,noatime 0 0
z4t/var/lib /var/lib zfs rw,noatime 0 0
/usr/lib/x86_64-linux-gnu/openssl-1.0.0/engines /var/lib/named/usr/lib/x86_64-linux-gnu/openssl-1.0.0/engines none wait,bind 0 0
/home/xbmc /exportnfs/xbmc none nobootwait,bind 0 0
/z4t/media /exportnfs/media none nobootwait,bind 0 0
/z4t/Public /exportnfs/Public none nobootwait,bind 0 0
/z4t/Public /mnt/Public none nobootwait,bind 0 0
/home/traktor /exportnfs/traktor none nobootwait,bind,ro 0 0
/home/usr_data /exportnfs/usr_data none nobootwait,bind,ro 0 0
/home/nfs_root /exportnfs/nfs_root none nobootwait,bind 0 0
/home/xbian_distro /exportnfs/xbian_distro none nobootwait,bind 0 0
/z4t/src /exportnfs/src none nobootwait,bind 0 0
zfs list (columns: name, mountpoint, canmount):
z4t /z4t on
z4t/Public /z4t/Public on
z4t/ROOT /z4t/ROOT on
z4t/ROOT/boot /boot noauto
z4t/ROOT/ubuntu-UEFI / noauto
z4t/b /z4t/b off
z4t/b/ivka57 /z4t/b/ivka57 off
z4t/b/ivka57/Users /zfs-pool-01 off
z4t/b/ivka57/Users/Library /zfs-pool-01/Library off
z4t/b/ivka57/Users/fast /zfs-pool-01/fast off
z4t/b/ivka57/Users/fast/traktor /home/traktor on
z4t/b/ivka57/Users/homesync /zfs-pool-01/homesync on
z4t/b/ivka57/Users/homesync/matuskral /zfs-pool-01/homesync/matuskral off
z4t/b/ivka57/para /z4t/b/ivka57/para off
z4t/b/media /z4t/b/media off
z4t/b/media/zssd /z4t/b/media/zssd off
z4t/b/media/zssd/ROOT /z4t/b/media/zssd/ROOT off
z4t/b/media/zssd/ROOT/boot /z4t/b/media/zssd/ROOT/boot off
z4t/b/media/zssd/ROOT/ubuntu-UEFI /z4t/b/media/zssd/ROOT/ubuntu-UEFI off
z4t/b/sntmsvtm /z4t/b/sntmsvtm off
z4t/b/sntmsvtm/usr_data /home/usr_data/ on
z4t/b/sntmsvtm/usr_data/Library /home/usr_data//Library off
z4t/b/sntmsvtm/usr_data/Library/CachingData /home/usr_data//Library/CachingData off
z4t/b/sntmsvtm/usr_data/test /home/usr_data//test off
z4t/b/sntmsvtt /z4t/b/sntmsvtt off
z4t/b/sntmsvtt/zfs /z4t/b/sntmsvtt/zfs off
z4t/b/sntmsvtt/zfs/ROOT /z4t/b/sntmsvtt/zfs/ROOT off
z4t/b/sntmsvtt/zfs/ROOT/ubuntu-1 /z4t/b/sntmsvtt/zfs/ROOT/ubuntu-1 off
z4t/exportnfs /exportnfs on
z4t/home /home on
z4t/home/nfs_root /home/nfs_root on
z4t/home/xbian_distro /home/xbian_distro on
z4t/home/xbmc /home/xbmc on
z4t/media /z4t/media on
z4t/mysqldb /z4t/mysqldb on
z4t/squid /z4t/squid on
z4t/src /z4t/src on
z4t/test /z4t/test on
z4t/test/pokus /pokus on
z4t/test_zfs /z4t/test_zfs on
z4t/var /z4t/var on
z4t/var/cache legacy on
z4t/var/lib legacy on
z4t/var/log legacy on
z4t/vm /z4t/vm on
z4t/vm/vm /z4t/vm/vm on
z4t/vm/xbian_alpha5 /z4t/vm/xbian_alpha5 on
z4t/zvol /z4t/zvol on
zssd /zssd on
zssd/ROOT /zssd/ROOT on
zssd/ROOT/boot /boot on
zssd/ROOT/ubuntu-UEFI / on
@mk01, I created a skeleton of this filesystem layout in a virtual machine and tried to reproduce the failure. I noticed this:
# zfs list -Ho mountpoint | sort | uniq -d
/
/boot
legacy
Can you isolate the doubly-mounted root filesystem on real hardware? This could be the corner case.
(I didn't think to check for / earlier because older Ubuntu releases would core themselves if the user touched it.)
@dajhorn, those doubled / and /boot entries were (at that time) an old boot and rootfs, but set to "noauto". The actual problematic records are the bind mounts into /exports.
I accidentally got my workstation into a state where the boot would hang after the "Running init-bottom...done" message. The cause turned out to be two datasets that had identical mountpoint properties. This seemed to prevent mountall from mounting either dataset, and it would never emit the virtual-filesystems event, blocking udev and other services. If I start a shell on tty2 and mount one of the filesystems manually, mountall completes and the boot continues normally.
I need to verify this, but I think the first dataset with the duplicate mountpoint to show up in zfs list must be non-empty to reproduce this issue. However, as another test, I created a dataset with a non-empty underlying mountpoint, but this did not cause a boot problem. So the duplicate mountpoint properties seem to cause an issue beyond just the mountpoint being non-empty. For example, this reproduces the issue:
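(A minimal sketch of such a configuration, following the shape of the simplified test case at the top of this issue; the dataset names and the copied file are illustrative.)
# truncate -s 128m /tmp/tank
# zpool create tank /tmp/tank
# zfs create tank/fish
# cp /etc/hostname /tank/fish/        # the first duplicate in zfs list order is non-empty
# zfs umount -a
# zfs create -o mountpoint=/tank/fish tank/frog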
This does not reproduce it:
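(Again a sketch with illustrative names: a single dataset whose underlying mountpoint directory already contains a file, with no duplicate mountpoint properties.)
# zfs create tank/toad
# zfs umount tank/toad
# touch /tank/toad/stale              # underlying directory is non-empty before mounting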
This reproducer is admittedly a configuration error (I introduced it by receiving a replication stream into a test pool), but I thought we should evaluate the impact and how it could be handled better. I could collect more debug data, but I wanted to start by just reporting what happened, since @dajhorn may have some insight about it.