Open albertmichaelj opened 5 years ago
I could reproduce your issue (I have the same setup as you), but this isn't an issue with syncoid, it's a ZFS thing. I searched and found a comment explaining the problem: https://github.com/zfsonlinux/zfs/issues/4811#issuecomment-229082982
Thanks for your explanation. However, I don't feel that this issue should be closed so quickly. There is clearly a solution (running the remount operation or, based on my reading of the comment, unmounting the target dataset before doing the receive operation). Why can't syncoid be updated to implement one of these two workarounds for the ZFS bug? I suspect that the better implementation would be unmounting the target dataset before replication and then remounting at the end of the syncoid run (that seems to be what the comment recommends). Do you anticipate that users will be actively working in the target dataset during replication? That seems like a really stupid thing to do, since none of the changes in the target dataset will survive replication.
If there is a good reason, I'm fine with my manual workaround, but it's not expected behavior and will likely continue to be confusing for users.
Yeah sure, a workaround can be implemented. But in my view unmounting before replication is not what the user expects (the replication target can still be used for read-only access). So remounting in the correct order seems to be the better option. But I think this will be tricky: one has to check for descendant mountpoints properly. There are also some edge cases, e.g. if syncoid is only instructed to sync one dataset, should it nevertheless remount all the descendant datasets?
Maybe you want to take a shot at this and prepare a PR?
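For what it's worth, the ordering concern can be sketched without ZFS at all: as long as descendant mountpoints nest under their parents, a lexical sort of the mountpoint paths yields a parent-first mount order. The paths below are made up for illustration; a real implementation would read them from something like `zfs get -rH -o value mountpoint <target>`:

```shell
#!/bin/sh
# Hypothetical mountpoints of a replication target and its descendants,
# deliberately listed out of order (stand-in for real `zfs get` output):
mountpoints() {
cat <<'EOF'
/mnt/backup/deep/sub
/mnt/backup
/mnt/backup/deep
EOF
}

# A path always sorts before any path it is a prefix of, so a plain
# lexical sort puts every parent mountpoint before its descendants:
mountpoints | sort
# prints /mnt/backup, then /mnt/backup/deep, then /mnt/backup/deep/sub
```

Mounting in that sorted order (and unmounting in reverse) would also cover descendants whose mountpoints are set explicitly rather than inherited.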
Thanks for reopening this. Like I said in my first post, I'm pretty new to ZFS, and I've never coded in Perl before (I'm a Python guy). So, I don't really feel equipped to fix this directly (at least not "the right way"; I could probably force-remount the filesystem...).
I do think that having this as an enhancement is the right thing, though. I wish I could be of more help!
Ouch, I've spent a few hours debugging this and finally found out that the datasets were simply left unmounted, exactly as described in this issue.
Here's a self-contained script to reproduce it:
```sh
#!/bin/sh
set -ex

if [ ! -x /tmp/syncoid-test ]; then
    wget -O /tmp/syncoid-test "https://github.com/jimsalterjrs/sanoid/raw/master/syncoid"
    chmod +x /tmp/syncoid-test
fi

zfs destroy -r FAKEprod || true
zfs destroy -r FAKEdev || true
zpool destroy -f FAKEprod || true
zpool destroy -f FAKEdev || true
rm -fv /tmp/zfs-*img

dd if=/dev/zero of=/tmp/zfs-prod.img bs=1024 count=65536
dd if=/dev/zero of=/tmp/zfs-dev.img bs=1024 count=65536
zpool create -O atime=off FAKEprod /tmp/zfs-prod.img
zpool create -O atime=off FAKEdev /tmp/zfs-dev.img
zfs create FAKEprod/data
zfs create FAKEprod/data/sub

echo 1 > /FAKEprod/stuff.txt
echo 1 > /FAKEprod/data/stuff.txt
echo 1 > /FAKEprod/data/sub/stuff.txt
zfs snapshot FAKEprod@s1
zfs snapshot FAKEprod/data@s1
zfs snapshot FAKEprod/data/sub@s1
/tmp/syncoid-test --recursive --no-sync-snap FAKEprod/data FAKEdev/prodbackup
find /FAKEdev/

echo 2 >> /FAKEprod/stuff.txt
echo 2 >> /FAKEprod/data/stuff.txt
echo 2 >> /FAKEprod/data/sub/stuff.txt
zfs snapshot FAKEprod@s2
zfs snapshot FAKEprod/data@s2
zfs snapshot FAKEprod/data/sub@s2
/tmp/syncoid-test --recursive --no-sync-snap FAKEprod/data FAKEdev/prodbackup
#fix# zfs list -rH -o name FAKEdev/prodbackup | xargs -L 1 zfs mount
find /FAKEdev/
zfs list -o name,mounted
```
After the second run (`-`/`+` mark the difference from the first run):

```
+ find /FAKEdev/
/FAKEdev/
/FAKEdev/prodbackup
/FAKEdev/prodbackup/sub
-/FAKEdev/prodbackup/sub/stuff.txt
/FAKEdev/prodbackup/stuff.txt
+ zfs list -o name,mounted
NAME                     MOUNTED
FAKEdev                  yes
FAKEdev/prodbackup       yes
-FAKEdev/prodbackup/sub  yes
+FAKEdev/prodbackup/sub  no
FAKEprod                 yes
FAKEprod/data            yes
FAKEprod/data/sub        yes
```
It would be better to document this behavior explicitly.
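The commented `#fix#` line in the script above is the manual workaround: pipe every dataset under the target, one per line, into `zfs mount`. The plumbing can be exercised without ZFS by substituting `echo` for the real command (dataset names are the ones from the script):

```shell
#!/bin/sh
# Stand-in for `zfs list -rH -o name FAKEdev/prodbackup`, which prints
# the target and its descendants one per line, parents first:
printf '%s\n' FAKEdev/prodbackup FAKEdev/prodbackup/sub |
    xargs -L 1 echo zfs mount
# -L 1 runs the command once per input line, so each dataset gets its
# own `zfs mount` invocation:
#   zfs mount FAKEdev/prodbackup
#   zfs mount FAKEdev/prodbackup/sub
```

Because `zfs list -r` emits parents before children, the mounts happen in a safe order without any extra sorting.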
I am relatively new to ZFS on Linux (long-time BTRFS user), but I am running into a weird bug. I don't think it is user error, but feel free to tell me I'm wrong.
I'm using Ubuntu 18.04 with ZFS on root, using the standard ZFS packages provided by Ubuntu, and I'm using the instructions from here. I bring this up because some of the datasets created for ZFS on root are a little nonstandard, and that may be affecting this.
What’s happening is that I’m replicating my root dataset (pool) to a local array of hard disks (array) using Syncoid. I’m using the following command:
```
/usr/local/bin/syncoid -r --quiet --exclude=rpool/docker --exclude=rpool/lxd --exclude=rpool/var/tmp --exclude=rpool/swap --exclude=rpool/var/cache --exclude=rpool/home/michael/Temp rpool array/Root_Backup > /dev/null
```
If I reboot the system and then go to /mnt/array/Root_Backup (the place where the replicated dataset is supposed to be mounted), I see the full dataset just like I would expect to. However, if I run the above syncoid command, and then go to the mount point, nothing is there.
If I run `mount | grep zfs`, it looks like the datasets are mounted: the output lists the datasets I'm replicating at the mountpoints where I think they are supposed to be. Yet, I can't see them (or `cd` directly to them). In order to see the datasets, I have to run the remount operation; then all of the datasets appear as expected. So, it seems like something is happening during syncoid that messes up the remounting process. However, I am at the limits of my debugging skills with ZFS, though I'm happy to test whatever if given directions.
Any thoughts on what might be causing this?
Thanks!