Open albertmichaelj opened 5 years ago
I could reproduce your issue (I have the same setup as you), but this isn't an issue with syncoid, it's a ZFS thing. I searched and found a comment explaining the problem: https://github.com/zfsonlinux/zfs/issues/4811#issuecomment-229082982
Thanks for your explanation. However, I don't feel that this issue should be closed so quickly. There is clearly a solution (running the remount operation or, based on my reading of the comment, unmounting the target dataset before doing the receive operation). Why can't syncoid be updated to implement one of these two workarounds for the ZFS bug? I suspect that the better implementation would be unmounting the target dataset before replication and then remounting at the end of the syncoid run (that seems to be what the comment recommends). Do you anticipate that users will be actively working in the target dataset during replication? That seems like a really stupid thing to do, since none of the changes in the target dataset will survive replication.
If there is a good reason, I'm fine with my manual workaround, but it's not expected behavior and will likely continue to be confusing for users.
Yeah sure, a workaround can be implemented. But in my view unmounting before replication is not what the user expects (the replication target can still be used for read-only access). So remounting in the correct order seems to be the better option. But I think this will be tricky: one has to check for descendant mountpoints properly. There are also some edge cases, e.g. if syncoid is only instructed to sync one dataset, should it nevertheless remount all the descendant datasets?
Maybe you want to take a shot at this and prepare a PR?
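For what it's worth, the ordering concern can be sketched without ZFS at all: as long as descendant mountpoints nest under their parents, a lexical sort of the mountpoint paths yields a parent-first mount order. The paths below are made up for illustration; a real implementation would read them from something like `zfs get -rH -o value mountpoint <target>`:

```shell
#!/bin/sh
# Hypothetical mountpoints of a replication target and its descendants,
# deliberately listed out of order (stand-in for real `zfs get` output):
mountpoints() {
cat <<'EOF'
/mnt/backup/deep/sub
/mnt/backup
/mnt/backup/deep
EOF
}

# A path always sorts before any path it is a prefix of, so a plain
# lexical sort puts every parent mountpoint before its descendants:
mountpoints | sort
# prints /mnt/backup, then /mnt/backup/deep, then /mnt/backup/deep/sub
```

Mounting in that sorted order (and unmounting in reverse) would also cover descendants whose mountpoints are set explicitly rather than inherited.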
Thanks for reopening this. Like I said in my first post, I'm pretty new to ZFS, and I've never coded in Perl before (I'm a Python guy). So, I don't really feel equipped to fix this directly (at least not "the right way"; I could probably force-remount the filesystem...).
I do think that having this as an enhancement is the right thing, though. I wish I could be of more help!
Ouch, I've spent a few hours debugging this and finally found out that the datasets were simply left unmounted, exactly as described in this issue.
Here's a self-contained script to reproduce it:
```sh
#!/bin/sh
set -ex

if [ ! -x /tmp/syncoid-test ]; then
    wget -O /tmp/syncoid-test "https://github.com/jimsalterjrs/sanoid/raw/master/syncoid"
    chmod +x /tmp/syncoid-test
fi

zfs destroy -r FAKEprod || true
zfs destroy -r FAKEdev || true
zpool destroy -f FAKEprod || true
zpool destroy -f FAKEdev || true
rm -fv /tmp/zfs-*img

dd if=/dev/zero of=/tmp/zfs-prod.img bs=1024 count=65536
dd if=/dev/zero of=/tmp/zfs-dev.img bs=1024 count=65536
zpool create -O atime=off FAKEprod /tmp/zfs-prod.img
zpool create -O atime=off FAKEdev /tmp/zfs-dev.img
zfs create FAKEprod/data
zfs create FAKEprod/data/sub

echo 1 > /FAKEprod/stuff.txt
echo 1 > /FAKEprod/data/stuff.txt
echo 1 > /FAKEprod/data/sub/stuff.txt
zfs snapshot FAKEprod@s1
zfs snapshot FAKEprod/data@s1
zfs snapshot FAKEprod/data/sub@s1
/tmp/syncoid-test --recursive --no-sync-snap FAKEprod/data FAKEdev/prodbackup
find /FAKEdev/

echo 2 >> /FAKEprod/stuff.txt
echo 2 >> /FAKEprod/data/stuff.txt
echo 2 >> /FAKEprod/data/sub/stuff.txt
zfs snapshot FAKEprod@s2
zfs snapshot FAKEprod/data@s2
zfs snapshot FAKEprod/data/sub@s2
/tmp/syncoid-test --recursive --no-sync-snap FAKEprod/data FAKEdev/prodbackup
#fix# zfs list -rH -o name FAKEdev/prodbackup | xargs -L 1 zfs mount
find /FAKEdev/
zfs list -o name,mounted
```
After the second run (`-`/`+` mark the difference from the first run):

```
+ find /FAKEdev/
/FAKEdev/
/FAKEdev/prodbackup
/FAKEdev/prodbackup/sub
-/FAKEdev/prodbackup/sub/stuff.txt
/FAKEdev/prodbackup/stuff.txt
+ zfs list -o name,mounted
NAME                     MOUNTED
FAKEdev                  yes
FAKEdev/prodbackup       yes
-FAKEdev/prodbackup/sub  yes
+FAKEdev/prodbackup/sub  no
FAKEprod                 yes
FAKEprod/data            yes
FAKEprod/data/sub        yes
```
It would be better to document this behavior explicitly.
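The commented `#fix#` line in the script above is the manual workaround: pipe every dataset under the target, one per line, into `zfs mount`. The plumbing can be exercised without ZFS by substituting `echo` for the real command (dataset names are the ones from the script):

```shell
#!/bin/sh
# Stand-in for `zfs list -rH -o name FAKEdev/prodbackup`, which prints
# the target and its descendants one per line, parents first:
printf '%s\n' FAKEdev/prodbackup FAKEdev/prodbackup/sub |
    xargs -L 1 echo zfs mount
# -L 1 runs the command once per input line, so each dataset gets its
# own `zfs mount` invocation:
#   zfs mount FAKEdev/prodbackup
#   zfs mount FAKEdev/prodbackup/sub
```

Because `zfs list -r` emits parents before children, the mounts happen in a safe order without any extra sorting.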
I am relatively new to ZFS on Linux (long-time BTRFS user), but I am running into a weird bug. I don't think it is user error, but feel free to tell me I'm wrong.
I'm using Ubuntu 18.04 with ZFS on root, using the standard ZFS packages provided by Ubuntu, and I'm using the instructions from here. I bring this up because some of the datasets created for ZFS on root are a little nonstandard, and that may be affecting this.
What’s happening is that I’m replicating my root dataset (pool) to a local array of hard disks (array) using Syncoid. I’m using the following command:
```
/usr/local/bin/syncoid -r --quiet --exclude=rpool/docker --exclude=rpool/lxd --exclude=rpool/var/tmp --exclude=rpool/swap --exclude=rpool/var/cache --exclude=rpool/home/michael/Temp rpool array/Root_Backup > /dev/null
```
If I reboot the system and then go to /mnt/array/Root_Backup (the place where the replicated dataset is supposed to be mounted), I see the full dataset just like I would expect to. However, if I run the above syncoid command, and then go to the mount point, nothing is there.
If I run `mount | grep zfs`, it looks like the datasets are mounted: the output lists the datasets I'm replicating at the mountpoints where I think they are supposed to be. Yet, I can't see them (or `cd` directly to them). In order to see the datasets, I have to run the remount operation; then all of the datasets appear as expected. So, it seems like something is happening during syncoid that messes up the remounting process. However, I am at the limits of my debugging skills with ZFS, though I'm happy to test whatever if given directions.
Any thoughts on what might be causing this?
Thanks!