TerraTech opened this issue 4 years ago
Can this please be tagged as a defect?
I can reproduce this reliably with LXD. See https://github.com/lxc/lxd/issues/7854
I entered a new bug with the LXD steps: https://github.com/openzfs/zfs/issues/10935
I am also seeing this using LXD. The interesting thing is that `lxc copy` works, and it should be going over ZFS, so I am bisecting which flags may affect this. It may be that it only succeeds by falling back on rsync. I will report back.
LXD does this by:

```sh
zfs send -c -L    | zfs recv -x mountpoint -F -u   # first snap
zfs send -c -L -i | zfs recv -x mountpoint -F -u   # for each next snap
```
It is the `-R` flag that causes the issue. The manual states "clones are preserved", so presumably this includes parent-clone relations.

I imagine the use case for `-R` is often backup or migration, for which `-R` is clearly overkill. A backup scenario typically needs only the `-I` equivalent for an incremental send, but without the source snapshot, i.e. "send all snapshots up until this one".

As it stands, doing it the LXD way without `-R` and doing at least one `-I` pass afterwards is a doable workaround, but not as elegant or convenient.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
not stale and should not be closed
important issue
`send -R` is too big. It's basically an anti-pattern; it does too much. You'll find that most scripts that manage replication loop over filesystems and run `recv -o origin=foo/bar` instead of allowing `-R` to turn your system into a bit-blender by wrongly linking `origin` to a similar filesystem in the wrong path, etc.
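A minimal sketch of that per-filesystem pattern, assuming hypothetical pool and dataset names: the origin of each clone is looked up on the source and rewritten onto the target pool before being passed explicitly via `recv -o origin=`, so the receiver never has to guess.

```python
# Sketch of per-filesystem replication with an explicit, remapped origin.
# All pool/dataset names are hypothetical.

def recv_command(src_fs, origins, src_pool, dst_pool):
    """Build the recv command for one filesystem, remapping its origin."""
    dst_fs = src_fs.replace(src_pool, dst_pool, 1)
    origin = origins.get(src_fs)  # e.g. "tank/images/base@golden" for a clone
    if origin is None:
        # ordinary filesystem: plain receive
        return f"zfs recv -F -u {dst_fs}"
    # clone: pin its origin on the target pool explicitly
    dst_origin = origin.replace(src_pool, dst_pool, 1)
    return f"zfs recv -F -u -o origin={dst_origin} {dst_fs}"

origins = {"tank/ct/web": "tank/images/base@golden"}
print(recv_command("tank/ct/web", origins, "tank", "backup"))
print(recv_command("tank/images/base", origins, "tank", "backup"))
```

The key point is that the origin mapping is computed by the script from known source-side state, not inferred by `recv` from whatever dataset happens to share a name.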
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
not stale and should not be closed
I'm hitting this too, attempting to migrate a 2TiB zpool from FreeBSD 15.0-CURRENT af0d437 to 14.0-RELEASE.
The above workaround isn't suitable in this case: there are too many cloned jails, and receiving them "manually" (without the clone relationships) will not fit on the target zpool.
Is it possible to pre-seed the origins somehow, such that the recv still works, and then clean up afterwards?
System information
Describe the problem you're observing
I was performing a dataset migration from one pool to another and encountered the following error:
```
cannot receive: local origin for clone zram/zssd/ztest/lxd/containers/daemons_vpn-terra-tek@send2zscratch does not exist
```
I checked the receive log and saw that the origin clone had in fact been received earlier in the stream. At this point, I rolled up my sleeves and dove in by adding debugging emitters to track its flow.
The debugging code can be found at: https://github.com/TerraTech/zfs/tree/debug_send_receive https://github.com/TerraTech/zfs/commit/ea08910e9d687aed983bdb17ed90a61aab79abbf
The receive log (where it fails) can be found here: https://gist.github.com/TerraTech/99f92446825490fd7f08cce330b8a0f1
I added two notes in the relevant regions; search for `### 1 ###` and `### 2 ###`.
I finally tracked down the bug to a traversal optimization here: https://github.com/openzfs/zfs/blob/9bb3d57b03e6916a2d38574420a2934b8827b3fb/lib/libzfs/libzfs_sendrecv.c#L2703 in commit: https://github.com/openzfs/zfs/commit/47dfff3b86c67c6ae184c2b7166eaa529590c2d2
To work around the problem, I added an override (`FQ_OVERRIDE_GTND`) that tested for `gtnd->skip == "containers"` and allowed the traversal to proceed. https://github.com/TerraTech/zfs/blob/ea08910e9d687aed983bdb17ed90a61aab79abbf/lib/libzfs/libzfs_sendrecv.c#L2667
When doing the migration with the override in effect, I was able to complete the task successfully. The logfile for successful migration can be found at: https://gist.github.com/TerraTech/a942a3c2654f7645e85ccf4740f5113c
Describe how to reproduce the problem
1) Retrieve my zfs debugging branch:

```sh
$ git clone -b debug_send_receive https://github.com/TerraTech/zfs
```

2) Build, and use the wrapper scripts in `./bin/zfs`.

3) Create the zfs test tree that will trigger the bug:

```
$ ./zcreate_testtree_
** using: 'zssd' will create
zssd/ztest
zssd/ztest/lxd
...
zssd/ztest/lxd/snapshots
```

4) Perform the zfs send/receive, and manipulate the `FQ_OVERRIDE_GTND` environment variable.

This will fail:

```sh
$ ./bin/zfs send -R -L -c -e -p @send2zscratch | FQ_OVERRIDE_GTND=0 ./bin/zfs receive -v -u -ocanmount=noauto _/zssd
```

This will succeed:

```sh
$ ./bin/zfs send -R -L -c -e -p @send2zscratch | FQ_OVERRIDE_GTND=1 ./bin/zfs receive -v -u -ocanmount=noauto _/zssd
```
In conclusion, I am not sure what the best approach is for fixing the traversal optimization, other than possibly comparing the full path instead of only the last path component.
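To illustrate why matching only the last path component can over-skip, here is a toy Python sketch (not the libzfs code; all paths are hypothetical): two unrelated datasets that both end in `containers` collide under a last-component test, while a full-path comparison keeps them distinct.

```python
# Toy illustration of the suspected failure mode in the skip test.
# These are NOT the libzfs functions; paths are made-up examples.

def should_skip_last_component(candidate, skip):
    # buggy behaviour: match on the final path component only
    return candidate.split("/")[-1] == skip.split("/")[-1]

def should_skip_full_path(candidate, skip):
    # suggested fix: compare the full dataset path
    return candidate == skip

skip = "zram/zssd/ztest/lxd/containers"
other = "zram/zssd/ztest/lxd/images/containers"

print(should_skip_last_component(other, skip))  # True: skipped although unrelated
print(should_skip_full_path(other, skip))       # False: correctly kept
```

With the last-component test, the unrelated `images/containers` dataset gets skipped, which is consistent with the origin clone never materializing on the receiver even though it appeared earlier in the stream.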
Include any warning/errors/backtraces from the system logs
N/A - operational logs are above via gist links.