Obscure bug with zfs receive and clones

TerraTech commented 4 years ago

System information

Type	Version/Name
Distribution Name	Gentoo
Distribution Version	default/linux/amd64/17.0/hardened
Linux Kernel	4.19.97
Architecture	amd64
ZFS Version	0.8.3-r0-gentoo
SPL Version	0.8.3-r0-gentoo

Describe the problem you're observing

I was performing a dataset migration from one pool to another and encountered the following error: cannot receive: local origin for clone zram/zssd/ztest/lxd/containers/daemons_vpn-terra-tek@send2zscratch does not exist

Local origin:
$ zfs list -oorigin zssd/zslice/lxd/containers/daemons_vpn-terra-tek
ORIGIN
zssd/zslice/lxd/deleted/containers/fe89e83a-936b-49dd-8631-8b72a57951e4@copy-4e6d38b2-512a-4973-a198-de2da8ccf46d

I checked the receive log and saw that the clone's origin had in fact been received earlier in the stream. At this point, I rolled up my sleeves and dove in, adding debugging output to trace its flow.

The debugging code can be found at: https://github.com/TerraTech/zfs/tree/debug_send_receive (branch) and https://github.com/TerraTech/zfs/commit/ea08910e9d687aed983bdb17ed90a61aab79abbf (commit)

The receive log (where it fails) can be found here: https://gist.github.com/TerraTech/99f92446825490fd7f08cce330b8a0f1

I added two notes in the relevant regions; search for ### 1 ### and ### 2 ###.

I finally tracked the bug down to a traversal optimization here: https://github.com/openzfs/zfs/blob/9bb3d57b03e6916a2d38574420a2934b8827b3fb/lib/libzfs/libzfs_sendrecv.c#L2703, introduced in commit: https://github.com/openzfs/zfs/commit/47dfff3b86c67c6ae184c2b7166eaa529590c2d2

To work around the problem, I added an override (FQ_OVERRIDE_GTND) that checked for gtnd->skip == "containers" and allowed the traversal to proceed: https://github.com/TerraTech/zfs/blob/ea08910e9d687aed983bdb17ed90a61aab79abbf/lib/libzfs/libzfs_sendrecv.c#L2667

With the override in effect, I was able to complete the migration successfully. The log file for the successful migration can be found at: https://gist.github.com/TerraTech/a942a3c2654f7645e85ccf4740f5113c

Describe how to reproduce the problem

1) Retrieve my zfs debugging branch:
$ git clone -b debug_send_receive https://github.com/TerraTech/zfs
2) Build it and use the wrapper scripts in ./bin/zfs.
3) Create the zfs test tree that will trigger the bug:
$ ./zcreate_testtree _
(Using 'zssd' will create zssd/ztest, zssd/ztest/lxd ... zssd/ztest/lxd/snapshots.)
4) Perform the zfs send/receive, toggling the FQ_OVERRIDE_GTND environment variable.

This will fail:
$ ./bin/zfs send -R -L -c -e -p @send2zscratch | FQ_OVERRIDE_GTND=0 ./bin/zfs receive -v -u -ocanmount=noauto _/zssd

This will succeed:
$ ./bin/zfs send -R -L -c -e -p @send2zscratch | FQ_OVERRIDE_GTND=1 ./bin/zfs receive -v -u -ocanmount=noauto _/zssd

In conclusion, I am not sure of the best approach to fixing the traversal optimization, other than possibly comparing the full dataset path instead of only the last path component.
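To make the suspected failure concrete, here is a toy model in shell; it is not the libzfs code. It assumes, as the report suggests, that the traversal decides a dataset has "already been searched" by comparing only the last path component. Both function names are hypothetical, and the dataset names are modeled on the ones above (the candidate under lxd/deleted/containers stands in for the parent of the clone's origin).

```shell
#!/bin/sh
# Toy model of the skip check discussed above; NOT the libzfs implementation.
# Assumption (from the report): a dataset is skipped when it is believed to
# have been searched already, judged by its last path component only.

# Reported behaviour: compare final components ("containers" == "containers").
skip_by_last_component() {
    [ "${1##*/}" = "${2##*/}" ]
}

# Suggested fix: compare the full dataset names.
skip_by_full_path() {
    [ "$1" = "$2" ]
}

# Hypothetical names modeled on the report.
searched=zram/zssd/ztest/lxd/containers
candidate=zram/zssd/ztest/lxd/deleted/containers

if skip_by_last_component "$searched" "$candidate"; then
    echo "last-component check skips $candidate"
fi
if ! skip_by_full_path "$searched" "$candidate"; then
    echo "full-path check searches $candidate"
fi
```

With last-component matching, lxd/containers and lxd/deleted/containers look identical, so the subtree holding the origin would be skipped; comparing full names keeps them apart, which is roughly what the FQ_OVERRIDE_GTND override works around for the "containers" case.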

Include any warning/errors/backtraces from the system logs

N/A - the operational logs are linked above as gists.

TerraTech commented 4 years ago

Can this please be tagged as a defect?

melato commented 4 years ago

I can reproduce this reliably with LXD. See https://github.com/lxc/lxd/issues/7854

melato commented 4 years ago

I filed a new bug with the LXD reproduction steps: https://github.com/openzfs/zfs/issues/10935

johanehnberg commented 3 years ago

I am also seeing this with LXD. Interestingly, lxc copy works, and that should also go over ZFS, so I am bisecting which flags may affect this. It may be that it only succeeds by falling back on rsync. I will report back.

johanehnberg commented 3 years ago

LXD does this by:

zfs send -c -L | zfs recv -x mountpoint -F -u # first snap
zfs send -c -L -i | zfs recv -x mountpoint -F -u # for each next snap
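The sequence above can be sketched as a small shell function that prints one pipeline per snapshot: a full send for the first snapshot, then an incremental (-i) send from each previous snapshot. The function name is hypothetical, and it only prints the commands so the plan can be reviewed before being piped to sh.

```shell
#!/bin/sh
# Sketch of the LXD-style sequence without -R (hypothetical helper).
# Prints the pipelines; it does not run them.
#
# usage: lxd_style_plan SRC_FS DST_FS SNAP1 [SNAP2 ...]   (oldest first)
lxd_style_plan() {
    src=$1 dst=$2
    shift 2
    prev=
    for snap in "$@"; do
        if [ -z "$prev" ]; then
            # first snapshot: full stream
            printf 'zfs send -c -L %s@%s | zfs recv -x mountpoint -F -u %s\n' \
                "$src" "$snap" "$dst"
        else
            # each later snapshot: incremental from the previous one
            printf 'zfs send -c -L -i @%s %s@%s | zfs recv -x mountpoint -F -u %s\n' \
                "$prev" "$src" "$snap" "$dst"
        fi
        prev=$snap
    done
}
```

Usage, with hypothetical dataset names: $ lxd_style_plan tank/ct/web backup/ct/web snap0 snap1 snap2 | sh. The snapshot list, oldest first, can come from $ zfs list -H -o name -t snapshot -s createtxg -d 1 tank/ct/web | sed 's/.*@//'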

It is the -R flag that causes the issue. The manual states that "clones are preserved", so presumably this includes parent clone relations.

I imagine -R is often used for backup or migration, for which it is clearly overkill. A backup scenario typically needs only an -I style incremental send that does not require a source snapshot, i.e. "send all snapshots up until this one".

As it stands, replicating the LXD way without -R and then doing at least one -I pass is a workable workaround, though less elegant and less convenient.
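A minimal sketch of that workaround for a single dataset, assuming the oldest and newest snapshot names are already known: one full send of the oldest snapshot, then one -I pass that carries every intermediate snapshot up to the newest. The function name is hypothetical, and it only prints the commands.

```shell
#!/bin/sh
# Sketch of the two-pass workaround without -R (hypothetical helper).
# Prints the pipelines; it does not run them.
#
# usage: two_pass_plan SRC_FS DST_FS OLDEST_SNAP NEWEST_SNAP
two_pass_plan() {
    src=$1 dst=$2 oldest=$3 newest=$4
    # pass 1: full stream of the oldest snapshot
    printf 'zfs send -c -L %s@%s | zfs recv -x mountpoint -F -u %s\n' \
        "$src" "$oldest" "$dst"
    # pass 2: -I (capital) also sends every snapshot between oldest and newest
    printf 'zfs send -c -L -I @%s %s@%s | zfs recv -x mountpoint -F -u %s\n' \
        "$oldest" "$src" "$newest" "$dst"
}
```

Because nothing here uses -R or names an origin, a clone sent this way arrives on the target as an independent full copy, which costs space.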

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

TerraTech commented 2 years ago

not stale and should not be closed

toghrulgasimov commented 2 years ago

important issue

zfsbot commented 2 years ago

send -R is too big; it's basically an anti-pattern that does too much. You'll find that most scripts that manage replication loop over filesystems and run recv -o origin=foo/bar, instead of letting -R turn your system into a bit-blender by wrongly linking an origin to a similar filesystem in the wrong path, etc.
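A minimal sketch of that per-filesystem approach for a single clone, assuming its origin snapshot has already been replicated and that the source and destination trees differ only by a root prefix (both roots are passed as arguments; the function names are hypothetical). The clone's first snapshot is sent incrementally from its fully specified origin and received with -o origin=, and the origin is remapped by its full path, so lxd/containers and lxd/deleted/containers cannot be confused. The commands are printed, not run.

```shell
#!/bin/sh
# Sketch: replicate one clone without send -R, naming its origin explicitly.
# Function names are hypothetical; commands are printed, not executed.

# map_name SRC_ROOT DST_ROOT NAME
# Rewrites a dataset or snapshot name from SRC_ROOT to DST_ROOT by full path
# prefix; fails for names outside SRC_ROOT instead of guessing.
map_name() {
    case $3 in
        "$1" | "$1"/* | "$1"@*)
            printf '%s%s\n' "$2" "${3#"$1"}" ;;
        *)
            return 1 ;;
    esac
}

# clone_plan SRC_ROOT DST_ROOT CLONE_FS ORIGIN_SNAPSHOT FIRST_SNAP
# Prints the pipeline for the clone's first snapshot: an incremental stream
# from its fully specified origin, received as a clone of the remapped origin.
clone_plan() {
    dst_fs=$(map_name "$1" "$2" "$3") || return 1
    dst_origin=$(map_name "$1" "$2" "$4") || return 1
    printf 'zfs send -i %s %s@%s | zfs recv -u -o origin=%s %s\n' \
        "$4" "$3" "$5" "$dst_origin" "$dst_fs"
}
```

Repeating this for each clone, origins first, approximates what such replication scripts do; ordering and snapshot selection are left to the caller in this sketch.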

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

TerraTech commented 1 year ago

not stale and should not be closed

dch commented 7 months ago

I'm hitting this too, attempting to migrate a 2TiB zpool from FreeBSD 15.0-CURRENT af0d437 to 14.0-RELEASE.

The above workaround isn't suitable in this case: there are many jails, and with that many cloned jails, doing this "manually" will not fit on the target zpool.

Is it possible to pre-seed the origins somehow, such that the recv still works, and then clean up afterwards?
