openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Can't send incremental stream from clone to origin #9594

Open ElvishJerricco opened 4 years ago

ElvishJerricco commented 4 years ago

System information

Type Version/Name
Distribution Name NixOS
Distribution Version 19.09 (Loris)
Linux Kernel 5.3.7
Architecture x86_64
ZFS Version 0.8.2-1
SPL Version 0.8.2-1

Describe the problem you're observing

After cloning a snapshot, modifying the clone, and creating a snapshot on the clone, an incremental send stream can be made from the origin to the new snapshot. But this stream cannot be received by the origin dataset.

Describe how to reproduce the problem

POOL=tank

zfs create $POOL/src
echo foo > /$POOL/src/foo && sync -f /$POOL/src/foo
zfs snapshot $POOL/src@origin
zfs clone $POOL/src@origin $POOL/clone
echo bar > /$POOL/clone/bar && sync -f /$POOL/clone/bar
zfs snapshot $POOL/clone@clone-snap
zfs send -i $POOL/src@origin $POOL/clone@clone-snap | zfs receive $POOL/src

Expected result:

The src dataset contains the clone-snap snapshot.

Actual result:

cannot receive new filesystem stream: destination 'tank/src' exists
must specify -F to overwrite it

Note that it is the zfs receive process that fails; the send stream itself is created without issue. Piping the zfs send command into /dev/null runs without error.


Interestingly, you can do a little dance with promote to make it work.

zfs promote $POOL/clone
zfs send -i $POOL/clone@origin $POOL/clone@clone-snap > clone.send
zfs promote $POOL/src
zfs receive $POOL/src < clone.send

Now the src dataset has the clone-snap snapshot, and is still the promoted dataset. This is a potentially useful way to make an atomic modification to a file system.

ahrens commented 4 years ago

Interesting feature request and a cool use case. As you probably figured out, the send stream is marked as being "for a clone", and therefore when it's received it creates a new filesystem. I agree it would probably make sense for what you're trying to do here to work.

All that said, for this use case it might make more sense to have a "replace origin with clone" operation, that way you don't have to copy all the data. With a few additions, you could almost get there with channel programs, by promoting the clone and then renaming things such that the clone now has the origin's name. But you couldn't do that while the filesystem was mounted (at least not without doing some more serious design work).
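For reference, a rough sketch of the channel-program half of that idea (assuming the tank/src and tank/clone names from the repro above; zfs.sync.promote is part of the channel-program API, but a rename primitive is not, which is the missing piece mentioned here):

```shell
# Sketch only: promote the clone atomically from a channel program.
# There is no zfs.sync equivalent of rename, so the name swap (the part
# that would need the filesystem unmounted) cannot happen in here yet.
cat > /tmp/promote.lua <<'EOF'
-- Promote tank/clone so the shared snapshots move over to it.
err = zfs.sync.promote("tank/clone")
if err ~= 0 then
  return err
end
return 0
EOF
zfs program tank /tmp/promote.lua
```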

DeHackEd commented 4 years ago

I feel like there may be a generic feature request here: an incremental stream that would normally create a clone could instead simply not create one. We already have -o origin=[...] to turn a simple incremental into a clone; why not the reverse, where a clone-creating stream is received as a simple incremental?

ElvishJerricco commented 4 years ago

@DeHackEd Yeah, that's the idea I was going for, but I really like @ahrens's "replace origin with clone" idea as well.

ElvishJerricco commented 4 years ago

Expanding on the "replace origin with clone" idea, it could be similar to Git "fast forward" merges. Something like:

zfs fast-forward tank/src tank/clone@foo

This would update tank/src to contain everything from tank/clone up to @foo, and would change the origin property of tank/clone to @foo.

ElvishJerricco commented 4 years ago

@kpande I think your steps are solving a very different problem. The most important aspect of what I'm trying to do is that the filesystem I'm trying to change doesn't get unmounted at any point; the whole FS is just updated instantly without applications having to close down at all.

ElvishJerricco commented 4 years ago

@kpande Could you elaborate on what they really do? I'm not really following it.

ElvishJerricco commented 4 years ago

@kpande Yeah, I think that's a very different use case. If I understand correctly, your steps update a template without having to unmount any VM clones made from it. My goal is to have a live, mounted filesystem get updated atomically: make changes on a clone, then merge those changes back into the origin filesystem atomically with zfs receive or something like it. It's not the clone that I care about keeping mounted; it's the origin filesystem, which is not supposed to be a template.

For instance, if I wanted to write a large file to /tank/src/foo without ever having an incomplete version of the foo file visible in the tank/src dataset, I'd like to clone tank/src, write foo to /tank/clone/foo, then zfs receive the change into tank/src or fast-forward it to include foo, all without unmounting /tank/src.
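Concretely, the workflow I have in mind looks like this, using the promote dance from the top of this issue (a sketch assuming the tank pool from the repro; big-file and the staging names are placeholders):

```shell
set -eu
POOL=tank
# 1. Snapshot the live filesystem and stage the change on a clone.
zfs snapshot $POOL/src@before
zfs clone $POOL/src@before $POOL/staging
cp big-file /$POOL/staging/foo && sync -f /$POOL/staging/foo
zfs snapshot $POOL/staging@after
# 2. Promote dance: send the delta back into the still-mounted origin.
zfs promote $POOL/staging
zfs send -i $POOL/staging@before $POOL/staging@after > staged.send
zfs promote $POOL/src
zfs receive $POOL/src < staged.send
# 3. /tank/src/foo now appears in full, with no remount; drop the clone.
zfs destroy -r $POOL/staging
```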

ElvishJerricco commented 4 years ago

And you could always do what every other application or project that desires your functionality does: copy to a temporary file and then mv it into place.

It was a bad example. Instead, consider wanting changes to many files to all appear simultaneously. There's no way to modify multiple files atomically in Linux; hence my looking for a solution with ZFS.
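To spell out the gap (a runnable sketch, not ZFS-specific; the paths are made up): single-file replacement is atomic via rename(2), but Linux offers no analogous multi-file primitive on a regular filesystem.

```shell
# Single-file atomic replace: the classic write-then-mv pattern.
set -eu
dir=$(mktemp -d)
printf 'old\n' > "$dir/config"
# Prepare the complete new version out of line...
printf 'new\n' > "$dir/config.tmp"
# ...then rename() swaps it in atomically: readers see "old" or "new",
# never a torn file. There is no way to do this for two files at once.
mv "$dir/config.tmp" "$dir/config"
result=$(cat "$dir/config")
echo "$result"
rm -rf "$dir"
```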

I think the issue with your code there is that you had to rename datasets, which requires remounting filesystems. If I'm wrong, could you explain how you would go about atomically updating a filesystem mounted at /tank/foo/ to contain 5 new files, such that any application sees either none or all of them in /tank/foo/ at any time, without remounting?

ElvishJerricco commented 4 years ago

@kpande I don't understand the details of the issue you linked well enough to know if it's a counterargument to this, but I did point out in the original description of this issue that you can do this with a little promote dance:

Interestingly, you can do a little dance with promote to make it work.

zfs promote $POOL/clone
zfs send -i $POOL/clone@origin $POOL/clone@clone-snap > clone.send
zfs promote $POOL/src
zfs receive $POOL/src < clone.send

Now the src dataset has the clone-snap snapshot, and is still the promoted dataset. This is a potentially useful way to make an atomic modification to a file system.

The receive command works without having to remount $POOL/src at all.

ElvishJerricco commented 4 years ago

no, that command remounts the filesystem

I don't think it does. I tested this by holding open a file handle to one of the files in the filesystem while running those commands, and ZFS did not complain that the target is busy. Whereas if I use zfs rename with the file handle open, it does complain "target is busy": the umount fails, and ZFS does not successfully rename.
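The check, roughly (a sketch assuming the repro layout from the top of this issue):

```shell
# Hold a file in the origin filesystem open for the whole dance...
exec 3< /tank/src/foo
zfs promote tank/clone
zfs send -i tank/clone@origin tank/clone@clone-snap > clone.send
zfs promote tank/src
# ...receive succeeds: ZFS never had to unmount /tank/src.
zfs receive tank/src < clone.send
# A rename, by contrast, must unmount and refuses while the handle is open:
zfs rename tank/src tank/renamed || echo "rename fails: target is busy"
exec 3<&-
```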