openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Support for writable snapshots or similar functionality to create application consistent snapshots #16733

Open Daniel-Nashed opened 1 week ago

Daniel-Nashed commented 1 week ago

ZFS snapshots are working great for us, but some applications require application-consistent snapshots.

On Windows, VSS supports Auto Recovery snapshots, which allow a backup application to mount the snapshot during the backup operation, before the backup completes.

A writable snapshot would allow us to build similar integrations on Linux. btrfs supports writable snapshots, but we would really like our customers to use ZFS because of its many other features.

Maybe I am thinking in the wrong direction and there is a better way. I tried to come up with a flow where we take a snapshot, mount a clone of it as a new file system, and apply changes. But the snapshot we then take of that new clone still depends on the previous snapshot, which cannot be removed.
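As a rough sketch of the flow described above, the snippet below only builds the `zfs` command lines and does not execute them. The dataset name `tank/app`, the clone name `tank/app-recover`, and the `-base` / `-consistent` snapshot suffixes are made up for illustration; only the `zfs snapshot` and `zfs clone` subcommands are real.

```python
import shlex
from typing import List


def clone_snapshot_flow(dataset: str, clone: str, tag: str) -> List[List[str]]:
    """Build (but do not run) the commands for snapshot -> clone -> snapshot.

    All names are hypothetical; the function only assembles argument lists.
    """
    base = f"{dataset}@{tag}-base"            # intermediate snapshot of the live dataset
    final = f"{clone}@{tag}-consistent"       # snapshot taken after the clone was fixed up
    return [
        ["zfs", "snapshot", base],            # 1. freeze point of the running application
        ["zfs", "clone", base, clone],        # 2. writable clone of that snapshot
        # 3. application-specific recovery writes into the mounted clone (not shown)
        ["zfs", "snapshot", final],           # 4. the snapshot we actually want to keep
    ]


if __name__ == "__main__":
    for argv in clone_snapshot_flow("tank/app", "tank/app-recover", "backup1"):
        print(shlex.join(argv))
```

The dependency mentioned above shows up after step 4: as long as the clone exists, ZFS refuses to destroy the `-base` snapshot it was created from.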

The problem we are trying to solve:

We need writable snapshots, or something similar, that would let us bring an application into a consistent state, take a snapshot, and then merge in the deltas that occurred during the freeze time. That would help us use ZFS in more environments.

Thanks for any feedback and ideas on how we could solve this with today's features, and for taking my feature request into account for future releases in some way.

-- Daniel

IvanVolosyuk commented 1 week ago

You can 'zfs promote' the clone and delete the original dataset at some point, if that helps. This way you get rid of the previous fork and can create a new fork (snapshot + clone) later when needed.
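A minimal sketch of that suggestion, again only building the command lines rather than running them. The dataset names are hypothetical, and whether destroying the original dataset is acceptable depends entirely on the use case.

```python
import shlex
from typing import List


def promote_flow(original: str, clone: str) -> List[List[str]]:
    """Build (but do not run) the commands for promoting a clone.

    After `zfs promote`, the origin snapshot belongs to the promoted clone
    and the original dataset becomes the dependent one, so it can be destroyed.
    """
    return [
        ["zfs", "promote", clone],            # swap the clone/origin relationship
        ["zfs", "destroy", "-r", original],   # destructive: drops the old dataset and its snapshots
    ]


if __name__ == "__main__":
    for argv in promote_flow("tank/app", "tank/app-recover"):
        print(shlex.join(argv))
```

This replaces the original dataset with the clone, so it only fits workflows where the application can move over to the promoted clone.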

Daniel-Nashed commented 1 week ago

Thanks @IvanVolosyuk. No, sadly this does not help. The business case is that we want to take a snapshot of an application while the application continues to run, and we want to keep the snapshot with the application changes that occurred during the application "freeze".

I looked into all types of flows and did not find one that lets me keep the original volume and just a single snapshot with the backup.

It looks like there is no way to remove the intermediate snapshot. So in theory we could keep all the snapshots and just have one additional snapshot. Promoting does not help. The only option is to create a clone, bring it online, merge the changes into it, and then take another snapshot of the new clone.

But then the intermediate snapshot cannot be deleted. I really looked at all the different ways to clone and promote. Promote isn't helpful in this case, from what I can see.

amotin commented 1 week ago

Assuming that you have really thought it through and must have a "writable snapshot", I see only one way. From the perspective of traditional ZFS clones, there is indeed no way to disconnect a clone from its origin without deleting one or the other, simply because that would require some mechanism for tracking overlapping space, which was just not there. Some time ago, though, after block cloning was implemented, somebody proposed that we could create fake records in the BRT for each block shared by the clone and its origin and then just forget the clone's relationship with the snapshot. That would effectively make the clone an independent dataset that shares space only via block cloning, and you could do anything with it. It would be quite a brain surgery, but it may be possible. At this point we assert that block cloning and deduplication apply only to data blocks, but maybe we could change that.

Clones freed that way would be much more expensive than traditional ones, though, so it would probably make sense only once the clone and its origin have diverged far enough, unless you already use deduplication on the dataset, in which case it would cost only DDT records for metadata.

Daniel-Nashed commented 1 week ago

@amotin thanks for your feedback! It really sounds like a bigger undertaking and nothing that we could get anytime soon. It would also add a lot of overhead "just" for an application-aware snapshot. We can at least take a clone to get consistent data and back it up in another way.

But it sounds like, if we want to do this with the existing functionality, we need to create two snapshots and keep both. One would be just an intermediate snapshot. It sounds like there would be no big harm in having two snapshots, since they would not take additional space. We would just need to name them well to avoid confusion, and also send both of them when we send snapshots to a remote location.

Knowing that this isn't something we will get soon, and that it might add overhead, is helpful for deciding what else we can use. My use case is a specific one and might not be of general interest.
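One possible way to keep the two snapshots recognizable and send both, sketched under assumptions: the `-base` / `-consistent` suffixes and the dataset names are hypothetical, and the receiving side (for example piping into `zfs receive` on the remote host) is left out. The snippet only builds the `zfs send` command lines.

```python
import shlex
from typing import List, Tuple


def snapshot_pair(dataset: str, clone: str, tag: str) -> Tuple[str, str]:
    """Return (intermediate, final) snapshot names sharing one tag."""
    return f"{dataset}@{tag}-base", f"{clone}@{tag}-consistent"


def send_commands(dataset: str, clone: str, tag: str) -> List[List[str]]:
    """Build (but do not run) the sends needed to replicate both snapshots."""
    base, final = snapshot_pair(dataset, clone, tag)
    return [
        ["zfs", "send", base],                # full stream of the intermediate snapshot
        ["zfs", "send", "-i", base, final],   # clone snapshot, incremental from its origin
    ]


if __name__ == "__main__":
    for argv in send_commands("tank/app", "tank/app-recover", "backup1"):
        print(shlex.join(argv))
```

Sharing one tag between both names keeps the pair together in listings, and the incremental stream for the clone's snapshot contains only what changed relative to the intermediate snapshot.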