Snapshot create problem when using both recursive and individual child settings #560

oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.

www.znapzend.org

GNU General Public License v3.0

Closed: Dacesilian closed this issue 2 years ago.

Dacesilian commented 2 years ago

Let's say we have multiple datasets under nvme, with recursive settings on nvme itself and additional settings on some specific child datasets. znapzend then reports:

[2021-09-19 15:23:46.56566] [17078] [info] found a valid backup plan for nvme...
[2021-09-19 15:23:46.56592] [17078] [info] found a valid backup plan for nvme/container/subvol-144-disk-0...
[2021-09-19 15:23:46.56623] [17078] [info] found a valid backup plan for nvme/container/subvol-161-disk-0...
[2021-09-19 15:23:46.56648] [17078] [info] found a valid backup plan for nvme/container/subvol-162-disk-0...

The problem occurs when creating snapshots: ZFS seems to "deadlock", sometimes reports that I/O is blocked, and always reports that the snapshot already exists; but the function that checks for the snapshot then reports that it does not exist.

I think that in this situation there should be only one thread per ZFS pool, creating snapshots one at a time: first the recursive one, then the individual ones. The individual snapshot command would then return a "wrong" exit code, but the check for the snapshot's existence should return true, and all would be fine.

Thanks.
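A minimal sketch of the serialization proposed above, written in Python rather than znapzend's Perl and assuming each backup plan can be reduced to a dataset name plus a recursive flag. The `Plan` type, the `snapshot_pool`/`snapshot_all` functions and the injected `run` and `exists` callables are hypothetical, not znapzend's internals. Within a pool, recursive plans run first, and a failed create is tolerated only when the snapshot verifiably exists afterwards. For simplicity, pools are processed one after another here; the proposal would give each pool its own thread.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List


@dataclass(frozen=True)
class Plan:
    dataset: str     # e.g. "nvme" or "nvme/container/subvol-144-disk-0"
    recursive: bool  # True if the plan snapshots the whole subtree


def pool_of(dataset: str) -> str:
    # The pool name is the first component of a dataset name.
    return dataset.split("/", 1)[0]


def snapshot_pool(plans: List[Plan], snapname: str,
                  run: Callable[[List[str]], int],
                  exists: Callable[[str], bool]) -> List[str]:
    """Create one pool's snapshots one at a time, recursive plans first.

    `run` executes a command and returns its exit code; `exists` reports
    whether a snapshot is present. Returns snapshots that are still missing.
    """
    failed = []
    # Sorting on `not recursive` puts recursive plans (False) before individual ones.
    for plan in sorted(plans, key=lambda p: (not p.recursive, p.dataset)):
        snap = f"{plan.dataset}@{snapname}"
        cmd = ["zfs", "snapshot"] + (["-r"] if plan.recursive else []) + [snap]
        # A non-zero exit is only a failure if the snapshot really is absent.
        if run(cmd) != 0 and not exists(snap):
            failed.append(snap)
    return failed


def snapshot_all(plans: List[Plan], snapname: str,
                 run: Callable[[List[str]], int],
                 exists: Callable[[str], bool]) -> List[str]:
    # Group plans by pool; each pool's plans are handled sequentially.
    failed: List[str] = []
    by_pool = sorted(plans, key=lambda p: pool_of(p.dataset))
    for _, group in groupby(by_pool, key=lambda p: pool_of(p.dataset)):
        failed += snapshot_pool(list(group), snapname, run, exists)
    return failed
```

In practice `run` could wrap `subprocess.run(cmd).returncode`, and `exists` could test the exit code of `zfs list -H -t snapshot <name>`.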

matveevandrey commented 2 years ago

Same problem here. It would be nice if recursive handling skipped datasets that have individual settings.
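One way to approximate this suggestion, sketched under assumptions: list the datasets under the recursive root (for example with `zfs list -H -o name -r nvme`), drop the ones that have their own plan, and snapshot the rest with a single `zfs snapshot` call that names every dataset explicitly. OpenZFS creates all snapshots named in one `zfs snapshot` invocation atomically, so the remaining datasets still share a point in time; the skipped ones, however, are snapshotted at a different moment. The function names below are hypothetical.

```python
from typing import Iterable, List, Set


def recursive_targets(root: str, datasets: Iterable[str],
                      own_plans: Set[str]) -> List[str]:
    """Datasets at or below `root` that lack an individual plan."""
    # Match "root" itself or "root/...", so "nvme2" is not mistaken for a child of "nvme".
    under_root = [d for d in datasets if d == root or d.startswith(root + "/")]
    return [d for d in under_root if d not in own_plans]


def snapshot_command(targets: List[str], snapname: str) -> List[str]:
    # One `zfs snapshot` call naming several snapshots; OpenZFS creates them atomically.
    return ["zfs", "snapshot"] + [f"{d}@{snapname}" for d in targets]
```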

oetiker commented 2 years ago

There is a 'logic' problem: recursive mode causes the snapshot to happen at the top level. That is essentially why we added the whole recursive function, so that we can guarantee consistent snapshots across an entire hierarchy of filesets.

Having alternate settings at the individual fileset level would have to be carefully planned, to make sure they do not interfere with the top-level snapshotting...

As always, PRs are welcome!
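The "already exists" error in the original report is consistent with two plans producing the same snapshot name: a recursive plan on nvme already creates `nvme/container/subvol-144-disk-0@<timestamp>`, so the child's own plan collides if it fires in the same second. A hypothetical pre-flight check, assuming both plans use the timestamp format `%Y-%m-%d-%H%M%S` (check the `tsformat` actually configured), could flag such overlaps:

```python
from datetime import datetime
from typing import Dict, List, Set


def covered(root: str, recursive: bool, datasets: List[str]) -> Set[str]:
    # A recursive plan covers root and every descendant; otherwise only root.
    if not recursive:
        return {root}
    return {d for d in datasets if d == root or d.startswith(root + "/")}


def collisions(plans: Dict[str, bool], datasets: List[str], when: datetime,
               tsformat: str = "%Y-%m-%d-%H%M%S") -> Set[str]:
    """Snapshot names that more than one plan would try to create at `when`.

    `plans` maps a dataset name to its recursive flag (hypothetical input shape).
    """
    name = when.strftime(tsformat)
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for root, recursive in plans.items():
        for dataset in covered(root, recursive, datasets):
            snap = f"{dataset}@{name}"
            if snap in seen:
                duplicates.add(snap)
            else:
                seen.add(snap)
    return duplicates
```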

matveevandrey commented 2 years ago

Thanks for the clarification. How about an option to continue processing the recursive job on any error? It could serve as a workaround.

oetiker commented 2 years ago

That does not sound very reliable... ignoring errors is usually not a good strategy.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Dacesilian commented 2 years ago

Closing for inactivity doesn't make sense. @oetiker, could you please focus on this? Or don't you have any time to maintain this project? If it were written in Java I could help, but as it is I'm not able to.

oetiker commented 2 years ago

Hi @Dacesilian,

This is an open source project with no financial backing... the time I invest is spent either in response to a request from a paying customer, or because I am intrigued by a problem and spend my spare time solving it.

I focus mainly on fixing bugs (not adding missing features) and on interacting with users...

Dacesilian commented 2 years ago

@oetiker I understand that; it's a problem many open source projects share. It's a pity it's written in a language I don't know. Thanks.

oetiker commented 2 years ago

@Dacesilian Perl is well worth looking into :) It has curly brackets... so the jump from Java is not that big :)

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

griznog commented 1 month ago

One way to work around this is to let znapzend create the snapshots recursively, then use a different tool for the actual send/recv of the child datasets. Say I have a pool like this:

pool
├── one
├── three
└── two
    ├── A
    └── B

and I want to snapshot the entire thing consistently but only send A and B to a remote host. I'd set up znapzend on pool to handle the snapshotting, then have it call, as a post-snap-command, a script that uses syncoid to do the actual replication. A side bonus of this approach is that the script using syncoid can easily run replications in parallel across hosts and filesystems, speeding up the entire replication process.

You could also achieve the desired effect with just sanoid/syncoid, but you lose the nice consistent recursive snapshotting of znapzend and the elegance of having the snap config defined right in the filesystem.
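A minimal sketch of such a post-snap-command script in Python, assuming syncoid is installed and the remote host is reachable over SSH. The `TARGETS` mapping, host name and destination datasets are hypothetical; `--no-sync-snap` asks syncoid to replicate the existing (here, znapzend-created) snapshots instead of creating its own sync snapshot. Each dataset gets its own syncoid process, run in parallel through a thread pool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Hypothetical source -> destination pairs for the child datasets to send.
TARGETS: List[Tuple[str, str]] = [
    ("pool/two/A", "backup@remote-host:tank/two/A"),
    ("pool/two/B", "backup@remote-host:tank/two/B"),
]


def syncoid_cmd(src: str, dst: str) -> List[str]:
    # --no-sync-snap: send the existing snapshots rather than creating a sync snapshot.
    return ["syncoid", "--no-sync-snap", src, dst]


def replicate(targets: Sequence[Tuple[str, str]], max_workers: int = 4,
              runner: Callable = subprocess.run) -> List[Tuple[str, int]]:
    """Run one syncoid per dataset in parallel; return (source, exit code) pairs."""
    def one(pair: Tuple[str, str]) -> Tuple[str, int]:
        src, dst = pair
        return src, runner(syncoid_cmd(src, dst)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as `targets`.
        return list(executor.map(one, targets))
```

A wrapper script could call `replicate(TARGETS)` and exit non-zero if any pair failed, so that failures are visible to whatever invokes it. Raising `max_workers` increases parallelism at the cost of more concurrent SSH sessions and disk load.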