oetiker / znapzend

zfs backup with remote capabilities and mbuffer integration.
www.znapzend.org
GNU General Public License v3.0

Parallel send/recv support #490

Closed: crabique closed this issue 3 years ago

crabique commented 4 years ago

Hi!

We are using znapzend to back up a dataset that contains a large number of child datasets (>10k), taking and sending snapshots daily. We also use the mbuffer port option.

Unfortunately, as the number of datasets grew, we noticed very long running times for the backup tasks: about 5-6 times slower than on our other server, which holds the same amount of data in one big dataset without children.

On closer inspection, we found that znapzend spawns a new ssh process for every child dataset, which is slow. To speed things up a little, we added ssh multiplexing options to ~/.ssh/config so that the SSH connection is at least reused.
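
For reference, OpenSSH connection multiplexing is controlled by the ControlMaster, ControlPath and ControlPersist options, which can live in ~/.ssh/config or be passed per invocation with -o. The Python sketch below only builds such a command line; the host name, socket path pattern and 10-minute persist time are illustrative assumptions, not the reporter's actual settings.

```python
import shlex


def ssh_multiplexed_argv(host: str, remote_cmd: str,
                         control_path: str = "~/.ssh/cm-%r@%h:%p",
                         persist: str = "10m") -> list[str]:
    """Build an ssh argv that reuses one master connection per host.

    The path pattern and persist time are illustrative defaults only.
    """
    return [
        "ssh",
        "-o", "ControlMaster=auto",           # first call opens the master, later calls attach to it
        "-o", f"ControlPath={control_path}",  # socket shared by all sessions to this host
        "-o", f"ControlPersist={persist}",    # keep the master alive after the last session exits
        host,
        remote_cmd,
    ]


# Hypothetical host; ssh itself expands the ~ in ControlPath.
argv = ssh_multiplexed_argv("backup.example.org", "zfs list -H -o name")
print(shlex.join(argv))
```

Multiplexing removes the TCP and key-exchange cost of each new connection, but every dataset still pays for starting the remote processes, which is the remaining overhead described next.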

This was not enough, and it was still very slow: it still takes time to spawn an instance of znapzend, which spawns an ssh process, which in turn spawns mbuffer | zfs recv on the receiving end, and then to actually transfer even an empty snapshot (~3.5 seconds per dataset on average).
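
A back-of-the-envelope estimate shows why this fixed cost dominates. Taking 10,000 datasets as a lower bound and ~3.5 seconds of overhead each, the overhead alone is close to 10 hours per run, even when no data changed. The sketch also estimates the effect of N workers, assuming (optimistically) that the overhead splits evenly and that workers do not slow each other down.

```python
def overhead_hours(n_datasets: int, seconds_per_dataset: float, workers: int = 1) -> float:
    """Fixed per-dataset overhead in hours, assuming it splits evenly across workers."""
    # Ceiling division: the busiest worker handles this many datasets.
    per_worker = -(-n_datasets // workers)
    return per_worker * seconds_per_dataset / 3600


print(f"{overhead_hours(10_000, 3.5):.2f} h")     # 9.72 h with one worker
print(f"{overhead_hours(10_000, 3.5, 4):.2f} h")  # 2.43 h with four workers
```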

So the feature request is this: since a recursive send is more "atomic", it should be possible to send datasets in parallel on different mbuffer ports. For example, the configuration could look like this:

--mbuffer=/usr/bin/mbuffer:30001,30002,30003,30004

This would mean 4 znapzend workers, each sending and receiving snapshots on its own port in parallel.
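
A minimal sketch of how the proposed value could be split into the mbuffer path and one port per worker. This parser is hypothetical: the comma-separated port list is the syntax proposed in this issue, not an option znapzend currently accepts, and the function is written in Python purely for illustration.

```python
def parse_mbuffer_ports(value: str) -> tuple[str, list[int]]:
    """Split a proposed 'PATH:PORT1,PORT2,...' value into (path, ports)."""
    # rpartition splits on the last ':' so only the port list is cut off.
    path, sep, ports = value.rpartition(":")
    if not sep or not path or not ports:
        raise ValueError(f"expected PATH:PORT[,PORT...], got {value!r}")
    port_list = [int(p) for p in ports.split(",")]  # int() raises ValueError on junk
    if any(not 0 < p < 65536 for p in port_list):
        raise ValueError("ports must be in 1..65535")
    if len(set(port_list)) != len(port_list):
        raise ValueError("each worker needs its own port")
    return path, port_list


path, ports = parse_mbuffer_ports("/usr/bin/mbuffer:30001,30002,30003,30004")
print(path, ports, "workers:", len(ports))
```

The number of workers falls out of the number of ports, so the configuration cannot ask for more parallel transfers than there are listening ports.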

Please let me know what you think about this.

oetiker commented 4 years ago

Making znapzend faster is always a good thing; how well this works is best judged on an actual implementation → PR welcome!

jimklimov commented 3 years ago

It seems this use case could also benefit from a solution to #438, which would reduce the number of spawned processes. Also, using several backup schedules for smaller subtrees already allows parallelization.

griznog commented 3 years ago

For a straightforward parent/child layout where the children have no local znapzend config, #438 seems like the better approach. For all other use cases, rather than specifying this in a filesystem setup, it would be nice to be able to tell the znapzend daemon:

--parallel=N
--mbuffer=/usr/bin/mbuffer:FIRSTPORT

where --parallel=N allows up to N parallel send/recv operations and --mbuffer=...:FIRSTPORT makes mbuffer use the port range FIRSTPORT to FIRSTPORT+N-1. If a filesystem specifies its own --mbuffer= then it is handled serially; all others are handled with up to N in parallel. A filesystem could set --mbuffer=NONE, or --mbuffer= with an empty value, to disable the daemon-level mbuffer config and fall back to plain ssh.
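
The daemon-level proposal can be sketched as a planning step that derives the port range and sorts filesystems into three groups. Everything here is hypothetical (--parallel does not exist in znapzend, and the function and dictionary layout are invented for illustration); the sketch allocates exactly N ports, FIRSTPORT through FIRSTPORT+N-1, one per parallel worker. The comment does not say whether ssh-only filesystems run in parallel, so they are kept as a separate group.

```python
from typing import Optional


def plan_filesystems(fs_mbuffer: dict[str, Optional[str]], first_port: int, parallel: int):
    """Classify filesystems under the proposed --parallel/--mbuffer daemon options.

    fs_mbuffer maps each filesystem to its own mbuffer setting:
      None          -> no per-filesystem setting: use the daemon-level parallel pool
      "NONE" or ""  -> daemon mbuffer disabled: plain ssh only
      anything else -> its own mbuffer config: handled serially
    """
    if parallel < 1:
        raise ValueError("--parallel must be at least 1")
    ports = list(range(first_port, first_port + parallel))  # N ports for N workers
    pool, serial, ssh_only = [], [], []
    for fs, setting in sorted(fs_mbuffer.items()):
        if setting is None:
            pool.append(fs)
        elif setting in ("NONE", ""):
            ssh_only.append(fs)
        else:
            serial.append(fs)
    return ports, pool, serial, ssh_only


print(plan_filesystems({"tank/a": None, "tank/b": "/opt/mbuffer:40000", "tank/c": "NONE"}, 30001, 4))
```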

marshalleq commented 3 years ago

I seem to have the opposite problem: I have a lot of replicated datasets, both within a system and to a remote system. When these run, they cause performance issues on the system, and I'd like to be able to limit them or schedule them so they don't all run at the same time. I mention it here only because the code that implements a number of parallel streams could perhaps also be used to restrict the number of parallel streams.
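
Both requests map onto the same mechanism: a worker pool whose size is the concurrency limit. The Python sketch below assumes each transfer must hold one port from a shared queue, so no two concurrent transfers share an mbuffer port, and a single port serializes everything. The transfer itself is a hypothetical stand-in (fake_send) that only records concurrency; a real implementation would run the send/recv pipeline instead.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(datasets, ports, send_one):
    """Run send_one(dataset, port) for every dataset, at most len(ports) at a time."""
    free_ports = queue.Queue()
    for p in ports:
        free_ports.put(p)

    def job(ds):
        port = free_ports.get()   # blocks until a port is free
        try:
            return send_one(ds, port)
        finally:
            free_ports.put(port)  # hand the port to the next dataset

    with ThreadPoolExecutor(max_workers=len(ports)) as pool:
        return list(pool.map(job, datasets))  # results keep the input order


lock = threading.Lock()
in_use = set()
active = 0
peak = 0


def fake_send(ds, port):
    """Hypothetical stand-in for the real transfer; only tracks concurrency."""
    global active, peak
    with lock:
        if port in in_use:
            raise RuntimeError(f"port {port} used by two transfers at once")
        in_use.add(port)
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # pretend to transfer a snapshot
    with lock:
        in_use.discard(port)
        active -= 1
    return ds, port


datasets = [f"tank/data/ds{i}" for i in range(20)]
results = run_bounded(datasets, [30001, 30002, 30003, 30004], fake_send)
print(len(results), "transfers, peak concurrency:", peak)
```

Raising the number of ports speeds up the many-small-datasets case, while a short port list (or a single one) caps the load on a busy system.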

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.