graphite-project / carbonate

Utilities for managing graphite clusters
MIT License
516 stars 80 forks source link

Use --copy-dest, enabling the rsync algorithm when copying from remote to staging #106

Closed luke-heberling closed 5 years ago

luke-heberling commented 5 years ago

A distinguishing feature of rsync as compared to scp/ftp/etc is the rsync algorithm (https://www.samba.org/~tridge/phd_thesis.pdf), which finds a near-optimal transfer block size for comparisons to avoid transmitting unchanged data within files which are partly changed.

In usage of rsync where the rsync destination is not the final destination(e.g. as used in carbonate w/ a staging dir), the default options do not allow it to find the base files for this comparison. rsync provides the --{link,copy,compare}-dest options for this purpose, but carbonate does not currently make use of them. This is significant to me because I'm using carbon-sync for replicating metrics between cloud providers and around the world, in some cases where bandwidth is constrained.

Relative to --copy-dest, --compare-dest and --link-dest would reduce some disk ops on the receiver when files match exactly, but I discounted those options for these reasons:

As expected, in testing carbon-sync with --copy-dest with mostly-matching whisper files I saw huge improvements in rsync time and in bandwidth usage (testing within one VPC <50% time and <5% bandwidth). Obviously this doesn't help when the files mostly do not match, and it potentially creates more I/O on the receiver because now rsync must read the whisper files in addition to carbon-sync. That seems like a great tradeoff to me, how does graphite-project feel about it?

deniszh commented 5 years ago

Hi @luke-heberling ,

I think that's a good idea, but ideally, we should make it configurable with some sane/safe defaults. Do you think it's possible?

luke-heberling commented 5 years ago

Hi @deniszh , that's definitely possible!

I feel like defaulting to use --copy-dest probably makes sense -- this is not a new addition to rsync (I know it was there at least 10 years ago, probably longer), and if there is a problem accessing the --copy-dest directory then rsync will emit a warning and fallback to the current behavior, as though --copy-dest had not been provided.

I updated the pull request to enable the behavior by default, and expose a toggle via --rsync-disable-copy-dest. That also entailed updating the copy of --help in the README. How does that look?

deniszh commented 5 years ago

Hi @luke-heberling ,

Thanks, that's looks better! Will try to fix tests and merge.

piotr1212 commented 5 years ago

I fixed the tests here: https://github.com/graphite-project/carbonate/pull/107/commits/edabef0ebd8ca4b8c3d5e54afc89866a9f8646af Newest flake8 pulls in some dependency which is python3 only.

deniszh commented 5 years ago

Yes, thanks, @piotr1212! Please rebase, if possible, @luke-heberling

luke-heberling commented 5 years ago

@deniszh rebased

deniszh commented 5 years ago

Cool, looks great! merging!