Closed art-w closed 2 years ago
Sounds like a good idea to me, but I'm not familiar with this backend.
Thanks for this @art-w! With @talex5's help I wrote the rsync backend (mainly for its convenience and also for the macOS port of obuilder).
> I'm working under the assumption that files are never modified in place in the store but always copied elsewhere first.
Yep, that is a correct assumption (or at least should be!). Anything in `result/<hash>` should be immutable. The rsync backend is based off of the btrfs backend and typically will:

1. Copy the base, `result/1234`, to a new home in `result-tmp`, say `result/5678`.
2. `runc` will then use the `result-tmp` during the build step.
3. Move `result-tmp` to `result`, which if iiuc is where you have changed this to another copy to add the hardlinks and a delete of the `result-tmp`?

So, from a Linux perspective, I think this change is good. It does come at the added cost of doing a copy rather than a rename, but I think the potential disk space saving is worth it. The rsync backend isn't supposed to be fast. I'll follow up again soon after I rebase and try this with the experimental macOS port (see #87).
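
To make the immutability point concrete, here is a minimal sketch of what goes wrong if a hard-linked file is written in place, and why copy-then-rename is safe. The directory layout and file names (`store`, `opam`, `dune`) are hypothetical stand-ins, not obuilder's actual store contents; only `mktemp`, `ln`, `cp` and `mv` are used.

```shell
set -eu

# Scratch directory standing in for the store (hypothetical layout).
store=$(mktemp -d)
mkdir -p "$store/result/1234" "$store/result/5678"

# Case 1: a file shared by two results through a hard link, then written in place.
echo "v1" > "$store/result/1234/opam"
ln "$store/result/1234/opam" "$store/result/5678/opam"
echo "v2" > "$store/result/5678/opam"      # truncates and rewrites the shared inode
in_place=$(cat "$store/result/1234/opam")  # the base result changed too

# Case 2: same sharing, but the update is written to a copy and renamed over the link.
echo "v1" > "$store/result/1234/dune"
ln "$store/result/1234/dune" "$store/result/5678/dune"
cp "$store/result/5678/dune" "$store/result/5678/dune.tmp"
echo "v2" > "$store/result/5678/dune.tmp"
mv "$store/result/5678/dune.tmp" "$store/result/5678/dune"  # new inode for 5678 only
replaced=$(cat "$store/result/1234/dune")  # the base result is untouched

printf 'base after in-place write:     %s\n' "$in_place"
printf 'base after copy-then-rename:   %s\n' "$replaced"

rm -rf "$store"
```

As long as everything under `result/<hash>` is only ever read or copied elsewhere first, the first case cannot happen and sharing inodes between results is harmless.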
Thanks, your explanation does match my intuition! Yes, the original `mv result-tmp/xyz result/xyz` becomes a two-step "cp" `result-tmp/xyz result/xyz ; rm result-tmp/xyz`, and yes, the copy is much slower than a rename :/

(Out of curiosity, I'm going to run some tests without `--checksum` to see how expensive it is, but I don't think it's 100% safe to skip it in the `obuilder` use-case.)
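
For illustration, the two-step finalisation could look roughly like the sketch below. This is an assumption-laden example, not the PR's actual invocation: the scratch layout and the names `store`, `base` and `next` are made up, `-a` is my choice of copy flags, and the rsync part is skipped when rsync is not installed. `--link-dest` and `--checksum` are the real rsync options discussed in this thread.

```shell
set -eu

# Hypothetical scratch store. mktemp -d gives an absolute path, which matters
# because a relative --link-dest is resolved against the destination directory.
store=$(mktemp -d)
mkdir -p "$store/result/base/bin" "$store/result-tmp"
echo "opam 2.0"  > "$store/result/base/bin/opam-2.0"
echo "unchanged" > "$store/result/base/bin/other"

# Build step: start from a timestamp-preserving copy of the base, change one file.
cp -a "$store/result/base" "$store/result-tmp/next"
echo "opam 2.0 relinked" > "$store/result-tmp/next/bin/opam-2.0"

status=skipped
if command -v rsync >/dev/null 2>&1; then
  # Step 1: copy result-tmp/next into result/next, hard-linking every file that
  # is identical (content checked with --checksum) to its counterpart in base.
  rsync -a --checksum --link-dest="$store/result/base/" \
    "$store/result-tmp/next/" "$store/result/next/"
  # Step 2: delete the temporary tree (this replaces the original rename).
  rm -rf "$store/result-tmp/next"

  # The unchanged file should share an inode with base; the modified one should not.
  if [ "$store/result/next/bin/other" -ef "$store/result/base/bin/other" ] &&
     ! [ "$store/result/next/bin/opam-2.0" -ef "$store/result/base/bin/opam-2.0" ]; then
    status=linked
  else
    status=copied
  fi
fi
echo "unchanged file was: $status"

rm -rf "$store"
```

Dropping `--checksum` makes rsync fall back to its quick check (size and modification time), which is the faster but less safe variant being weighed here.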
Thanks for testing it out on another platform! I like the idea of letting the user choose the tradeoff so I added the corresponding CLI flag: by default it keeps the original `copy` behavior, but it's safe to switch to `hardlink` (back and forth with `copy`, the store shouldn't care).

On your example, I get 7.9G in 3m25s for `copy`, 3.1G in 7m35s for `hardlink`... and 3.1G in 4m20s for `hardlink_unsafe` (no checksum, which is the mode I would like to use when developing). Let me know if you would rather not have this third dangerous option!
Please note that I have NO idea what I'm doing: I'm working under the assumption that files are never modified in place in the store but always copied elsewhere first. If that ain't the case, well, please ignore and close this PR harshly!
I was hoping to save a bit of disk space in the obuilder store when using rsync:
Here `ce0813` was created from `343861` by running `sudo ln -f /usr/bin/opam-2.0 /usr/bin/opam`. As a full build involves a dozen steps, the copy-everything is eating my disk alive... But by asking nicely, rsync could observe that files from `ce0813` are identical to those in `343861` and create hard links to the originals rather than a real copy. This is obviously wrong if either can be updated in place later!

Regarding the rsync arguments:
- `--link-dest=` is the path in which the original files will be discovered (and hard-linked to). When this argument is a relative path, it is interpreted as relative to the `dst` directory (which would be plain wrong here!)... Hopefully the paths are always absolute, hence the cmdliner quick fix to enforce it. I tried relative paths for the rsync store to see if I was breaking existing functionality, but it was already crashing `runc` because it led to the wrong store path. => I'm not sure if the `btrfs` backend has the same limitation? (the doc seems to imply so)
- `--checksum` may not be 100% required, but otherwise rsync might hard-link files even though they could be different, as it only checks the filename, file size, modification dates, ... and not the actual content.

Anyway, the result makes me sad. Hard-links are correctly created for files, but not directories: (because "stuff tends to break when your fs is not a tree")
"Everything is a file, but some files are more files than others."