Closed th1000s closed 3 years ago
Thank you for contributing to fclones! Generally the PR looks good.
Reflink semantics is different than link / soft link so it definitely deserves a separate command. Let's call it dedupe
. The word "link" adds IMHO confusion here, because at the logical level the files are not linked - you can modify each of them without affecting another.
Added a dedupe
command and now only use the reflink
crate for macOS (and, theoretically, some Windows server operating systems, but that is super untested. Maybe disable it explicitly or require an --eat-my-data
flag).
Since the direct Linux ioctl
can overwrite existing files it is easy to keep all the metadata in place to make the deduplication truly transparent.
I saw the filetime
crate was already included and used it to restore some metadata when the reflink
crate is used, but see the TODO (and others) in the code. On macOS Sierra (10.12 from 2016) the stdlib fs::copy
already uses clonefile so the reflink dependency could be dropped if windows support is not that important.
Added a dedupe command and now only use the reflink crate for macOS (and, theoretically, some Windows server operating systems, but that is super untested. Maybe disable it explicitly or require an --eat-my-data flag).
Not sure if I understood, but reflink create is used on all systems except Linux Android, not except all Linux flavors.
I can see there is target_os = "android"
qualifier.
Honestly if the reflink crate doesn't support some usecase, try to:
reflink
, this way other projects would benefit from your work as welldedupe.rs
at the same level of abstraction - don't mix different abstraction levels.There are some clippy warnings failing the CI, but I understand this is a draft. BTW: Which filesystems on Linux I can test it with?
As for Windows support, if deduplication functionality is buggy and could cause data loss, better disable it by default, and enable only under --enable-experimental-options
(need to add such flag). But I'm also fine just disabling it for now.
On Linux (and Android, which Rust treats as a different Unix) the ioctl is called directly, this works without removing the destination, so it is easy to keep all metadata. Only macOS (and Windows) currently use the reflink
crate. However when testing on Linux it is also used to test the the logic which manually restores the metadata.
Regarding contributing to reflink
, I currently have a PR open for something else, and at the moment the crate doesn't see that many commits. I'll place all the deduplication and metadata-keeping code in a separate module for now and see about upstreaming the changes later.
On Linux btrfs and XFS support ioctl_ficlone
, an mkfs
should have this enabled without extra flags.
The reflink functions are now in a separate file, and because in the case of deduplication is is known that the file are of equal size the destination is no longer truncated.
Thank you, this PR is still marked as a draft. Is there any work remaining? I'd like to do a final review, merge it and release it. :)
I added a warning regarding the --priority
and --rf-over
options to the dedupe --help
output. The windows code path (if someone builds it there) should still be hidden behind an --experimental
flag and somewhere the metadata loss there and on macOS needs to be mentioned or enabled via --lossy
or so.
(Build fails? Works here on 1.48 and 1.55, but I can't see details, maybe try github actions? I think binaries can be built there as well, once you find the right YAML incantation.)
Code formatting check failed. The build and tests passed. I'm fixing circleci definition so the next time it would be more clear.
[2021-10-20 10:51:36.517] fclones: info: Started deduplicating
[2021-10-20 10:51:36.618] fclones: warn: Failed to reflink /home/pkolaczk/Projekty/fclones/test/foo.3 -> /home/pkolaczk/Projekty/fclones/test/foo.1: Failed to deduplicate /home/pkolaczk/Projekty/fclones/test/foo.3: Operation not supported (os error 95)
[2021-10-20 10:51:36.618] fclones: warn: Failed to reflink /home/pkolaczk/Projekty/fclones/test/foo.2 -> /home/pkolaczk/Projekty/fclones/test/foo.1: Failed to deduplicate /home/pkolaczk/Projekty/fclones/test/foo.2: Operation not supported (os error 95)
Nit: These warn messages are a bit too verbose. Can you make them less redundant? Let's not report the same path twice.
This should be enough:
warn: Failed to reflink /home/pkolaczk/Projekty/fclones/test/foo.3 -> /home/pkolaczk/Projekty/fclones/test/foo.1: Operation not supported (os error 95)```
Merged in #79
This PR was still marked as a draft on purpose, but now it is done, see #80
This adds a
--reflink
option tofclones link
(and for now does not update the help text). Maybe it should get its own subcommand because it sits between linking and copying files. And I think it definitely should not fall back to cross-device symlinks.If the filesystem does not support reflinks it does not abort the entire link process but tries again, maybe the first failure should be more than a warning when reflinking, or the fs is tested first?
Because previously created reflinks are undetectable the
space_to_reclaim()
output is unreliable, maybe add an "up to" qualifier in that case?To verify (manually) on btrfs run
btrfs filesystem du testdir/
and use files at least a few kilobytes long.