Closed kapitainsky closed 1 year ago
On macOS extended attributes are used extensively and this fclones behaviour can create a real mess. IMHO it should be either fixed or clearly documented so there are no surprises when it is used.
Now my two pennies' worth on a solution.
I think you could "copy" what jdupes does (it is also released under the MIT licence) but in addition fix one remaining issue jdupes still has: https://github.com/jbruchon/jdupes/issues/189
This would make fclones one step closer to being the ultimate deduplication program for macOS :)
GNU cp unfortunately does not work anymore.
It seems that the brew version was built without xattr support:
# gcp --preserve=xattr --attributes-only sourceFile destFile
gcp: cannot preserve extended attributes, cp is built without xattr support
I did a bit of poking around and it seems that the only way to do this properly is to use copyfile from Apple's standard C library: copyfile(..., COPYFILE_METADATA)
I am not familiar with Rust but I can see that std::fs::copy uses the native platform's implementation:
https://github.com/rust-lang/rust/pull/58901
Maybe it can help with fixing this issue if the Rust implementation allows COPYFILE_METADATA.
I can use specific C APIs from Apple SDK in Rust.
FYI - this weekend I have also tested ref-link dedup behaviour on Linux (BTRFS, Suse Linux) and there all is OK - xattrs are preserved. So it is a macOS-specific problem.
This is because on Linux fclones performs reflink-in-place. On other systems, it removes the file and recreates it by doing a copy with reflink.
I added a patch, but I can't test it because I have no access to macOS at the moment. Does it work for you?
I will test - no problem. Will report results today.
If I build with cargo install fclones
will it pull the latest commits?
I have tried and the answer is nope...
I have never used the Rust toolchain - could you please tell me how to build a locally cloned git version?
OK, I think I managed to crash-learn how to do it :)
Here you are:
# fclones
fclones 0.27.2
# find ~ > hello.txt
# cp hello.txt hello1.txt
# cp hello1.txt hello2.txt
# ls
hello.txt hello1.txt hello2.txt
# shasum *
a39f070bbe8a4b867b7cc472a5288aba9c9302fb hello.txt
a39f070bbe8a4b867b7cc472a5288aba9c9302fb hello1.txt
a39f070bbe8a4b867b7cc472a5288aba9c9302fb hello2.txt
# xattr -w test "" hello.txt
# xattr -w test2 "" hello1.txt
# xattr -l *
hello.txt: test:
hello1.txt: test2:
# fclones group . | fclones dedupe
[2022-09-04 07:18:12.407] fclones: info: Started grouping
[2022-09-04 07:18:12.421] fclones: info: Scanned 4 file entries
[2022-09-04 07:18:12.421] fclones: info: Found 3 (23.7 MB) files matching selection criteria
[2022-09-04 07:18:12.423] fclones: info: Found 2 (15.8 MB) candidates after grouping by size
[2022-09-04 07:18:12.424] fclones: info: Found 2 (15.8 MB) candidates after grouping by paths
[2022-09-04 07:18:12.429] fclones: info: Found 2 (15.8 MB) candidates after grouping by prefix
[2022-09-04 07:18:12.431] fclones: info: Found 2 (15.8 MB) candidates after grouping by suffix
[2022-09-04 07:18:12.604] fclones: info: Found 2 (15.8 MB) redundant files
[2022-09-04 07:18:12.627] fclones: info: Started deduplicating
[2022-09-04 07:18:12.647] fclones: info: Processed 2 files and reclaimed up to 15.8 MB space
and xattr results:
# xattr -l *
hello.txt: test:
hello1.txt: test:
hello1.txt: test2:
hello2.txt: test:
different than fclones v0.27.1:
# xattr -l *
hello.txt: test:
hello1.txt: test:
hello2.txt: test:
but not yet what it should be:
# xattr -l *
hello.txt: test:
hello1.txt: test2:
Thank you for the test! I see, the problem is that it also copied the source file's attributes. I need to clear them before setting the originals.
Try now. Now I clear the attributes before setting the originals.
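The difference between the two patches can be illustrated with a small in-memory model. This is only a sketch: ModelFile, clone_with_metadata, dedupe_merge and dedupe_clear_then_restore are hypothetical names made up for illustration, not fclones code, and the model assumes (as the test output above suggests) that the clone carries the source file's attributes along with its data.

```rust
use std::collections::BTreeMap;

/// Extended attributes modelled as a name -> value map (hypothetical type).
type Xattrs = BTreeMap<String, Vec<u8>>;

/// A toy in-memory file: content plus extended attributes.
#[derive(Clone, Debug, PartialEq)]
struct ModelFile {
    data: Vec<u8>,
    xattrs: Xattrs,
}

/// Builds a model file with the given content and empty-valued attributes.
fn model_file(data: &str, attr_names: &[&str]) -> ModelFile {
    ModelFile {
        data: data.as_bytes().to_vec(),
        xattrs: attr_names
            .iter()
            .map(|name| (name.to_string(), Vec::new()))
            .collect(),
    }
}

/// Assumption: the clone brings the source's attributes along with its data.
fn clone_with_metadata(source: &ModelFile) -> ModelFile {
    source.clone()
}

/// First patch: the target's saved attributes are set on top of whatever
/// the clone brought along, so the source's attributes leak through.
fn dedupe_merge(source: &ModelFile, target: &ModelFile) -> ModelFile {
    let saved = target.xattrs.clone();
    let mut replaced = clone_with_metadata(source);
    replaced.xattrs.extend(saved);
    replaced
}

/// Follow-up fix: clear the copied attributes first, then set the originals.
fn dedupe_clear_then_restore(source: &ModelFile, target: &ModelFile) -> ModelFile {
    let saved = target.xattrs.clone();
    let mut replaced = clone_with_metadata(source);
    replaced.xattrs.clear();
    replaced.xattrs.extend(saved);
    replaced
}

/// Sorted attribute names of a model file, for easy comparison.
fn attr_names(file: &ModelFile) -> Vec<String> {
    file.xattrs.keys().cloned().collect()
}
```

With hello.txt carrying test and hello1.txt carrying test2, dedupe_merge leaves hello1.txt with both test and test2, while dedupe_clear_then_restore leaves it with test2 only.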
I have to do my Sunday chores now but will test later today.
Sure, no hurry! :) Thank you for all the help.
Tested successfully. I consider the reported issue fixed in v0.27.2.
Thank you for a fantastic tool and for your responsiveness in dealing with bugs.
Happy to hear that! I added one more fix to restore the owner and group information as well.
Now that I think more about it, I wonder whether it would actually be better to make it a hard failure when metadata cannot be restored, and to attempt to roll back the change (by moving the original file back to its place). Currently it is only a warning, which means that if some metadata are not restored, they are lost forever. WDYT?
Definitely a hard error by default, but maybe with an optional flag to override it?
I added the last commit to make it a hard error then. I also made error reporting a lot more detailed, so if it ever fails for someone, we'd know the reason. If all is ok, I'll merge and release - and btw feel free to review / test the changes once again. Thank you once again for collaborating on this! I couldn't fix that without your help!
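A hard failure with rollback could look roughly like the sketch below. It is not the actual fclones implementation: replace_with_rollback, the ".orig" backup suffix and the restore_metadata callback are hypothetical, and a plain std::fs::copy stands in for the reflink clone.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Replaces `target` with a copy of `source`, then calls `restore_metadata`
/// with (original file moved aside, newly created file). If copying or
/// restoring fails, the original file is moved back and the error is returned.
fn replace_with_rollback<F>(source: &Path, target: &Path, restore_metadata: F) -> io::Result<()>
where
    F: FnOnce(&Path, &Path) -> io::Result<()>,
{
    // Hypothetical backup name: the target path with ".orig" appended.
    let mut backup_name = target.as_os_str().to_owned();
    backup_name.push(".orig");
    let backup = PathBuf::from(backup_name);

    // Move the original aside instead of deleting it, so it can be restored.
    fs::rename(target, &backup)?;

    // Stand-in for the reflink clone, followed by the metadata restore step.
    let result = fs::copy(source, target).and_then(|_| restore_metadata(&backup, target));

    match result {
        // Everything succeeded: the original is no longer needed.
        Ok(()) => fs::remove_file(&backup),
        Err(err) => {
            // Roll back: drop the partial replacement and move the original back.
            let _ = fs::remove_file(target);
            fs::rename(&backup, target)?;
            Err(err)
        }
    }
}
```

Calling it with a callback that returns an error leaves the target exactly as it was before the call; with a callback that returns Ok(()), the target is replaced and the backup is removed.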
ok - testing
aaah, the CI test failed.... investigating. Looks like I broke sth...
figured that out, CI green
I have tested the xattr branch with the latest commit (2c669c2) and all works in my tests - I included various xattr situations, files' ownership and groups, and in addition in-place encryption files - the latter to make sure that it does not have issues similar to other deduplicators (namely jdupes - where it can lead to data loss - I mentioned it earlier https://github.com/pkolaczk/fclones/issues/152#issuecomment-1233175064). All OK.
Also, as I have seen problems in other projects with xattr copying when the attribute size is bigger than 128KB (e.g. https://apple.stackexchange.com/questions/226485/copy-extended-attributes-to-new-file-ffmpeg), I tested fclones with attributes several MB in size. All OK.
It looks to me like Rust is a very robust thing... maybe I will look into it more closely :)
If you need help with future testing on macOS, give me a shout.
Thank you so much!
When deduplicating (ref-link), fclones does not take care of extended attributes - it simply uses the ones from the first file.
Example:
I created three identical files,
added sample extended attributes to two of them:
and now deduplicated using fclones:
as expected fclones found 2 duplicates and deduped them, but it messed up the extended attributes:
I guess you just
cp -c sourceFile destinationFile
creating a clone of the source file, which unfortunately is not enough. Metadata should not be changed. Extended attributes should be preserved the same way as names - which are just basic attributes :)
The right way I would do it manually would be:
It unfortunately requires GNU cp (on macOS it can be installed via brew:
brew install coreutils
). I am sure there are smarter ways to achieve the same :) What matters here is the result.
I also tried another deduplicator (jdupes) using the same dataset. Result:
So this is definitely doable.