clara-j closed this issue 7 years ago
Did you read about the "Deduplication phase" in the wiki? Duperemove 'just' submits candidates to the kernel. The kernel then does a full comparison (so yes, the btrfs file system driver does this, as far as I understand) and dedupes the candidates only if they match.
Thanks. No, I had missed that page. That is good to hear, since I think it will speed up the fdupes process. I will need to run some tests to verify, but at least it won't impact data integrity.
I re-opened this issue to see if there is anything else people would like to see changed in fdupes that would make it work better for duperemove.
I have now pushed to my fork a version that has 3 new arguments:
-b limits files based on minimum size
-B limits files based on maximum size
-e skips the final byte-for-byte verification
When I have some time I am going to see if there is anything else I can do to speed up the process, but I was wondering whether there are other features people would like added that I can look into.
Having an option to make fdupes stay within a single filesystem (not crossing mounts) would be nice too.
An --excludes option would be excellent. Oh also, thanks for doing this!
Having it stay within a given device seems like it will be a fairly easy addition.
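To make the "stay within a given device" idea concrete, here is a minimal sketch in Python rather than the fork's actual C code. It assumes the usual approach of comparing each entry's `st_dev` against that of the starting directory. The function name `walk_one_device` and the `dev_of` parameter are hypothetical, and `dev_of` exists only so the device lookup can be swapped out when experimenting.

```python
import os


def walk_one_device(root, dev_of=None):
    """Yield file paths under root without crossing onto another device.

    dev_of is a hypothetical hook that returns the device id of a path;
    by default it reads st_dev via os.lstat.
    """
    if dev_of is None:
        dev_of = lambda path: os.lstat(path).st_dev

    root_dev = dev_of(root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into
        # directories that live on a different device (mount points).
        dirnames[:] = [
            d for d in dirnames
            if dev_of(os.path.join(dirpath, d)) == root_dev
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if dev_of(path) == root_dev:
                yield path
```

Pruning at the directory level means a mounted filesystem is never read at all, which is the point of the request.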
For the exclude setting, what behavior are you looking to have? Could you point me to another program that behaves the way you want?
I'm thinking like rsync, so you can do --exclude= and provide a comma-separated list of paths to skip. If you want I can give you a specific example.
Couldn't you just use grep for that in this case?
Sure but then fdupes is reading those files and comparing them, whereas with a --exclude you wouldn't even touch them.
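The difference between filtering afterwards and never touching the files can be sketched roughly like this. It is an illustration in Python, not the fork's code. The comma-separated list follows the suggestion made in this thread, while matching with `fnmatch` against paths relative to the root is an assumption and is not a claim about how rsync itself matches patterns. The name `walk_with_excludes` is hypothetical.

```python
import fnmatch
import os


def walk_with_excludes(root, excludes):
    """Yield file paths under root, skipping anything that matches excludes.

    excludes is a comma-separated string of glob patterns, matched
    against paths relative to root (an assumed rule, for illustration).
    """
    patterns = [p.strip() for p in excludes.split(",") if p.strip()]

    def excluded(rel_path):
        return any(fnmatch.fnmatch(rel_path, pat) for pat in patterns)

    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir + os.sep

        # Excluded directories are dropped before os.walk descends, so
        # nothing inside them is ever listed, opened, or hashed.
        dirnames[:] = [d for d in dirnames if not excluded(prefix + d)]

        for name in filenames:
            if not excluded(prefix + name):
                yield os.path.join(dirpath, name)
```

Piping the output through grep would hide the same paths, but only after the scanner had already read and compared them.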
The xdevice argument should be done by the weekend. I then plan to add the ability to use K, M, or G suffixes with the size limit arguments to make them easier to use.
The exclude will be more complicated and will take me some time to do, but yes, I understand the way rsync does it and I think I should be able to emulate that. I will probably have to refactor the original code a bit to get it to work.
PS. Mark, no, thank you for all the great work on this and other BTRFS stuff.
Pushed the code. There is now an x argument that keeps fdupes from crossing devices, and I also added the ability to append K, M, or G when specifying the size limits.
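For anyone curious what the size limits with K, M, and G suffixes amount to, here is a small Python sketch. It is not taken from the fork. Treating the suffixes as multiples of 1024 and treating both limits as inclusive are assumptions, and the names `parse_size` and `size_in_range` are hypothetical.

```python
def parse_size(text):
    """Turn '512', '10K', '5M' or '1G' into a byte count.

    Suffixes are assumed to be binary multiples (1K = 1024 bytes).
    """
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def size_in_range(size, min_size=None, max_size=None):
    """Return True if size passes the optional (inclusive) limits."""
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True
```

Checking the size first is cheap because it only needs a stat call, so files outside the range never have to be read.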
On 18.09.2015 18:00, clara-j wrote:
> Pushed the code. There is now an x argument that keeps fdupes from crossing devices, and I also added the ability to append K, M, or G when specifying the size limits.
Great! Thank you :)
I am working on a fork of fdupes to add some features to help with duperemove.
So far I have added the ability to limit the files fdupes considers based on a minimum or maximum size. One other thing I was going to add was the ability to disable the byte-for-byte comparison after a checksum match. This would obviously add the possibility of a false match on a checksum collision (very rare, since file size is checked too). But I was wondering whether duperemove or btrfs itself also validates that the data being deduped actually matches.
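The trade-off described here can be sketched as follows. This is an illustration in Python and not fdupes' actual implementation: files are grouped by size, then by digest, and the final byte-for-byte pass is optional. SHA-256 is used purely for the example, and the names `file_digest`, `find_duplicates`, and the `verify` flag are hypothetical.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths, verify=True):
    """Return lists of paths believed to hold identical content."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # Only files that share a size with another file need hashing.
    by_digest = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_digest[(size, file_digest(path))].append(path)

    groups = []
    for candidates in by_digest.values():
        if not verify:
            # Trust size + digest; a collision would be a false match.
            if len(candidates) > 1:
                groups.append(candidates)
            continue
        # Byte-for-byte pass: split candidates into truly equal sets.
        remaining = list(candidates)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                groups.append([first] + same)
            remaining = [p for p in rest if p not in same]
    return groups
```

With `verify=False` each candidate file is read once instead of twice, which is where the speed-up would come from. Skipping the check is only safe if whatever consumes the list performs its own comparison before deduplicating.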