markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

duperemove doesn't remove extents, but does when fdupes is involved #276

Closed mbo77 closed 2 years ago

mbo77 commented 2 years ago

I am gaining some experience with duperemove on btrfs and xfs. Right now I'm on btrfs. My test case is set up like this:

[user@server nextcloud]# ll data/user/files/folder/largefile.mp4
-rw-r--r-- 1 apache apache 1,696,283,207 Feb 6 2021 'data/user/files/folder/largefile.mp4'
[user@server nextcloud]# ll users/user/folder/largefile.mp4
-rw-rw-r-- 1 user users 1,696,283,207 Feb 6 2021 'users/user/folder/largefile.mp4'

If I run duperemove -rh /daten/nextcloud it won't dedupe the extents.

But fdupes -r /daten/nextcloud | duperemove --fdupes does.

I will restrict my test runs later and will add the actual output here.

lorddoskias commented 2 years ago

This is likely because the extent layout differs between the two files even though their contents are the same. The way to work around this is to use block dedupe (as opposed to the extent dedupe option). This is explained in the FAQ in the man page: https://github.com/markfasheh/duperemove/blob/master/duperemove.8#L338

You can use block-based dedupe by passing the --lookup-extents=no option, or by running duperemove with --write-hashes-v2.
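To illustrate why the layout matters, here is a toy Python sketch. It is not duperemove's actual implementation: the 4-byte block size, the helper names extent_hashes and block_hashes, and the two layouts are all made up for the example. It only shows that hashing whole extents can find no match when the same bytes are split at different boundaries, while hashing fixed-size blocks still matches.

```python
import hashlib

BLOCK = 4  # hypothetical toy block size; real filesystems use much larger blocks


def extent_hashes(data, extent_lengths):
    """Hash each extent as one unit; boundaries come from the (made-up) layout."""
    hashes, pos = set(), 0
    for length in extent_lengths:
        hashes.add(hashlib.sha256(data[pos:pos + length]).hexdigest())
        pos += length
    return hashes


def block_hashes(data, block=BLOCK):
    """Hash fixed-size blocks; the extent layout plays no role here."""
    return {
        hashlib.sha256(data[i:i + block]).hexdigest()
        for i in range(0, len(data), block)
    }


# Two "files" with identical contents but different extent layouts (assumed).
content = b"AAAABBBBCCCCDDDD"
file_a = (content, [8, 8])    # extents: AAAABBBB | CCCCDDDD
file_b = (content, [4, 12])   # extents: AAAA | BBBBCCCCDDDD

ext_a = extent_hashes(*file_a)
ext_b = extent_hashes(*file_b)
blocks_a = block_hashes(file_a[0])
blocks_b = block_hashes(file_b[0])

print(len(ext_a & ext_b))    # 0: no extent hash is shared
print(blocks_a == blocks_b)  # True: every block hash matches
```

In this model the extent-based comparison sees nothing in common between the two files, which matches the symptom described above, while the block-based comparison finds all four blocks identical.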

mbo77 commented 2 years ago

Thank you for the quick reply, this sounds reasonable. I will give it a try and come back to you.

Update: This solved the issue and works pretty well, including with an external hash file. Thanks for the support.