markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Support non-extent mapped block based/whole file dedupe #312

Open dberlin opened 11 months ago

dberlin commented 11 months ago

I recently implemented FIDEDUPERANGE for ZFS (the PR has not been accepted yet, but it is ongoing). However, FIEMAP support is not likely to happen anytime soon.

Right now, all forms of checksumming require the extent map, even whole-file dedupe (to test ZFS dedupe, I currently pipe in the results of fdupes). The modes simply operate on different ranges of extents (all of them, some of them, you name it). It would be nice to support simple block-based and whole-file dedupe that does not require the extent-mapping ioctl to work.

The deduperange ioctl works on offset/sizes anyway.
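To illustrate that point, here is a minimal Python sketch of how a FIDEDUPERANGE request is laid out, following `struct file_dedupe_range` and `struct file_dedupe_range_info` from Linux's `linux/fs.h`. The helper name `pack_dedupe_request` is made up for this example and is not part of duperemove, and the ioctl number shown is the x86-64 value. Nothing in the request refers to extents: it is only a source offset, a length, and a list of destination fd/offset pairs.

```python
import struct

# _IOWR(0x94, 54, struct file_dedupe_range) as encoded on x86-64 (assumption:
# other architectures may encode the direction bits differently).
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range header:
#   src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info:
#   dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def pack_dedupe_request(src_offset, length, dests):
    """Build the ioctl buffer for one source range.

    dests is a list of (dest_fd, dest_offset) pairs. This helper is
    hypothetical; it only shows the request layout.
    """
    buf = bytearray(HEADER.pack(src_offset, length, len(dests), 0, 0))
    for dest_fd, dest_offset in dests:
        # bytes_deduped and status are outputs that the kernel fills in.
        buf += INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return buf
```

On a filesystem that supports the ioctl, the buffer would be passed as `fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)`, and the per-destination `bytes_deduped` and `status` fields would then be read back from `buf`.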

I'm happy to implement this if folks are willing to consider the patches.

I could simply make a fake extent map, or I could make it use different, non-extent-based code paths when FIEMAP fails.
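As a rough illustration of the first option, the sketch below builds a synthetic "extent" list by cutting a file into fixed-size blocks. The function name `fake_extents` and the 128 KiB default block size are assumptions chosen for this example; they are not something duperemove or the eventual PR defines.

```python
def fake_extents(file_size, block_size=128 * 1024):
    """Return (offset, length) pairs covering [0, file_size) in block_size steps.

    Hypothetical stand-in for a real extent map on filesystems without FIEMAP.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    extents = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        extents.append((offset, length))
        offset += length
    return extents
```

A block size at least as large as the file yields a single range, which corresponds to whole-file dedupe. Smaller block sizes give block-based dedupe over the same offset/length pairs that the dedupe ioctl accepts.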

JackSlateur commented 11 months ago

Hello @dberlin, this sounds like an interesting use case. Thank you for your work on OpenZFS.

If you want this feature merged as soon as possible, I will take the time to review your code. If not, I will work on it later.

dberlin commented 11 months ago

Great. I'll send a PR that makes a fake extent map for now, and you can take a look and tell me whether you want me to take a different approach or not :)