markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0
689 stars, 75 forks

Duperemove is calling ioctl on the two exact same extent location #331

Open RustyNova016 opened 7 months ago

RustyNova016 commented 7 months ago

I noticed a huge performance hit while trying to dedupe BTRFS snapshots. When Duperemove tries to dedupe 80 copies of a Debian ISO file, it takes a very long time, even when the copies are already deduplicated.

So I ran some tests. I took two reflinked 3GB files, saved their extent maps, then ran Duperemove (no hashfile) on the two files with debug output enabled. During the dedupe phase, multiple ioctl requests were made. I then compared the new extent maps with the saved ones, and they were exactly the same as before.
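For anyone who wants to repeat the before/after comparison, here is a minimal sketch. It assumes the extent maps were saved as text from `filefrag -v` (the post does not say which tool was used), and the helper names `parse_filefrag` and `same_extent_map` are made up for this example.

```python
import re

# Matches the extent rows of `filefrag -v` output, e.g.
#    0:        0..   32767:    1234567..   1267334:  32768:   shared
# Assumption: columns are ext, logical range, physical range, length.
EXTENT_RE = re.compile(
    r"^\s*\d+:\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)


def parse_filefrag(text):
    """Return a list of (logical_start, physical_start, length) tuples."""
    extents = []
    for line in text.splitlines():
        match = EXTENT_RE.match(line)
        if match:
            logical_start, _, physical_start, _, length = map(int, match.groups())
            extents.append((logical_start, physical_start, length))
    return extents


def same_extent_map(before, after):
    """True if two saved `filefrag -v` dumps describe the same extents."""
    return parse_filefrag(before) == parse_filefrag(after)
```

If `same_extent_map` returns true for the dumps taken before and after the run, the dedupe ioctls did not change the on-disk layout.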

So Duperemove is issuing dedupe ioctls for extents that already share the same physical location. That work is wasted, and it slows down deduping large snapshot folders.
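To illustrate the kind of check I have in mind, here is a rough sketch. It is not Duperemove's code: `Extent`, `already_shared` and `plan_dedupe` are hypothetical names, and it assumes the physical offsets have already been looked up (for example via FIEMAP) before any dedupe request is queued.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    logical: int   # offset within the file
    physical: int  # offset on the underlying device
    length: int


def already_shared(a, b):
    """Two extents at the same physical location with the same length
    already point at the same data, so deduping them changes nothing."""
    return a.physical == b.physical and a.length == b.length


def plan_dedupe(pairs):
    """Keep only the candidate pairs that still need a dedupe ioctl."""
    return [(a, b) for a, b in pairs if not already_shared(a, b)]
```

With 80 reflinked copies of the same ISO, every candidate pair would be filtered out by a check like this, so no ioctl would be needed.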

JackSlateur commented 7 months ago

Hello @RustyNova016, you are correct.

Sadly, I am working on other things, so I will not be able to work on this in the near future.