pauldreik / rdfind

find duplicate files utility

Added -rememberidentinode option which better handles hardlink groups #68

Open bertbaron opened 3 years ago

bertbaron commented 3 years ago

I added the option -rememberidentinode, which handles hardlink groups better. I tried to keep the code changes minimal and the performance impact on existing behavior negligible. With the new option enabled, hardlink groups are handled more or less as with -removeidentinode false, while performance stays close to that of the default behavior.
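To illustrate the idea, here is a simplified sketch of what "remembering" a hardlink group could look like. It is not the code from this PR: `FileEntry`, `InodeKey` and `groupByInode` are hypothetical names, and rdfind's real data structures differ. The only assumption carried over is that two paths belong to the same hardlink group when they share device and inode numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for illustration; not rdfind's actual file class.
struct FileEntry {
  std::string path;
  std::uint64_t device;
  std::uint64_t inode;
};

// (device, inode) identifies one file on disk, however many paths link to it.
using InodeKey = std::pair<std::uint64_t, std::uint64_t>;

// Group paths by (device, inode). Each map entry is one hardlink group:
// the first path can serve as the representative that gets compared,
// while the remaining paths are remembered instead of being dropped.
std::map<InodeKey, std::vector<std::string>>
groupByInode(const std::vector<FileEntry>& files) {
  std::map<InodeKey, std::vector<std::string>> groups;
  for (const auto& f : files) {
    groups[InodeKey{f.device, f.inode}].push_back(f.path);
  }
  return groups;
}
```

Only one representative per group needs to be compared, which would explain why the run time stays close to the default, while keeping the whole group around appears to be what allows every member to be relinked in a single run.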

This can make a big difference. These are some results from a test using the different options on three snapshots of some data taken with rsnapshot:

with options -makehardlinks true -removeidentinode true
RUN 1, 58s,  reported saving 4G, actual saving 0G
RUN 2, 57s,  reported saving 4G, actual saving 0G
RUN 3, 58s,  reported saving 4G, actual saving 4G
RUN 4, 1.5s, reported saving 0G, actual saving 0G

with options -makehardlinks true -removeidentinode false
RUN 1, 5m9s,  reported saving 57G, actual saving 4G
RUN 2, 5m53s, reported saving 57G, actual saving 0G

with options -makehardlinks true -rememberidentinode true
RUN 1, 55s,  reported saving 4G, actual saving 4G
RUN 2, 1.5s, reported saving 0G, actual saving 0G
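One plausible reading of the gap between reported and actual savings: replacing a path with a hardlink frees disk space only when the old inode loses its last link. The sketch below models that under stated assumptions. `Relink`, `Saving` and `estimateSaving` are hypothetical names, not rdfind code, and "reported" is assumed to be the sum of the sizes of all relinked paths.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using InodeKey = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical: one path that gets replaced by a hardlink to another file.
struct Relink {
  std::uint64_t device;  // device of the inode the path pointed to before
  std::uint64_t inode;   // inode the path pointed to before
  std::uint64_t size;    // file size, in arbitrary units
};

struct Saving {
  std::uint64_t reported;  // sum of sizes of all relinked paths
  std::uint64_t actual;    // space freed once an inode has no links left
};

// linkCount maps each old inode to its number of links before the run.
// It is taken by value so the caller's map is left untouched.
Saving estimateSaving(std::map<InodeKey, std::uint64_t> linkCount,
                      const std::vector<Relink>& relinks) {
  Saving s{0, 0};
  for (const auto& r : relinks) {
    s.reported += r.size;
    auto it = linkCount.find(InodeKey{r.device, r.inode});
    // Space is reclaimed only when the last link to the old inode goes away.
    if (it != linkCount.end() && it->second > 0 && --it->second == 0) {
      s.actual += r.size;
    }
  }
  return s;
}
```

With three snapshots sharing one inode, relinking a single path per run reports a saving each time but frees nothing until the third run, which would be consistent with the first set of results above. Relinking the whole group at once frees the space in one run.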

However, I had almost zero experience with C++, so I'm sure the code can be improved. Please let me know what you think.

pauldreik commented 3 years ago

Sorry for the late reply!

Bonus points for providing a test script in the PR, well done!

I will have to think a bit about this.