I added the option -rememberidentinode, which handles hardlink groups better. I tried to keep the code changes minimal and the impact on the performance of existing behavior negligible. With the new option enabled, hardlink groups are handled more or less as with -removeidentinode false, while performance stays close to that of the default behavior.
This can make a big difference. These are some results from a test using the different options on three snapshots of some data taken with rsnapshot:
with options -makehardlinks true -removeidentinode true
RUN 1, 58s, reported saving 4G, actual saving 0G
RUN 2, 57s, reported saving 4G, actual saving 0G
RUN 3, 58s, reported saving 4G, actual saving 4G
RUN 4, 1.5s, reported saving 0G, actual saving 0G
with options -makehardlinks true -removeidentinode false
RUN 1, 5m9s, reported saving 57G, actual saving 4G
RUN 2, 5m53s, reported saving 57G, actual saving 0G
with options -makehardlinks true -rememberidentinode true
RUN 1, 55s, reported saving 4G, actual saving 4G
RUN 2, 1.5s, reported saving 0G, actual saving 0G
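
For readers unfamiliar with the idea, here is a rough sketch of what "remembering" identical inodes could look like. It is a simplified illustration based only on the description above, not the code in this change: `FileEntry`, `GroupedFiles`, `groupByInode` and `pathsToRelink` are hypothetical names, and the assumption is that extra paths of a hardlink group are kept out of the comparison list but stored so they can be relinked together with their representative.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified file record (not the project's real file class).
struct FileEntry {
  std::string path;
  std::uint64_t device;
  std::uint64_t inode;
};

// (device, inode) identifies one underlying file; all paths sharing it
// form one hardlink group.
using InodeKey = std::pair<std::uint64_t, std::uint64_t>;

struct GroupedFiles {
  // One entry per hardlink group; only these would be compared by content.
  std::vector<FileEntry> representatives;
  // Remaining paths of each group, remembered instead of discarded.
  std::map<InodeKey, std::vector<std::string>> siblings;
};

// Keep the first path seen for each inode as the representative and
// remember every later path with the same inode as a sibling.
GroupedFiles groupByInode(const std::vector<FileEntry>& files)
{
  GroupedFiles result;
  std::set<InodeKey> seen;
  for (const FileEntry& f : files) {
    const InodeKey key{ f.device, f.inode };
    if (seen.insert(key).second) {
      result.representatives.push_back(f);
    } else {
      result.siblings[key].push_back(f.path);
    }
  }
  return result;
}

// When a representative turns out to be a duplicate, every path in its
// group has to be relinked, otherwise no disk space is actually freed.
std::vector<std::string> pathsToRelink(const GroupedFiles& grouped,
                                       const FileEntry& representative)
{
  std::vector<std::string> paths{ representative.path };
  const auto it =
    grouped.siblings.find(InodeKey{ representative.device, representative.inode });
  if (it != grouped.siblings.end()) {
    paths.insert(paths.end(), it->second.begin(), it->second.end());
  }
  return paths;
}
```

Under this assumption, the comparison work scales with the number of distinct inodes rather than the number of paths, which would be consistent with the run times above, and relinking a whole group at once would explain why the reported and actual savings match in the first run.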
I had almost zero experience with C++ before this, so I'm sure the code can be improved. Please let me know what you think.