markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Duperemove deduplicates parts of the hashfile #335

Closed by hhyyrylainen 1 month ago

hhyyrylainen commented 11 months ago

I just noticed that, after deduplicating other files, duperemove is now deduplicating parts of the hashfile:

[0x561aa1f9b120] (0073/1784) Try to dedupe extents with id 0983e7b3
[0x561aa1f9b120] Dedupe 1 extents (id: 0983e7b3) with target: (747.7MB, 4.0KB), "/mnt/bulk_data/dedup.hash"
[0x561aa1f9cac0] (0074/1784) Try to dedupe extents with id 09ad3804
[0x561aa1f9cac0] Dedupe 1 extents (id: 09ad3804) with target: (276.8MB, 4.0KB), "/mnt/bulk_data/dedup.hash"
[0x561a4734b9b0] (0075/1784) Try to dedupe extents with id 09afc8f9
[0x561a4734b9b0] Dedupe 1 extents (id: 09afc8f9) with target: (556.6MB, 4.0KB), "/mnt/bulk_data/dedup.hash"
[0x561aa1fe4170] (0076/1784) Try to dedupe extents with id 09e61c95
[0x561aa1fe4170] Dedupe 1 extents (id: 09e61c95) with target: (61.6MB, 4.0KB), "/mnt/bulk_data/dedup.hash"
[0x561aa1f9b180] (0077/1784) Try to dedupe extents with id 09e9cd7d
[0x561aa1f9b180] Dedupe 1 extents (id: 09e9cd7d) with target: (664.3MB, 4.0KB), "/mnt/bulk_data/dedup.hash"

This is with duperemove 0.13. Is this intended behaviour? It looks like a bug to me, since I have to manually specify an --exclude to skip processing the hashfile.

Alex-K37 commented 2 months ago

It is a file like any other. Why should it be treated differently?

You can either a) exclude it (as you said), or b) place it somewhere outside of your deduplication directory tree, e.g. to save deduplication time.
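For anyone scripting this, the two options can be sketched as a small wrapper that checks whether the hashfile lives inside a scanned tree and only then adds an exclude. This is a rough illustration, not anything duperemove does itself: the helper names `hashfile_exclude_args` and `build_command` are made up for this example, and it assumes the `-d`, `-r`, `--hashfile` and `--exclude` options behave as described in the duperemove man page. Since `--exclude` takes a glob pattern, a hashfile path containing glob characters would need extra care.

```python
from pathlib import Path


def hashfile_exclude_args(hashfile, scan_roots):
    """Return an --exclude argument if the hashfile sits inside a scanned tree.

    Hypothetical helper: an empty list means the hashfile is already
    outside every scan root (option b), so no exclude is needed.
    """
    hf = Path(hashfile).resolve()
    for root in scan_roots:
        root_path = Path(root).resolve()
        # The hashfile is "inside" the tree if the root is one of its ancestors.
        if root_path in hf.parents:
            return [f"--exclude={hf}"]
    return []


def build_command(hashfile, scan_roots):
    """Assemble a duperemove argument list (hypothetical wrapper, not run here)."""
    hf = Path(hashfile).resolve()
    return [
        "duperemove",
        "-dr",                      # dedupe, recurse into directories
        f"--hashfile={hf}",
        *hashfile_exclude_args(hf, scan_roots),
        *[str(root) for root in scan_roots],
    ]


if __name__ == "__main__":
    # Hashfile inside the scanned tree: an --exclude is added (option a).
    print(build_command("/mnt/bulk_data/dedup.hash", ["/mnt/bulk_data"]))
    # Hashfile outside the scanned tree: no --exclude needed (option b).
    print(build_command("/var/tmp/dedup.hash", ["/mnt/bulk_data"]))
```

The resulting list could be passed to `subprocess.run`; it is only printed here so the sketch has no side effects.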

Just my two cents.