pkolaczk / fclones

Efficient Duplicate File Finder

Possibility to also remove hardlinks #124

Closed eliasp closed 2 years ago

eliasp commented 2 years ago

I have an old rsnapshot backup pool, which heavily uses hardlinking to deduplicate files between the various backup targets, e.g. daily.[0-6], weekly.[0-3], etc.
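(For context: rsnapshot's deduplication means each unchanged file appears under every snapshot directory as a hardlink to the same inode. A quick way to verify this, with hypothetical example paths, is to compare inode numbers; an identical number in the first column of ls -li means both paths point to the same physical file:)

# ls -li /var/backup/daily.0/etc/hosts /var/backup/daily.1/etc/hosts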

Now I'd like to consolidate all those hardlinks down to a single file each - meaning I'd like to delete all hardlinks to a file and keep only one copy around.

I tried it like this:

# fclones group -o /root/backup-consolidation.txt /var/backup/{daily,weekly,monthly}.*
[2022-05-11 15:49:27.435] fclones:  info: Started grouping
[2022-05-11 15:55:34.079] fclones:  info: Scanned 4601517 file entries
[2022-05-11 15:55:34.120] fclones:  info: Found 4174323 (8.3 TB) files matching selection criteria
[2022-05-11 15:55:35.729] fclones:  info: Found 4091225 (7.5 TB) candidates after grouping by size
[2022-05-11 15:57:16.520] fclones:  info: Found 856263 (57.6 GB) candidates after grouping by paths and file identifiers
[2022-05-11 16:25:09.154] fclones:  info: Found 209606 (49.5 GB) candidates after grouping by prefix
[2022-05-11 16:25:11.154] fclones:  info: Found 209606 (49.5 GB) candidates after grouping by suffix
[2022-05-11 16:59:20.983] fclones:  info: Found 209497 (49.4 GB) redundant files

# fclones remove --isolate '/var/backup' --keep-path '/var/backup/daily.0/**' < /root/backup-consolidation.txt
[2022-05-11 17:13:44.766] fclones:  info: Started deduplicating
[2022-05-11 17:13:50.278] fclones:  info: Processed 0 files and reclaimed 0 B space

But this approach showed that fclones considers those hardlinks to be already deduplicated data, so it does nothing.

Using move to relocate just a single copy of each file outside of the isolated directory had the same effect:

# fclones move /var/backup.consolidated/ --isolate '/var/backup' --keep-path '/var/backup/daily.0/**' < /root/backup-consolidation.txt

[2022-05-11 23:54:36.192] fclones:  info: Started deduplicating
[2022-05-11 23:54:45.250] fclones:  info: Processed 0 files and reclaimed 0 B space

So I wonder: is removing redundant hardlinks from an isolated target actually supported by fclones and I'm just doing it wrong, or is this a not (yet) supported use case?

pkolaczk commented 2 years ago

Try the -H | --hard-links flag.
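
For completeness, a sketch of the adjusted workflow for the case above, reusing the paths from the original report (-H makes group treat files hardlinked to the same inode as duplicates, so they survive into the report consumed by remove):

# fclones group -H -o /root/backup-consolidation.txt /var/backup/{daily,weekly,monthly}.*
# fclones remove --isolate '/var/backup' --keep-path '/var/backup/daily.0/**' < /root/backup-consolidation.txt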

pkolaczk commented 2 years ago

Btw - link handling is going to be improved in the next release, which should happen very shortly, and you will no longer need the -H flag on isolated targets, because linked files with different roots will be treated as if they were not linked.

Currently, without this flag, all hardlinked files except one are simply removed from the analysis at a very early stage - that's why what you're trying to do doesn't work.
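
In other words, when several paths share the same (device, inode) pair, only one representative enters the pipeline. To see which files in a tree carry multiple links, standard GNU find can print the link count, inode, and path (nothing fclones-specific; the path is illustrative):

# find /var/backup/daily.0 -type f -links +1 -printf '%n %i %p\n' | head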

eliasp commented 2 years ago

Thank you! Totally makes sense, in hindsight! I was looking for options like -H in move/remove and didn't realize it's something I have to take care of during group!