`fclones` re-scans hard links

I have an backup drive that stores backups created using rsync, where each backup is a full copy of the directory tree but with each unmodified file hardlinked to the previous backup. This means that most files in the filesystem are hardlinks. (System is Linux/XFS.)

After grouping by paths and size, fclones seems to think that it has over 200TB of data to read. This takes a very long time and eventually runs out of memory (with 64gb RAM in the system).

The actual disk storage of the backups is only roughly 10TB. I assume what's happening is that fclones doesn't realize that hard links point to the same file data? In which case it's trying to scan the contents of each link to each file, rather than scanning each file and assuming (correctly) that each link to that file must have the same contents.

Does this assessment sound plausible? If so, is there a reason that fclones works this way, or would it be feasible to adopt this sort of optimization?

pkolaczk / fclones

`fclones` re-scans hard links #177