Closed aseering closed 1 year ago
That doesn't seem right if you're using a recent version of fclones. Avoiding scanning hard links multiple times has already been fixed, so if it doesn't work, that would be a bug.
See #142
As for running out of memory - how many files are you processing? The paths and checksums are kept in memory, so if there are millions of files, 64 GB won't be enough.
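For illustration, the hard-link optimization mentioned above can be sketched as grouping candidate paths by their `(device, inode)` pair, so each underlying inode is read and hashed only once. This is a minimal Python sketch of the idea, not fclones' actual Rust internals; the function name `hash_unique_inodes` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def hash_unique_inodes(paths):
    """Hash each underlying inode once, even if many paths link to it."""
    by_inode = defaultdict(list)  # (st_dev, st_ino) -> list of paths
    for p in paths:
        st = os.stat(p)
        by_inode[(st.st_dev, st.st_ino)].append(p)

    digests = {}
    for _key, linked_paths in by_inode.items():
        # Read only the first path; hard links share the same data blocks.
        h = hashlib.sha256()
        with open(linked_paths[0], "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        # Every link to this inode gets the same digest without re-reading.
        for p in linked_paths:
            digests[p] = h.hexdigest()
    return digests
```

With this grouping, a backup tree where most paths are hard links costs one read per unique inode rather than one read per path.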
I have a backup drive that stores backups created using `rsync`, where each backup is a full copy of the directory tree, but with each unmodified file hard-linked to the previous backup. This means that most files on the filesystem are hard links. (The system is Linux/XFS.)

After grouping by paths and size, `fclones` seems to think it has over 200 TB of data to read. This takes a very long time and eventually runs out of memory (with 64 GB of RAM in the system). The actual disk usage of the backups is only roughly 10 TB.

I assume what's happening is that `fclones` doesn't realize that hard links point to the same file data? In that case it's trying to scan the contents of every link to every file, rather than scanning each file once and assuming (correctly) that every link to that file must have the same contents.

Does this assessment sound plausible? If so, is there a reason that `fclones` works this way, or would it be feasible to adopt this sort of optimization?
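The 200 TB vs. 10 TB gap described above is exactly the difference between apparent size (summing `st_size` over every path) and actual disk usage (counting each inode once). As a rough illustration, a short script can measure both for a directory tree; this is a sketch, and the function name `apparent_vs_actual` is hypothetical.

```python
import os

def apparent_vs_actual(root):
    """Compare total bytes counting every path vs. counting each inode once."""
    apparent = 0
    actual = 0
    seen = set()  # (st_dev, st_ino) pairs already counted
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            if (st.st_dev, st.st_ino) not in seen:
                seen.add((st.st_dev, st.st_ino))
                actual += st.st_size
    return apparent, actual
```

On an rsync hard-link backup tree, `apparent` grows with the number of snapshots while `actual` stays close to the size of one snapshot plus changed files.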