pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

hard links treated as duplicates #268

Open gecon opened 4 months ago

gecon commented 4 months ago

Hi...

I see several hard links being grouped as duplicates, which seems like a bug. I am running fclones 0.34.0.

For example, when running fclones group on dir "a", I get a group that includes the following files:

  1. a/b/snap.1/file
  2. a/b/snap.2/file (this is a HARDLINK of 1)
  3. a/c/snap.1/file
  4. a/c/snap.2/file (this is a HARDLINK of 3)

Indeed, all four files above have the same content, but they are two pairs of hard links (1 and 2, 3 and 4), and fclones seems to ignore that.
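
The distinction in the list above can be checked outside fclones. Here is a minimal Python sketch, assuming a POSIX filesystem, where two paths are hard links of each other when they share the same device and inode number. The helper names `group_by_inode` and `build_example` are hypothetical and the file content is made up. This is not a description of how fclones itself handles hard links.

```python
import os
import tempfile
from collections import defaultdict


def group_by_inode(paths):
    """Group paths that refer to the same (device, inode) pair."""
    groups = defaultdict(list)
    for path in paths:
        st = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
        groups[(st.st_dev, st.st_ino)].append(path)
    return list(groups.values())


def build_example(root):
    """Recreate the four paths from the report: two files, each hard-linked once."""
    paths = []
    for parent in ("b", "c"):
        snap1 = os.path.join(root, "a", parent, "snap.1")
        snap2 = os.path.join(root, "a", parent, "snap.2")
        os.makedirs(snap1)
        os.makedirs(snap2)
        original = os.path.join(snap1, "file")
        with open(original, "w") as f:
            f.write("same content\n")  # identical bytes in both originals
        link = os.path.join(snap2, "file")
        os.link(original, link)  # hard link: a second name for the same inode
        paths += [original, link]
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for group in group_by_inode(build_example(root)):
            print(group)
```

Run against the four paths, this yields two groups of two paths each: two distinct files on disk with identical content, not four.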

Is this a bug? The examples in the documentation mention running fclones group dir1 dir2, but does it work the same way when fclones group is run on a common parent directory of dir1 and dir2 (like dir "a" above)?

Also, the sizes reported during the run are large, as if hard links were not detected. For example, see the output below. The real size of the scanned directory is about 250 GB, but several TB are reported while grouping, probably because of hard links.

[2024-06-01 10:37:13.887] fclones:  info: Started grouping
[2024-06-01 10:38:39.427] fclones:  info: Scanned 51478149 file entries
[2024-06-01 10:38:39.571] fclones:  info: Found 47866370 (4.7 TB) files matching selection criteria
[2024-06-01 10:38:52.509] fclones:  info: Found 46863583 (2.5 TB) candidates after grouping by size
[2024-06-01 10:39:07.117] fclones:  info: Found 46863583 (2.5 TB) candidates after grouping by paths
[2024-06-01 10:41:45.791] fclones:  info: Found 28639667 (2.2 TB) candidates after grouping by prefix
[2024-06-01 10:41:47.928] fclones:  info: Found 28639667 (2.2 TB) candidates after grouping by suffix
[2024-06-01 10:45:26.304] fclones:  info: Found 28633164 (2.2 TB) redundant files
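
A gap like the one between the roughly 250 GB on disk and the TB figures in the log could arise if sizes are summed once per path and not once per inode. Whether fclones counts this way is not established by this report. The sketch below only illustrates the arithmetic, assuming a POSIX filesystem. The function name `scan_sizes` is hypothetical, and it sums apparent file sizes (`st_size`), not allocated blocks.

```python
import os
import stat


def scan_sizes(root):
    """Return (per_path_bytes, per_inode_bytes) for regular files under root."""
    per_path = 0   # every directory entry counted, hard links included
    per_inode = 0  # each (device, inode) pair counted once
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular entries
            per_path += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                per_inode += st.st_size
    return per_path, per_inode


if __name__ == "__main__":
    import sys

    by_path, by_inode = scan_sizes(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"per path:  {by_path} bytes")
    print(f"per inode: {by_inode} bytes")
```

On a tree of snapshots built from hard links, the per-path total grows with the number of snapshots while the per-inode total stays close to the space actually used.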