Closed: hans-helmut closed this issue 2 years ago
I think you can use `--cache` to avoid it.

Anyway, this is a known issue. It started happening after I removed the code that pruned hardlinks early in the process, so that all hardlinks could be reported in the output. However, I need to add an in-memory caching layer to avoid hashing files we have already hashed once.
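A minimal sketch of what such an in-memory layer could look like (hypothetical types and names, not fclones' actual code): hashes are memoized by the file's `(device, inode)` identifier, which all hardlinks to the same file share, so the expensive read-and-hash runs at most once per physical file.

```rust
use std::collections::HashMap;

/// Key identifying the underlying file: (device id, inode number).
/// All hardlinks to the same file share this key.
type FileId = (u64, u64);

/// In-memory cache: hash each physical file at most once per run.
struct HashCache {
    hashes: HashMap<FileId, u128>,
}

impl HashCache {
    fn new() -> Self {
        HashCache { hashes: HashMap::new() }
    }

    /// Return the cached hash, or compute and store it on first access.
    fn get_or_compute<F>(&mut self, id: FileId, compute: F) -> u128
    where
        F: FnOnce() -> u128,
    {
        *self.hashes.entry(id).or_insert_with(compute)
    }
}

fn main() {
    let mut cache = HashCache::new();
    let mut reads = 0;

    // Two paths that are hardlinks share (dev, ino) = (1, 42),
    // so the expensive hash closure runs only once.
    let h1 = cache.get_or_compute((1, 42), || { reads += 1; 0xABCD });
    let h2 = cache.get_or_compute((1, 42), || { reads += 1; 0xABCD });

    assert_eq!(h1, h2);
    assert_eq!(reads, 1); // only one real read/hash happened
    println!("reads = {}", reads);
}
```

Unlike the persistent `--cache`, this would also help within a single first run, because repeated hardlinks of the same file are hit during the same invocation.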
> I think you can use `--cache` to avoid it.

Well, not on the first run:

```
me@pc:~/tmp$ rm -rf ~/.cache/fclones/
me@pc:~/tmp$ time fclones group a b
[2022-06-13 17:32:42.201] fclones: info: Started grouping
[2022-06-13 17:32:42.205] fclones: info: Scanned 6 file entries
[2022-06-13 17:32:42.205] fclones: info: Found 4 (42.9 GB) files matching selection criteria
[2022-06-13 17:32:42.205] fclones: info: Found 3 (32.2 GB) candidates after grouping by size
[2022-06-13 17:32:42.205] fclones: info: Found 3 (32.2 GB) candidates after grouping by paths and file identifiers
[2022-06-13 17:32:42.209] fclones: info: Found 3 (32.2 GB) candidates after grouping by prefix
[2022-06-13 17:32:42.216] fclones: info: Found 3 (32.2 GB) candidates after grouping by suffix
[2022-06-13 17:32:54.937] fclones: info: Found 3 (32.2 GB) redundant files
# Report by fclones 0.25.0
# Timestamp: 2022-06-13 17:32:54.939 +0200
# Command: fclones group a b
# Base dir: /home/me/tmp
# Total: 42949672960 B (42.9 GB) in 4 files in 1 groups
# Redundant: 32212254720 B (32.2 GB) in 3 files
# Missing: 0 B (0 B) in 0 files
32d0c7c0740d7b71703c5df2f89dce3d, 10737418240 B (10.7 GB) * 4:
    /home/me/tmp/a/1
    /home/me/tmp/a/2
    /home/me/tmp/b/1
    /home/me/tmp/b/2

real	0m12,759s
user	0m3,318s
sys	0m12,767s
me@pc:~/tmp$ time fclones group --cache a b
[2022-06-13 17:33:03.823] fclones: info: Started grouping
[2022-06-13 17:33:03.846] fclones: info: Scanned 6 file entries
[2022-06-13 17:33:03.846] fclones: info: Found 4 (42.9 GB) files matching selection criteria
[2022-06-13 17:33:03.846] fclones: info: Found 3 (32.2 GB) candidates after grouping by size
[2022-06-13 17:33:03.847] fclones: info: Found 3 (32.2 GB) candidates after grouping by paths and file identifiers
[2022-06-13 17:33:03.850] fclones: info: Found 3 (32.2 GB) candidates after grouping by prefix
[2022-06-13 17:33:03.850] fclones: info: Found 3 (32.2 GB) candidates after grouping by suffix
[2022-06-13 17:33:18.567] fclones: info: Found 3 (32.2 GB) redundant files
32d0c7c0740d7b71703c5df2f89dce3d, 10737418240 B (10.7 GB) * 4:
    /home/me/tmp/a/1
    /home/me/tmp/a/2
    /home/me/tmp/b/1
    /home/me/tmp/b/2

real	0m14,768s
user	0m3,632s
sys	0m14,287s
me@pc:~/tmp$ time fclones group --cache a b
[2022-06-13 17:33:26.262] fclones: info: Started grouping
[2022-06-13 17:33:26.285] fclones: info: Scanned 6 file entries
[2022-06-13 17:33:26.285] fclones: info: Found 4 (42.9 GB) files matching selection criteria
[2022-06-13 17:33:26.286] fclones: info: Found 3 (32.2 GB) candidates after grouping by size
[2022-06-13 17:33:26.286] fclones: info: Found 3 (32.2 GB) candidates after grouping by paths and file identifiers
[2022-06-13 17:33:26.289] fclones: info: Found 3 (32.2 GB) candidates after grouping by prefix
[2022-06-13 17:33:26.290] fclones: info: Found 3 (32.2 GB) candidates after grouping by suffix
[2022-06-13 17:33:26.292] fclones: info: Found 3 (32.2 GB) redundant files
32d0c7c0740d7b71703c5df2f89dce3d, 10737418240 B (10.7 GB) * 4:
    /home/me/tmp/a/1
    /home/me/tmp/a/2
    /home/me/tmp/b/1
    /home/me/tmp/b/2

real	0m0,047s
user	0m0,006s
sys	0m0,063s
me@pc:~/tmp$
```
Hello,

files which are already hardlinked do not need to be read twice or more. The expectation in this test is that 20 GB are read, not 40 GB. Patterns like this are caused by backups made with cp and rsync, where hundreds of hardlinks are not uncommon.
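Hardlinks can be detected before any content is read, because they share the same device and inode numbers. A sketch of the idea (illustrative only, Unix-specific, not fclones' actual implementation) that groups paths by `(dev, ino)` so each physical file would be read once:

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // provides dev() and ino()
use std::path::PathBuf;

/// Group paths by (device, inode): hardlinks to the same file land
/// in the same group, so its contents need to be read only once.
fn group_by_file_id(paths: &[PathBuf]) -> io::Result<HashMap<(u64, u64), Vec<PathBuf>>> {
    let mut groups: HashMap<(u64, u64), Vec<PathBuf>> = HashMap::new();
    for p in paths {
        let meta = fs::metadata(p)?;
        groups.entry((meta.dev(), meta.ino())).or_default().push(p.clone());
    }
    Ok(groups)
}

fn main() -> io::Result<()> {
    // Demo: one regular file plus a hardlink to it.
    let dir = std::env::temp_dir().join("hardlink-demo");
    fs::create_dir_all(&dir)?;
    let a = dir.join("a");
    let b = dir.join("b");
    fs::write(&a, b"data")?;
    let _ = fs::remove_file(&b); // ignore error if b does not exist yet
    fs::hard_link(&a, &b)?;

    let groups = group_by_file_id(&[a.clone(), b.clone()])?;
    // Both paths share one (dev, ino) key: one physical file to hash, not two.
    assert_eq!(groups.len(), 1);
    println!("physical files to hash: {}", groups.len());
    Ok(())
}
```

With grouping like this, the test above would read roughly 20 GB (two physical 10.7 GB files) instead of hashing all four paths.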