pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

How to find AND isolate/extract unique files that are in one directory but not another? #253

Open felciano opened 6 months ago

felciano commented 6 months ago

I'm trying to use fclones for a common use case of mine: confirming that a given photos directory contains, somewhere in its subdirectories, every photo that shows up in a second directory. Typically the first is a "master photo collection" folder that I've painstakingly collected and organized. At some point, I come across another folder of photos, e.g. on a thumb drive, and I think I've already got copies in the master folder, but I want to confirm. If it turns out that some photos in the new folder haven't been added to the main archive yet, I want to pick those out and copy them over.

Finding the unique files is straightforward:

fclones group --unique --isolate /path/to/myphotoarchive/ /path/to/somenewfolder/ > uniques.log

This gives me a report listing the files that are in somenewfolder but not in myphotoarchive, and vice versa: files in myphotoarchive that are not in somenewfolder. As the archive grows, the latter tend to outnumber the former for any given new folder I find, but here is a simple (contrived) example:

❯ more uniques.log
# Report by fclones 0.34.0
# Timestamp: 2023-12-17 14:23:19.681 -0800
# Command: fclones group --unique --isolate '/path/to/myphotoarchive/Timeline/2022' '/path/to/somenewfolder/2022'
# Base dir: /Users/felciano
# Total: 41706067 B (41.7 MB) in 3 files in 3 groups
# Redundant: 0 B (0 B) in 0 files
# Missing: 41706067 B (41.7 MB) in 3 files
00000000000000000000000000000000, 21022045 B (21.0 MB) * 1:
    /path/to/myphotoarchive/2022/10/2022-10-09/2022-10-09 17.30.15.mov
00000000000000000000000000000000, 20034351 B (20.0 MB) * 1:
    /path/to/myphotoarchive/2022/10/2022-10-09/2022-10-09 17.30.56.mov
00000000000000000000000000000000, 649671 B (649.7 KB) * 1:
    /path/to/somenewfolder/2022/12/2022-12-31/IMG_6721.jpeg

I'd now like to find only the files that are in somenewfolder but not in myphotoarchive, and move them to a new location so I can easily review them and add them to myphotoarchive. In this example, the IMG_6721.jpeg file is unique and should be called out because it is under the somenewfolder directory tree. However, the two .mov files can be ignored because they are under the myphotoarchive directory tree.

I thought the --path parameter would allow this, but a dry run shows no changes would occur:

❯ fclones move --dry-run --path '**/path/to/somenewfolder/**' . <uniques.log
[2023-12-17 15:31:14.292] fclones:  info: Started deduplicating (dry run)
[2023-12-17 15:31:14.336] fclones:  info: Would process 0 files and reclaim 0 B space


I'm not sure why IMG_6721.jpeg wasn't identified by the move command. Is there a different way to implement this use case using fclones?

ERamseth commented 2 months ago

@felciano I have almost this exact use case. Did you get anywhere with it?

felciano commented 1 month ago

@ERamseth I wasn't able to figure out how to do this with fclones directly. I ended up having to write a script that did it in multiple steps.
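
For anyone landing here with the same use case, a multi-step approach along those lines could look roughly like the Python sketch below. This is not the script mentioned above (it was never posted in this thread), only an illustration. It assumes the report is in the default text format shown in uniques.log earlier, where each file path sits on its own indented line and is absolute. The function names `unique_paths_under` and `copy_preserving_layout` and the destination `/path/to/review` are hypothetical. Unusual file names may be escaped differently in the report, so treat the parsing as a starting point.

```python
import shutil
from pathlib import Path


def unique_paths_under(report_text, root):
    """Collect the file paths from an fclones text report that lie under `root`.

    Assumption: file paths are the indented lines of the report, as in the
    uniques.log example in this thread; '#' header lines and the
    'hash, size * n:' group lines are not indented and are skipped.
    """
    root = Path(root)
    selected = []
    for line in report_text.splitlines():
        if not line.startswith((" ", "\t")) or not line.strip():
            continue  # header, group line, or blank line
        path = Path(line.strip())
        if root in path.parents:  # keep only files inside the new folder
            selected.append(path)
    return selected


def copy_preserving_layout(paths, root, dest, dry_run=True):
    """Copy each path to `dest`, keeping its location relative to `root`.

    Returns the list of (source, target) pairs. With dry_run=True (the
    default) nothing is written, so the plan can be reviewed first.
    """
    root, dest = Path(root), Path(dest)
    planned = []
    for src in paths:
        target = dest / src.relative_to(root)
        planned.append((src, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves timestamps
    return planned


# Example usage (paths are placeholders):
#   report = Path("uniques.log").read_text()
#   new_only = unique_paths_under(report, "/path/to/somenewfolder")
#   for src, target in copy_preserving_layout(
#           new_only, "/path/to/somenewfolder", "/path/to/review"):
#       print(src, "->", target)
```

Copying rather than moving leaves somenewfolder untouched until the review is done; pass `dry_run=False` only after the printed plan looks right.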