floriangit closed this issue 6 months ago
The interface is a bit clunky for this.
The reason your command isn't doing what you want is that -a applies to all the files provided on the command line, not just the one immediately following it.
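To do what you want takes two steps: first fingerprint the collection into a database without comparing anything, then compare only the file you care about against that database. For example: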
findimagedupes -f fpdb -n -- /tmp/pics/*
findimagedupes -f fpdb -t 95% -a -- FILE1.JPG
Note that FILE1.JPG will get added to fpdb as part of the second step. If you don't want that to happen, you can merge to /dev/null to discard the change:
findimagedupes -f fpdb -n -- /tmp/pics/*
findimagedupes -f fpdb -M /dev/null -t 95% -a -- FILE1.JPG
Thanks for the guidance, I got it going with the two-step approach. And then comparison took milliseconds instead of minutes! :100: Great little tool to help me get in control of those 50k pictures again :+1:
BTW, if you ever touch the man-page again....
-f, --fingerprints=FILE
that I understand, but then in the description:
May be abbreviated as --fp or --db
Maybe less is more? :)
@jhnc
To do what you want takes two steps
It is not readily apparent from the manual that two steps are needed. I wrongly assumed I could do:
findimagedupes -q -f /dev/shm/findimagedupes.index "/folder-with-possible-dupes/" -a "/is-this-file-duplicated.jpg"
But that does not work. Apart from your explanation in this issue, I have also not found any mention of the "-a -- file.jpg" syntax; the "--" is strange to a Linux layman like me.
Also, I have not found any mention that the -f switch significantly speeds up processing.
@slrslr
-- is the POSIX norm for terminating option processing; it allows arguments starting with - which are not options: for example, findimagedupes allows reading a filelist from stdin by specifying - as a filename. The manpage synopsis tries to indicate this with:
findimagedupes [option ...] [--] [ - | [file ...] ]
but I see that -- is not actually explicitly described. I'll update the manpage. Thanks.
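As a minimal sketch of what -- does in general (using ls rather than findimagedupes, and a made-up file named -a.jpg in a scratch directory), the following passes a filename that starts with - as an ordinary argument instead of having it parsed as options:

```shell
# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create a file whose name begins with "-" (hypothetical name, for illustration).
touch -- -a.jpg

# Without "--", ls would try to parse "-a.jpg" as a bundle of options.
# With "--", everything after it is treated as a filename.
listed=$(ls -- -a.jpg)
echo "$listed"
```

The same convention is why the commands in this thread put -- before the file arguments.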
I'll try to come up with something concise to clarify that -a applies to all files specified (note that -a does not take any parameter). Do you have any suggestions? The current text is:
-a, --add
Only look for duplicates of files specified on the commandline.
Matches are also sought in any fingerprint databases specified.
Or perhaps adding more complex examples would be better than rewording?
-f alone does not speed up processing directly unless the same set of files is processed multiple times. In that case, the fingerprints do not need to be recalculated.
The reason that the program runs much faster when both -a and -f are given is that comparing $N$ files against each other requires $O(N^2)$ comparisons, but comparing $N$ files against a subset of $M$ files only needs $O(MN)$ comparisons. If $N \gg M$, there will be a noticeable speedup, since $N^2 \gg MN$. (Consider $N=10000$ and $M=10$: on the order of only 100 thousand comparisons are needed instead of 100 million.)
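To make the arithmetic concrete, here is a rough sketch that only multiplies the numbers from the example; it ignores constant factors and does not measure findimagedupes itself. The variable names are made up for illustration:

```shell
# Figures from the example: N files in total, M files given with -a.
N=10000
M=10

# Order-of-magnitude counts only; constant factors are ignored.
all_pairs=$((N * N))      # every file against every other file
subset_pairs=$((M * N))   # only the M files against all N

echo "all against all:    $all_pairs"
echo "subset against all: $subset_pairs"
echo "ratio:              $((all_pairs / subset_pairs))"
```

The ratio is simply N/M, so the saving grows as the collection gets larger relative to the handful of files being checked.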
@jhnc
-- is not actually explicitly described. I'll update the manpage
thanks
findimagedupes allows reading a filelist from stdin by specifying - as a filename.
As an amateur Linux user, I tried these commands, which do NOT work:
ls -A1 "/dir/"|findimagedupes -f /dev/shm/fpdb-nonrecur-git -t 95% -- -
findimagedupes -f /dev/shm/fpdb-nonrecur-git -t 95% -- - < ls -A1 "/dir/"
(--> ls: No such file or directory), even though that directory exists
-a, --add Only look for duplicates of files specified on the commandline.
When it comes to specifying files, I am used to Linux tools where things (paths, values) go after the switch (in this case "-a"), yet you write that "-a does not take any parameter". So I do not know whether you can reword the -a explanation to be clearer (if so, that would be handy), but as you said, "complex examples" inside the man page (findimagedupes -h) would be very welcome to a layman like me. Running "$ findimagedupes" explains the -a option only as "-a, --add" (add what, to where?), and its output does not mention how to pass a directory path to the command. Thank you.
@slrslr It is probably best to open a different issue if you want to discuss this, since this new problem is not relevant to Florian.
The error from ... < ls ... is not specific to findimagedupes; you would see it with any command, e.g. wc < ls
That's because the redirection operator (<) wants a filename to read, not a program to run. However, some shells (like bash) have a non-standard syntax that would allow what you intended, e.g. wc < <( ls ) (although ls | wc is simpler).
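A small sketch of the difference, using wc and a scratch directory with made-up filenames rather than findimagedupes: a pipe feeds one program's output to the next, while < needs an actual file, so the listing is saved first.

```shell
# Scratch directory with three hypothetical files.
dir=$(mktemp -d)
touch "$dir/a.jpg" "$dir/b.jpg" "$dir/c.jpg"

# A pipe connects the output of ls to the stdin of wc.
# tr strips the padding that some wc implementations add.
via_pipe=$(ls -A1 "$dir" | wc -l | tr -d ' ')

# "<" needs a file to read, so the listing is saved before it is redirected.
list=$(mktemp)
ls -A1 "$dir" > "$list"
via_file=$(wc -l < "$list" | tr -d ' ')

echo "pipe: $via_pipe, file: $via_file"
```

Both routes give wc the same input; the pipe just skips the temporary file.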
I have read through the manpage three times, but no luck. I'm simply trying to run:
# findimagedupes -t 95% -a FILE1.JPG /tmp/pics/*
The result is an exhaustive duplicate search within /tmp/pics/* itself AND the duplicate search for FILE1.JPG. I only want the latter, so the search should be based on FILE1.JPG only. Is this possible?
Thanks!