jhnc / findimagedupes

Finds visually similar or duplicate images
GNU General Public License v3.0

NAME

findimagedupes - Finds visually similar or duplicate images

SYNOPSIS

findimagedupes [option ...] [--] [ - | [file ...] ]

Options:
   -f, --fingerprints=FILE    -c, --collection=FILE
   -M, --merge=FILE           -p, --program=PROGRAM
   -P, --prune                -s, --script=FILE
   -a, --add                  -i, --include=TEXT
   -r, --rescan               -I, --include-file=FILE
   -n, --no-compare
                              -q, --quiet
   -t, --threshold=AMOUNT     -v, --verbosity=LIST

   -0, --null                 -h, --help
   -R, --recurse                  --man
                                  --version

With no options, compares the specified files and neither uses nor updates any fingerprint database.

Directories of images may be specified instead of individual files; subdirectories of these are not searched unless --recurse is used.

INSTALLATION

If you use Linux, your distribution may include a prepackaged version. Debian and Ubuntu do, for example.

Otherwise, at a minimum you'll need Perl with the modules listed at the top of the findimagedupes script, plus the GraphicsMagick package.

You may need to change Inline's DIRECTORY to point somewhere else. Read the Inline module documentation for details.

OPTIONS

DESCRIPTION

findimagedupes compares a list of files for visual similarity.

RETURN VALUE

Any other return values indicate an internal error of some sort.

DIAGNOSTICS

To be written.

EXAMPLES

FILES

To be written.

BUGS

There is a memory leak somewhere.

Killing the program may corrupt the fingerprint database(s).

The program does not lock the fingerprint database although concurrent write access to it is unsafe.
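One way to reduce the risk from the two database issues above is to wrap every run that touches a given database in an external lock and keep a backup copy. The sketch below is an assumption-laden workaround, not something findimagedupes provides: with_db_lock is a hypothetical helper, the .lock and .bak suffixes are arbitrary, and it relies on flock(1) from util-linux being installed.

```shell
# with_db_lock DB COMMAND [ARG ...]
# Hypothetical wrapper (not part of findimagedupes): takes an exclusive
# lock on DB.lock, copies DB to DB.bak if it exists, then runs COMMAND.
# Other invocations that go through the same wrapper wait for the lock.
with_db_lock() {
    db=$1
    shift
    (
        # Block until file descriptor 9 (opened on DB.lock) is locked.
        flock 9 || exit 1
        # Keep a copy so a killed run does not leave you without a database.
        if [ -f "$db" ]; then
            cp -p "$db" "$db.bak"
        fi
        "$@"
    ) 9>"$db.lock"
}
```

It might be used as with_db_lock ~/fp.db findimagedupes --fingerprints="$HOME/fp.db" -- ~/Pictures, where ~/fp.db is only an illustrative path. The lock protects only against other processes that use the same wrapper.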

GraphicsMagick does not expose its auto-orient functionality to Perl.

Changing the version of GraphicsMagick invalidates existing fingerprint databases.
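Because a GraphicsMagick upgrade can silently invalidate a database, it may help to record the version next to the database and check it before each run. The following is a minimal sketch under stated assumptions: gm_version_changed is a hypothetical helper, the .gmversion stamp file is an invented convention, and the first line printed by gm version is assumed to identify the installed release.

```shell
# gm_version_changed DB
# Hypothetical helper (not part of findimagedupes). Returns 0 if the
# GraphicsMagick version differs from the one recorded in DB.gmversion,
# or if no version has been recorded yet; returns 1 if it is unchanged.
# The current version is written to the stamp file whenever 0 is returned.
gm_version_changed() {
    db=$1
    stamp="$db.gmversion"
    # Assumption: the first line of "gm version" names the release.
    current=$(gm version | head -n 1)
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$current" ]; then
        return 1
    fi
    printf '%s\n' "$current" > "$stamp"
    return 0
}
```

A caller could test gm_version_changed ~/fp.db and, when it succeeds, discard and regenerate the database before comparing anything against it.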

NOTES

Directory recursion is deliberately not implemented: composing a file-list and using it with - is a more flexible approach.
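The file-list approach can be sketched with find(1). The helper name list_images and the choice of extensions are illustrative assumptions. The sketch also assumes that --null makes findimagedupes read a NUL-delimited list when - is given; check --man for the exact behaviour.

```shell
# list_images DIR
# Hypothetical helper (not part of findimagedupes): prints every
# .jpg/.jpeg/.png file under DIR, at any depth, as a NUL-delimited list.
# NUL delimiters keep filenames containing spaces or newlines intact.
list_images() {
    find "$1" -type f \
        \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
        -print0
}
```

The list could then be piped in, for example list_images ~/Pictures | findimagedupes --null -- -. Because the list is built separately, it can be filtered by size, date, or path with ordinary find(1) tests before any comparison happens.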

Repetitions are culled before comparisons take place, so a commandline like findimagedupes a.jpg a.jpg will not produce a match.

The program needs a lot of memory. This is probably not an issue unless your machine has less than 128MB of free RAM and you try to compare more than a hundred thousand files at once. The program will run quite slowly with that many files anyway: about eight hours initially to generate fingerprints and another ten minutes to do the actual comparing.

Fingerprinting images is a bottleneck, but unfortunately the program was not written with parallel processing in mind. For a workaround, see: https://github.com/jhnc/findimagedupes/issues/9
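One possible shape for such a workaround is to split the file list into chunks and fingerprint each chunk in its own process, writing one database per chunk. This is a sketch under assumptions and is not taken from the linked issue: fingerprint_chunks is a hypothetical helper, the chunk. and db. file names are invented, and it assumes that --no-compare with --fingerprints only generates and stores fingerprints and that - reads a newline-delimited list from stdin.

```shell
# fingerprint_chunks WORKDIR LINES_PER_CHUNK
# Hypothetical helper (not part of findimagedupes). Reads a
# newline-delimited file list on stdin, splits it into chunk files under
# WORKDIR, and starts one background findimagedupes process per chunk,
# each writing to its own database (WORKDIR/db.aa, WORKDIR/db.ab, ...).
fingerprint_chunks() {
    workdir=$1
    per_chunk=$2
    split -l "$per_chunk" - "$workdir/chunk."
    for list in "$workdir"/chunk.*; do
        # Separate databases avoid concurrent writes to a single file.
        findimagedupes --no-compare \
            --fingerprints="$workdir/db.${list##*.}" -- - < "$list" &
    done
    # Wait for every background fingerprinting job to finish.
    wait
}
```

The number of parallel jobs equals the number of chunks, so choose LINES_PER_CHUNK with the available cores and memory in mind. The per-chunk databases still have to be combined before comparing; the --merge option in the synopsis looks intended for that, but its exact semantics are not described here.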

SEE ALSO

find(1), md5sum(1)

gqview - GTK based multiformat image viewer

gthumb - an image viewer and browser for GNOME

AUTHOR

Jonathan H N Chin <code@jhnc.org>

COPYRIGHT AND LICENSE

Copyright © 2006-2022 by Jonathan H N Chin <code@jhnc.org>.

This program is free software; you may redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

HISTORY

This code has been written from scratch. However, it owes its existence to findimagedupes by Rob Kudla and uses the same duplicate-detection algorithm.