Open estatistics opened 1 year ago
The default output should already have each line as a group of "similar" images.
To do something more complicated, you can use the -s option and/or customise the VIEW function.
For example, to output each group with one file per line, ending with an extra newline, you could do:
findimagedupes -R -q -t 70% -i 'VIEW()( printf "%s\n" "$@"; echo )' -- .
or:
findimagedupes -R -q -t 70% -s script -- .
and then edit the shell script (named "script" here) that is created.
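As a rough sketch of how that output can be post-processed: assuming VIEW is called once per group with the group's file names as its arguments (which is what the "$@" above relies on), the function can be tried by hand, and its blank-line-separated output turned into one quoted CSV row per group. The helper name groups_to_csv and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Same body as the -i example: one file per line, blank line after each group.
VIEW()( printf "%s\n" "$@"; echo )

# Hypothetical helper: read blank-line-separated groups on stdin and
# write one CSV row per group, quoting every field.
groups_to_csv() {
    awk 'BEGIN { RS = ""; FS = "\n" }
    {
        row = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(/"/, "\"\"", f)              # double any embedded quotes
            sep = (i > 1) ? "," : ""
            row = row sep "\"" f "\""
        }
        print row
    }'
}

# Try it with made-up groups instead of a real findimagedupes run.
{ VIEW a1.jpg a2.jpg; VIEW "b 1.png" b2.png b3.png; } | groups_to_csv
```

With a real run, the same filter could be attached to the end of the -i command above (... -- . | groups_to_csv > groups.csv). File names that contain newlines would break this format.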
The -v md5 option is really just a debugging aid left over from when I wrote the program. Running md5sum would be faster.
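For exact duplicates only, a minimal sketch of the md5sum approach might look like the following. It assumes GNU coreutils (md5sum, plus uniq with -w and --all-repeated); the function name exact_dupes is made up here.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory whose MD5 checksums match.
# Matching files are printed together, with a blank line between groups.
# Assumes GNU md5sum and GNU uniq.
exact_dupes() {
    find "${1:-.}" -type f -exec md5sum -- {} + |
        sort |
        uniq -w32 --all-repeated=separate   # compare only the 32-char hash
}

# Usage: exact_dupes /path/to/images
```

Files that appear nowhere in the output have a checksum that no other file shares.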
The hash does identify files that have completely identical content (not just the pixel data), but there currently isn't a way to easily display how similar two non-identical images are.
To do this would require the program to output pairs of images individually and not as groups. I intend to add this option in the next release.
(Currently the program always merges pairs into bigger groups. Unfortunately, the algorithm is based on an assumption that fails when there are many files. In that situation, it is extremely likely the program will output at least one huge group that contains many dissimilar images. See: https://github.com/jhnc/findimagedupes/issues/12#issuecomment-1610905081)
Currently the database is Berkeley DB.
You can display the content of the fingerprint database using something like:
perl -MMIME::Base64 -MDB_File -E '
    # Print every record as: base64(value) key
    if ( tie %h, DB_File => $ARGV[0] ) {
        say( encode_base64($v, ""), " $k" ) while ($k, $v) = each %h
    }
' your-fingerprints.db
Switching to sqlite is on my wishlist as it may solve some problems and simplify implementing some other requested features.
I have found a solution to your problem (the program always merges pairs into bigger groups). When I set the similarity to 70% I get 20-100 pictures on the same line. When I increase it to 90% I get 2-3 pairs per line.
So you could add an option that increases the similarity when a group exceeds, e.g., 10 pairs:
findimagedupes --pairsexceed 10
(increasing the similarity by, e.g., 5%).
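Until something like that exists (--pairsexceed is only a proposal, not an existing option), the idea can be approximated from outside the program. The following is only a sketch: it reruns the command with a higher -t value until no group is larger than a limit. The function names, the FINDIMAGEDUPES variable and the default limits are invented for illustration; only -R, -q, -t and -i are taken from the commands earlier in this thread.

```shell
#!/bin/sh
# Command to run; overridable so the loop can be tried with a stub.
: "${FINDIMAGEDUPES:=findimagedupes}"

# Print the size of the largest group, given one file per line
# and a blank line after each group.
largest_group() {
    awk 'BEGIN { RS = ""; FS = "\n" } NF > max { max = NF } END { print max + 0 }'
}

# tighten DIR [MAX_GROUP] [START] [STEP]: raise the threshold until
# every group has at most MAX_GROUP files, then print that output.
tighten() {
    dir=$1 max=${2:-10} t=${3:-70} step=${4:-5}
    while [ "$t" -le 100 ]; do
        out=$("$FINDIMAGEDUPES" -R -q -t "$t%" \
            -i 'VIEW()( printf "%s\n" "$@"; echo )' -- "$dir") || return 2
        size=$(printf '%s\n' "$out" | largest_group)
        if [ "$size" -le "$max" ]; then
            printf '# threshold %s%%\n' "$t" >&2   # report the value used
            printf '%s\n' "$out"
            return 0
        fi
        t=$((t + step))
    done
    return 1    # no threshold up to 100% satisfied the limit
}

# Usage: tighten . 10 70 5
```

Each pass reruns the whole comparison, so this could be slow on large collections, and a stricter threshold applies to every group rather than only the oversized ones.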
I would like to know how I can export the results to a CSV or text file, with each set of similar files separated by a newline.
I know that a1 and a2 are exact matches, as are b1 and b2. How can I see this, or group the files by similarity, using findimagedupes?
And how can fp_data be opened? Is it an SQL database?