qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.
Other
18.39k stars 606 forks

Similar images: RGB threshold #425

Open reyaz006 opened 2 years ago

reyaz006 commented 2 years ago

It's a feature proposal.

I've got collections of images that may be very similar, because one copy is downloaded from a website which automatically optimizes all uploaded images. I see that in many cases the per-pixel difference between the copies ranges from (0,0,0) to (1,1,1) in RGB. Thus, deduplication apps that only match 1:1 identical picture data miss those. Apps that have some kind of threshold, however, don't do a good job either: too many false positives are introduced even at the minimum threshold, and I rarely see a difference between low and high threshold settings.

If there were such a setting, the user could set their own threshold in RGB values and be sure of getting no false positives. At the same time, setting this RGB threshold to (0,0,0) would effectively ensure that only 1:1 identical images are marked as duplicates, where needed.

So basically, when comparing 2 pixels with values (55,66,77) and (56,65,78), at threshold (0,0,0) they will be identified as different, while at threshold (1,1,1) they will be considered the same.
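
To make the proposal concrete, here is a minimal sketch of the per-channel comparison described above. It is only an illustration, not czkawka code: the names `pixels_match` and `images_match` are hypothetical, and images are assumed to be already decoded into flat lists of (R, G, B) tuples of the same dimensions.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def pixels_match(a: Pixel, b: Pixel, threshold: Pixel = (0, 0, 0)) -> bool:
    """True if every channel of a and b differs by at most that channel's threshold."""
    return all(abs(x - y) <= t for x, y, t in zip(a, b, threshold))


def images_match(
    img_a: Sequence[Pixel],
    img_b: Sequence[Pixel],
    threshold: Pixel = (0, 0, 0),
) -> bool:
    """Compare two decoded images pixel by pixel (hypothetical helper).

    Assumes both images are flat sequences of RGB tuples; images with a
    different number of pixels are never treated as duplicates.
    """
    if len(img_a) != len(img_b):
        return False
    return all(pixels_match(p, q, threshold) for p, q in zip(img_a, img_b))


# The example from the proposal:
print(pixels_match((55, 66, 77), (56, 65, 78), (0, 0, 0)))  # False
print(pixels_match((55, 66, 77), (56, 65, 78), (1, 1, 1)))  # True
```

With threshold (0,0,0) this degenerates to an exact pixel-data comparison, which is the 1:1 behaviour mentioned above. Note that it requires decoding and comparing full images pairwise, unlike a hash-based search.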

qarmin commented 2 years ago

Most perceptual hashes don't take the color of the image into account (at least Blockhash, pHash and median hash don't). I don't think it is possible to implement this with the current version of the img_hash library (the feature would probably be more suitable in that library itself).
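
As a rough illustration of why color gets lost, here is a simplified mean-style hash. This is an assumption-laden sketch, not the img_hash implementation: the function names are hypothetical, the image is a tiny flat list of RGB tuples, and the grayscale conversion uses the common Rec. 601 luma weights.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def luma(pixel: Pixel) -> float:
    """Grayscale value of an RGB pixel using Rec. 601 weights (assumed here)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_hash(image: Sequence[Pixel]) -> List[int]:
    """Simplified mean hash: one bit per pixel, set if brighter than the average.

    Real implementations first shrink the image to a small fixed size;
    that step is skipped here to keep the sketch short.
    """
    grays = [luma(p) for p in image]
    mean = sum(grays) / len(grays)
    return [1 if g > mean else 0 for g in grays]


# A neutral gray pattern and a strongly red-tinted version of it.
gray_image = [(200, 200, 200), (10, 10, 10), (200, 200, 200), (10, 10, 10)]
red_image = [(255, 150, 150), (20, 0, 0), (255, 150, 150), (20, 0, 0)]

print(mean_hash(gray_image) == mean_hash(red_image))  # True
```

Both images produce the same bits because only the bright/dark pattern survives the grayscale step, so a per-channel RGB threshold has nothing to work with once the hash is computed.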