scrubbbbs / cbird

Command-line program for managing a media collection, with focus on Content-Based Image Retrieval (Computer Vision) methods for finding duplicates.
GNU General Public License v2.0

Flipped images? #12

Open deedeenelson994 opened 5 months ago

deedeenelson994 commented 5 months ago

Hey :) I've been using your program cbird to detect similar images, and I'm incredibly impressed so far. It manages to do things a lot of paid tools do not. One thing I'm struggling with, however, is finding flipped images. -p.refl does not appear to work at all. The following command finds all similar images in my test folder, except any flipped images:

cbird -use "F:\" -i.algos orb -update -p.alg orb -p.odt 15 -p.refl 4 -similar -show

What am I doing wrong?

scrubbbbs commented 5 months ago

Reflections are not indexed currently and must be generated as needed. This seemed like a practical choice at the time, since indexing reflections would increase the indexing time by 2-3x.

-similar is supposed to be the fast option, so it won't reprocess files (and hence won't generate reflections). -similar-to, however, has this capability, as it was designed to find things that aren't yet a member of the index.

cbird -use <path> -p.alg dct -p.refl h+v -select-type i -similar-to @
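To illustrate why a lookup against the existing index misses mirrored copies while a query-time search can find them, here is a minimal toy sketch in Python. It is not cbird's code: `toy_hash`, `query_variants` and `similar_to` are hypothetical names, the hash is a stand-in for a real position-sensitive algorithm like dct, and exact hash equality stands in for a distance threshold.

```python
def toy_hash(img):
    """Position-sensitive toy hash: one bit per pixel, set if above the mean."""
    pixels = [px for row in img for px in row]
    mean = sum(pixels) / len(pixels)
    return tuple(px > mean for px in pixels)

def flip_h(img):
    """Mirror left-to-right: reverse each row."""
    return [row[::-1] for row in img]

def flip_v(img):
    """Mirror top-to-bottom: reverse the row order."""
    return img[::-1]

def query_variants(img, reflections):
    """Yield the unflipped query plus each requested reflection ("h", "v")."""
    yield "none", img
    if "h" in reflections:
        yield "h", flip_h(img)
    if "v" in reflections:
        yield "v", flip_v(img)

def similar_to(index, img, reflections=()):
    """Hash every query variant and compare it to the stored (unflipped) hashes."""
    hits = []
    for label, variant in query_variants(img, reflections):
        h = toy_hash(variant)  # one extra hash per reflection: the added cost
        hits += [(name, label) for name, stored in index.items() if stored == h]
    return hits

original = [[10, 200], [30, 40]]
index = {"a.jpg": toy_hash(original)}  # the index only holds unflipped hashes
needle = flip_h(original)              # a mirrored copy of a.jpg

assert similar_to(index, needle) == []
assert similar_to(index, needle, reflections=("h",)) == [("a.jpg", "h")]
```

Each requested reflection means one more hash per query image, which is where the extra time goes.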

Since each image has to be loaded and indexed for each reflection, this takes a lot longer. You can specify which reflections you want and use a lighter algorithm to help with that. Note that the "color" algorithm doesn't use spatial information, so it would work with -similar and might be useful combined with a filter (-with score '<150'), for example.
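As a rough illustration of why an algorithm without spatial information is unaffected by flips, here is a small Python sketch using a plain color histogram. This is a generic stand-in and an assumption on my part, not a description of how cbird's "color" algorithm is implemented; `color_histogram` is a hypothetical helper.

```python
from collections import Counter

def color_histogram(img):
    """Count how often each pixel value occurs, ignoring where it occurs."""
    return Counter(px for row in img for px in row)

def flip_h(img):
    """Mirror left-to-right: reverse each row."""
    return [row[::-1] for row in img]

# 2x2 RGB image as nested lists of (r, g, b) tuples
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 0, 0)]]
mirrored = flip_h(img)

assert mirrored != img                                    # pixel layout changed
assert color_histogram(mirrored) == color_histogram(img)  # histogram did not
```

The flip only moves pixels around, so any descriptor built purely from pixel counts comes out identical for the original and its mirror image.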

deedeenelson994 commented 5 months ago

Thanks for your response! After lots of trial and error, this appears to be the most effective command for my use case:

cbird -use -i.algos orb -update -p.alg orb -p.odt 15 -p.mg 1 -p.refl h+v -select-type i -similar-to @ -show

It works well enough (i.e. better than any other software I've tried). The only issue is that mirrored images appear twice per group in the results browser. Is there a way to make the same file show up only once per group?

Is there anything I can improve upon to increase true positives without producing too many false positives? Processing time isn't something I'm too worried about because I'm never working with too many files at a time.

scrubbbbs commented 5 months ago

Hey, happy you are finding this useful. I'm getting back into it again, so maybe there will be a new release this summer.

orb will find rotations, so you really only need one reflection, -p.refl h for example. It will detect the vertical case, since that is the horizontal reflection rotated 180 degrees. That will drop half of the results ;-)
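The geometry behind that can be checked with a few lines of Python. This is only a sketch on nested lists and says nothing about how orb itself works; `flip_h`, `flip_v` and `rot180` are illustrative helpers.

```python
def flip_h(img):
    """Mirror left-to-right: reverse each row."""
    return [row[::-1] for row in img]

def flip_v(img):
    """Mirror top-to-bottom: reverse the row order."""
    return img[::-1]

def rot180(img):
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in img[::-1]]

img = [[1, 2, 3],
       [4, 5, 6]]

# A vertical flip is a horizontal flip rotated by 180 degrees
assert rot180(flip_h(img)) == flip_v(img)
# Flipping both ways is just a 180 degree rotation of the original
assert flip_v(flip_h(img)) == rot180(img)
```

So with a rotation-tolerant matcher, the horizontal reflection alone covers the vertical case, and the both-axes case is already covered by the unflipped image.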

You can also force all results to be pairs with -p.eg 1, so each reflection match will be a separate group.

Besides lowering the threshold, you can filter by score: add -with score "<1000" after -similar-to. I don't really know what a good score would be.

A slower option is the template matcher (-p.tm 1). It checks each result pair to see whether one is transformable (rotate/scale/crop) into the other, and will drop most false positives. Unfortunately, this is broken with reflections because it checks against the unflipped source image... new bug to fix ;-)