rhsimplex / image-match

🎇 Quickly search over billions of images

Duplicate Identifier or Similarity Identifier #62

Closed alexminnaar closed 7 years ago

alexminnaar commented 7 years ago

I was hoping to get some clarification on the intended use case for this. Should it be used strictly for duplicate detection, or can it also be used to identify similar images? This page seems to suggest that it can be used to measure image similarity. However, when I try it on the attached images, the results do not agree with the intuition that two images of shoes should be significantly more similar to each other than an image of a shoe and an image of something else.

Attached images: 68510a677540a15fdeeafad1ff381e250653e27f, 88544b108b4e80844e2e43d48d21db8f99506dc9, fefd7feddf373423c20ea759c0a290003325372a

The distance between the first and second image seems to be 0.71422605625006175 but the distance between the first and third is 0.70043762770711271.
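For readers following along, here is a minimal sketch of how distances like these could be interpreted. It assumes the distance is a normalized L2 distance of the form ||a - b|| / (||a|| + ||b||), which ranges from 0 to 1, and it assumes a near-duplicate cutoff of 0.4. The helper names, the cutoff value, and the toy vectors are illustrative assumptions, not taken from this thread, and the code does not call the library itself.

```python
import math

def normalized_distance(a, b):
    """Return ||a - b|| / (||a|| + ||b||), a value between 0 and 1.

    Assumed form of the distance; a and b are equal-length numeric sequences.
    """
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    total = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return 0.0 if total == 0 else diff / total

# Hypothetical cutoff: distances below it are treated as near-duplicates.
NEAR_DUPLICATE_CUTOFF = 0.4

def is_near_duplicate(distance, cutoff=NEAR_DUPLICATE_CUTOFF):
    """True when the distance is small enough to suggest a near-duplicate."""
    return distance < cutoff

# The two distances reported above.
reported = {
    "first vs second": 0.71422605625006175,
    "first vs third": 0.70043762770711271,
}

for pair, distance in reported.items():
    print(pair, "near-duplicate:", is_near_duplicate(distance))
```

Under these assumptions, both reported distances fall well above the cutoff, so neither pair would count as a near-duplicate, and the small gap between 0.714 and 0.700 would not say much about which pair is semantically closer.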

rhsimplex commented 7 years ago

Hi @alexminnaar, yes, the intended use is detecting near-duplicate images. The original use case was detecting copyright violations across a corpus of more than a billion images.

Here's a video from PyData that explains more.

Sorry about the confusion. I'll link this issue from the README to help others.

ghost commented 7 years ago

@alexminnaar, if you are interested, you can use something like this: https://github.com/akshayubhat/DeepVideoAnalytics