idealo / imagededup

😎 Finding duplicate images made easy!
https://idealo.github.io/imagededup/
Apache License 2.0

In find_duplicates, how can I find duplicates of one particular image against the rest (one vs. all) instead of finding duplicates for every image? #89

Closed vyaslkv closed 3 years ago

vyaslkv commented 4 years ago

I want to compute the cosine similarity of one test image against all the others, rather than for every pair, to reduce the run time. In other words, I only want to find the duplicates of one particular image.

ebernalg92 commented 4 years ago

https://github.com/idealo/imagededup/blob/master/imagededup/methods/hashing.py

Line 220

    result_set = HashEval(
        test=encoding_map,
        queries=encoding_map,      ######## 
        distance_function=self.hamming_distance,
        verbose=self.verbose,
        threshold=max_distance_threshold,
        search_method=search_method,
    )

On the line marked with ########, replace encoding_map with a dictionary that contains only the image(s) you care about: keys are file names and values are the encoded images (hashes).
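To make the idea concrete without patching the library, here is a minimal standalone sketch of a one-vs-all hash comparison. It does not call imagededup at all: `hamming_distance` and `find_duplicates_of_one` are hypothetical helpers written for this illustration, and it assumes the hashes are equal-length hex strings stored in a dict shaped like `encoding_map` (file name to hash).

```python
from typing import Dict, List, Tuple


def hamming_distance(hash1: str, hash2: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def find_duplicates_of_one(
    query_name: str,
    query_hash: str,
    encoding_map: Dict[str, str],
    max_distance_threshold: int = 10,
) -> List[Tuple[str, int]]:
    """Compare a single query hash against every hash in encoding_map.

    Returns (file name, distance) pairs within the threshold, closest first.
    The query file itself is skipped if it is also present in encoding_map.
    """
    matches = []
    for filename, candidate_hash in encoding_map.items():
        if filename == query_name:
            continue
        distance = hamming_distance(query_hash, candidate_hash)
        if distance <= max_distance_threshold:
            matches.append((filename, distance))
    return sorted(matches, key=lambda pair: pair[1])


# Toy data: 64-bit hashes written as 16 hex characters (made up for the example).
encoding_map = {
    "a.jpg": "ffff000000000000",
    "b.jpg": "ffff000000000001",
    "c.jpg": "0000ffffffffffff",
}

hash_matches = find_duplicates_of_one("query.jpg", "ffff000000000000", encoding_map)
print(hash_matches)
```

This is a linear scan: one distance computation per stored image instead of one per pair of images, which is where the time saving comes from.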

vyaslkv commented 4 years ago

What would the change be in the case of CNN?

tanujjain commented 4 years ago

@vyaslkv For CNN, you could select the indices of the images whose duplicates you want in the self.cosine_scores variable (so that self.cosine_scores contains only the rows that correspond to your images) before it is used in the loop below:

https://github.com/idealo/imagededup/blob/81d383ec0774d62439eb34ca1fab21b23d83bacd/imagededup/methods/cnn.py#L237

Haven't tested it to see if it breaks the code somewhere else, but this would be a starting point.
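For anyone who would rather not modify the library, the same one-vs-all idea can be sketched outside of it. The snippet below is only an illustration under assumptions: it presumes you already have one feature vector per image in a plain dict (file name to list of floats), and `cosine_similarity` and `find_similar_to_one` are hypothetical helpers, not part of imagededup. It uses only the standard library; with real CNN encodings you would likely vectorise this with NumPy.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_similar_to_one(
    query_name: str,
    encodings: Dict[str, Sequence[float]],
    min_similarity_threshold: float = 0.9,
) -> List[Tuple[str, float]]:
    """Score one query image against all others; most similar first."""
    query_vector = encodings[query_name]
    results = []
    for filename, vector in encodings.items():
        if filename == query_name:
            continue  # do not report the query as its own duplicate
        score = cosine_similarity(query_vector, vector)
        if score >= min_similarity_threshold:
            results.append((filename, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "encodings" (made up; real CNN features are much longer).
encodings = {
    "query.jpg": [1.0, 0.0],
    "near.jpg": [0.9, 0.1],
    "far.jpg": [0.0, 1.0],
}

cnn_matches = find_similar_to_one("query.jpg", encodings)
print(cnn_matches)
```

This computes a single row of the similarity matrix rather than the full matrix, which is the same effect as keeping only the rows for your images in self.cosine_scores.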