idealo / imagededup

😎 Finding duplicate images made easy!
https://idealo.github.io/imagededup/
Apache License 2.0
5.15k stars 455 forks

Run cnn on gpu and fix packed tensor bug #179

Closed abrayyan closed 1 year ago

abrayyan commented 2 years ago

Hi. I just pushed a fix for the failing test. However, the original cnn code returns this: img_features_tensor.detach().numpy()[0, ..., 0, 0]

This returns an array of shape (1, FEATURE_SIZE), which causes cosine similarity to fail when finding duplicates, because the stacked features end up as a 3-D array of shape (number of images, 1, FEATURE_SIZE) instead of a 2-D one. Error:

File "/lib/python3.9/site-packages/imagededup/methods/cnn.py", line 408, in find_duplicates
    result = self._find_duplicates_dict(
  File "/lib/python3.9/site-packages/imagededup/methods/cnn.py", line 282, in _find_duplicates_dict
    self.cosine_scores = get_cosine_similarity(features, self.verbose)
  File "/lib/python3.9/site-packages/imagededup/handlers/search/retrieval.py", line 26, in get_cosine_similarity
    return cosine_similarity(X)
  File "/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 1377, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 146, in check_pairwise_arrays
    X = Y = check_array(
  File "/lib/python3.9/site-packages/sklearn/utils/validation.py", line 893, in check_array
    raise ValueError(
ValueError: Found array with dim 3. check_pairwise_arrays expected <= 2.

The push I did earlier fixes this error but fails the test.
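
For readers following along, the shape problem described above can be reproduced in isolation. This is only a minimal sketch using NumPy, not the imagededup code: FEATURE_SIZE, NUM_IMAGES, and cosine_similarity_2d are hypothetical names chosen for illustration, and the hand-written similarity function merely stands in for the 2-D check that sklearn's cosine_similarity performs.

```python
import numpy as np

FEATURE_SIZE = 8  # hypothetical; the real size depends on the CNN in use
NUM_IMAGES = 3    # hypothetical number of encoded images

rng = np.random.default_rng(0)

# One feature array per image, each of shape (1, FEATURE_SIZE).
per_image = [rng.random((1, FEATURE_SIZE)) + 0.1 for _ in range(NUM_IMAGES)]

# Stacking them yields a 3-D array, which a pairwise similarity rejects.
stacked = np.array(per_image)
print(stacked.shape, stacked.ndim)

# Dropping the singleton axis gives one row per image, i.e. a 2-D array.
flat = stacked.reshape(NUM_IMAGES, -1)
print(flat.shape, flat.ndim)


def cosine_similarity_2d(x):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    if x.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {x.ndim} dimensions")
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


scores = cosine_similarity_2d(flat)
print(scores.shape)
```

Passing stacked instead of flat to cosine_similarity_2d raises a ValueError, analogous to the one in the traceback above. Whether to flatten at this point or to return a 1-D vector per image earlier is a design choice for the PR.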

abrayyan commented 2 years ago

Hi @datitran, I see that my changes were replicated in another branch here: https://github.com/idealo/imagededup/pull/182. Is there a reason why the changes from this PR were not merged directly?

datitran commented 2 years ago

@abrayyan hey, weird, I seem to have closed the PR by accident. @tanujjain, you worked on running it on GPU. Can you please review the PR?