idealo / imagededup

😎 Finding duplicate images made easy!

https://idealo.github.io/imagededup/

Apache License 2.0

5.16k stars 456 forks source link

Fix tests #83

Closed clennan closed 4 years ago

clennan commented 4 years ago

Currently we have failing Linux tests because we rely on the order of how images are loaded which varies across OSs. We don't care in which order images are loaded so we shouldn't test for it, that's why I removed the lines from tests/test_hashing.py

We also have a failing test for macOS Python 3.6 on Azure pipelines which was not reproducible on my MacBook but was fixed by initialising a new CNN object for the failing test.

tanujjain commented 4 years ago

@clennan @datitran It seems that the on_epoch_end function (imagededup/utils/data_generator.py) is never getting called. This means that self.valid_image_files variable still retains a list of all files as valid (including the corrupt image) and hence, the filenames variable (imagededup/methods/cnn.py/_get_cnn_features_batch) gets the full list of files.

Still trying to figure out why on_epoch_end doesn't get triggered.

codecov-io commented 4 years ago

Codecov Report

Merging #83 into dev will decrease coverage by 0.03%. The diff coverage is 100%.

@@            Coverage Diff            @@
##             dev      #83      +/-   ##
=========================================
- Coverage   95.2%   95.17%   -0.04%     
=========================================
  Files         17       17              
  Lines        647      642       -5     
=========================================
- Hits         616      611       -5     
  Misses        31       31

Impacted Files	Coverage Δ
imagededup/utils/logger.py	`100% <ø> (ø)`	:arrow_up:
imagededup/methods/cnn.py	`97.89% <100%> (ø)`	:arrow_up:
imagededup/utils/data_generator.py	`100% <100%> (ø)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6f34169...586e540. Read the comment docs.

tanujjain commented 4 years ago

2 issues were found with the existing implementation:

use of on_epoch_end call which never seems to get triggered.
use of variable self.counter to keep track of indices of corrupt files: The data generator seems to get called once initially as well in addition to the expected number calls (which should be equal to the number of batches). This initial call is made by the tensorflow function peek_and_restore. So, the variable self.counter gets incremented many more times than expected and hence, leads to the error.

Why is the error springing up now, but wasn't seen before with this permanency? - No idea

clennan commented 4 years ago

Looks great!