Closed clennan closed 4 years ago
@clennan @datitran It seems that the `on_epoch_end` function (`imagededup/utils/data_generator.py`) is never getting called. This means that the `self.valid_image_files` variable still retains a list of all files as valid (including the corrupt image) and hence the `filenames` variable (`imagededup/methods/cnn.py/_get_cnn_features_batch`) gets the full list of files.
Still trying to figure out why `on_epoch_end` doesn't get triggered.
Merging #83 into dev will decrease coverage by 0.03%. The diff coverage is 100%.
@@ Coverage Diff @@
## dev #83 +/- ##
=========================================
- Coverage 95.2% 95.17% -0.04%
=========================================
Files 17 17
Lines 647 642 -5
=========================================
- Hits 616 611 -5
Misses 31 31
Impacted Files | Coverage Δ | |
---|---|---|
imagededup/utils/logger.py | 100% <ø> (ø) | :arrow_up: |
imagededup/methods/cnn.py | 97.89% <100%> (ø) | :arrow_up: |
imagededup/utils/data_generator.py | 100% <100%> (ø) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 6f34169...586e540.
Two issues were found with the existing implementation:
1. The `on_epoch_end` call, which never seems to get triggered.
2. The use of `self.counter` to keep track of indices of corrupt files: the data generator seems to get called once initially, in addition to the expected number of calls (which should equal the number of batches). This initial call is made by the TensorFlow function `peek_and_restore`. So the variable `self.counter` gets incremented many more times than expected, which leads to the error.

Why is the error springing up now, when it wasn't seen before with this permanency? No idea.
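To illustrate the second issue, here is a minimal sketch in plain Python, with no TensorFlow involved. The classes `CountingGenerator` and `IndexedGenerator`, the helper `consume`, and the file names are all hypothetical and heavily simplified; they are not the imagededup code. The sketch only assumes that the framework makes one extra "peek" call on the first batch before iterating, as described above, and shows why deriving positions from a call counter breaks while deriving them from the batch index does not.

```python
class CountingGenerator:
    """Hypothetical generator that derives file positions from a call counter."""

    def __init__(self, files, batch_size, corrupt):
        self.files = files
        self.batch_size = batch_size
        self.corrupt = set(corrupt)
        self.counter = 0  # incremented on every __getitem__ call
        self.invalid_indices = []

    def __len__(self):
        # Number of batches, rounding up.
        return -(-len(self.files) // self.batch_size)

    def __getitem__(self, batch_idx):
        batch = self.files[batch_idx * self.batch_size:(batch_idx + 1) * self.batch_size]
        start = self.counter * self.batch_size  # wrong if calls != batches
        for offset, name in enumerate(batch):
            if name in self.corrupt:
                self.invalid_indices.append(start + offset)
        self.counter += 1
        return batch


class IndexedGenerator(CountingGenerator):
    """Same generator, but positions come from the batch index itself."""

    def __init__(self, files, batch_size, corrupt):
        super().__init__(files, batch_size, corrupt)
        self.invalid_indices = set()  # a set keeps repeated calls idempotent

    def __getitem__(self, batch_idx):
        start = batch_idx * self.batch_size
        batch = self.files[start:start + self.batch_size]
        for offset, name in enumerate(batch):
            if name in self.corrupt:
                self.invalid_indices.add(start + offset)
        return batch


def consume(gen):
    """Stand-in for the framework: one extra peek, then one call per batch."""
    gen[0]  # assumed initial inspection call
    for i in range(len(gen)):
        gen[i]


files = ["a.jpg", "b.jpg", "corrupt.jpg", "d.jpg"]

counting = CountingGenerator(files, batch_size=2, corrupt=["corrupt.jpg"])
consume(counting)
# The counter is one ahead, so the recorded index points past the end of `files`.
out_of_range = [i for i in counting.invalid_indices if i >= len(files)]

indexed = IndexedGenerator(files, batch_size=2, corrupt=["corrupt.jpg"])
consume(indexed)
valid_files = [f for i, f in enumerate(files) if i not in indexed.invalid_indices]
```

With the counter-based version the corrupt file is recorded at an index that does not exist in the file list, which is one way such bookkeeping could surface as an error; the index-based version is unaffected by the extra call.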
Looks great!
Currently we have failing Linux tests because we rely on the order in which images are loaded, which varies across OSs. We don't care in which order images are loaded, so we shouldn't test for it; that's why I removed the lines from `tests/test_hashing.py`.
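As an alternative to removing an assertion outright, a test can compare results without regard to order. The sketch below is a generic illustration, not the actual content of `tests/test_hashing.py`; the helper `same_items` and the two example listings are hypothetical.

```python
from collections import Counter


def same_items(actual, expected):
    """Return True if both sequences hold the same items, ignoring order."""
    return Counter(actual) == Counter(expected)


# Hypothetical listings of the same directory as two platforms might return them.
listing_a = ["ukbench00120.jpg", "ukbench09268.jpg", "ukbench00002.jpg"]
listing_b = ["ukbench00002.jpg", "ukbench00120.jpg", "ukbench09268.jpg"]

order_sensitive = listing_a == listing_b            # depends on load order
order_insensitive = same_items(listing_a, listing_b)  # does not
```

`Counter` is used rather than `set` so that duplicate entries still count as a mismatch.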
We also have a failing test for macOS Python 3.6 on Azure pipelines which was not reproducible on my MacBook but was fixed by initialising a new CNN object for the failing test.