idealo / imagededup

😎 Finding duplicate images made easy!
https://idealo.github.io/imagededup/
Apache License 2.0
5.15k stars, 455 forks

Memory leak when encoding with CNN? #193

Closed by ddofborg 1 year ago

ddofborg commented 1 year ago

There seems to be a memory leak while encoding with CNN.

I encoded 10k images using CNN. Each image was 1000x1000 pixels. Memory usage was about 120GB, while the pickled output of the encodings was only about 24MB. Filenames are 15 characters long.

This is the code:

from imagededup.methods import CNN
hasher = CNN()
encodings = hasher.encode_images(image_dir='/images')  # 24MB pickled, 120GB in memory.
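
One way to put numbers on a report like this is to compare the pickled size of the result with the peak memory of the process. The following is a minimal sketch, assuming a Unix-like system (the standard-library `resource` module is not available on Windows). The names `peak_rss_bytes` and `report` and the stand-in dictionary are hypothetical and not part of imagededup; in practice the dictionary would be replaced by the `encodings` returned above.

```python
import pickle
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def report(encodings):
    """Compare the pickled size of `encodings` with the process's peak memory."""
    pickled = len(pickle.dumps(encodings, protocol=pickle.HIGHEST_PROTOCOL))
    return {"pickled_bytes": pickled, "peak_rss_bytes": peak_rss_bytes()}


# Stand-in for the real result: 100 filenames, each mapped to a vector of
# 512 floats. Both numbers are arbitrary and chosen only for illustration.
fake_encodings = {f"img_{i:05d}.jpg": [0.0] * 512 for i in range(100)}

stats = report(fake_encodings)
print(stats)
```

A peak that is orders of magnitude larger than the pickled size, as in the 120GB versus 24MB reported here, suggests that memory is being held by something other than the returned encodings.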

ddofborg commented 1 year ago

I had installed the package from pypi.org, and that version was outdated. The leak seems to be fixed in the repo.
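
To see which version is actually installed before comparing it with the repo, the standard library can be queried directly. This is a small sketch rather than an official procedure. The helper name `installed_version` is hypothetical, and the thread does not state which release contains the fix.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("imagededup:", installed_version("imagededup"))
```

If the printed version is older than what the repo currently provides, reinstalling from the repo is one way to pick up the fix.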