idealo / imagededup

😎 Finding duplicate images made easy!
https://idealo.github.io/imagededup/
Apache License 2.0
5.16k stars, 456 forks

Illegal instruction #78

Open dz2904 opened 4 years ago

dz2904 commented 4 years ago

Running the sample program fails with an error:

2019-12-19 22:23:22,729: INFO End: Calculating hashes!
2019-12-19 22:23:22,730: INFO Start: Evaluating hamming distances for getting duplicates
2019-12-19 22:23:22,730: INFO Start: Retrieving duplicates using Cython Brute force algorithm
Illegal instruction

tanujjain commented 4 years ago

Hi

Can you please create a new conda environment and try out the example?

Also, could you please share more info?

  1. Your OS
  2. Your python version
  3. Your imagededup version
  4. Are you working in a virtual environment? If so, is it conda or virtualenv?
  5. Is the issue reproducible?
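Items 1 to 4 of the list above can be collected with a short script. This is a sketch, not part of imagededup. The conda check (looking for a `conda-meta` directory under `sys.prefix`) and the virtualenv check (`sys.prefix` differing from `sys.base_prefix`, or the legacy `sys.real_prefix`) are heuristics, and the version lookup uses `importlib.metadata`, which needs Python 3.8 or newer. Item 5 still has to be answered by hand.

```python
import os
import platform
import sys

try:
    # importlib.metadata is available from Python 3.8 onwards.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    version = None
    PackageNotFoundError = Exception


def package_version(name):
    """Return the installed version of `name`, or None if it can't be found."""
    if version is None:
        return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def environment_kind():
    """Best-effort guess at which kind of Python environment is active."""
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    if hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return "virtualenv/venv"
    return "none detected"


def environment_report():
    """Answers to questions 1-4 of the maintainer's checklist."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "imagededup": package_version("imagededup"),
        "environment": environment_kind(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```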
Magicloud commented 4 years ago

I hit the same problem at the same point.

The environment is an Ubuntu Bionic Docker container (with or without privileged mode), Python 3.6.9, imagededup 0.2.2, and TensorFlow 1.5.1 and 1.5 (old CPU without AVX, no GPU).

TensorFlow itself works; I verified it with python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.constant([[1, 1, 1], [1, 1, 1]]), 0))".
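"Illegal instruction" means the process received SIGILL: the CPU was asked to execute an instruction it does not support. A common cause is compiled code built for newer SIMD extensions than the machine has, but this thread does not establish that missing AVX is what triggers the crash here. To check which extensions a Linux machine exposes (inside a Docker container this is the host CPU), one option is to read the `flags` line of `/proc/cpuinfo`. A minimal x86/Linux-only sketch, where the helper names are hypothetical:

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is read; on x86 Linux every core normally
    reports the same set. ARM kernels use a "Features" line instead, so this
    returns an empty set there.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the flags line lists the 'avx' extension."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux-only pseudo-file
    if os.path.exists(path):
        with open(path) as fh:
            flags = cpu_flags(fh.read())
        print("avx:", "avx" in flags, "| avx2:", "avx2" in flags)
    else:
        print(path, "not found; this check only works on Linux")
```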

The problem is definitely reproducible: running the Quick Start code sample from this repo's README triggers it every time.

Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from imagededup.methods import PHash
>>> phasher = PHash()
>>> encodings = phasher.encode_images(image_dir='/root/tumblr')
2020-03-16 12:52:17,284: INFO Start: Calculating hashes...
100%|########################################| 834/834 [00:05<00:00, 153.63it/s]
2020-03-16 12:52:22,906: INFO End: Calculating hashes!
>>> duplicates = phasher.find_duplicates(encoding_map=encodings)
2020-03-16 12:52:24,408: INFO Start: Evaluating hamming distances for getting duplicates
2020-03-16 12:52:24,408: INFO Start: Retrieving duplicates using Cython Brute force algorithm
Illegal instruction
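The log shows hashing finishes and the crash happens right after "Retrieving duplicates using Cython Brute force algorithm". One way to check that the hashes themselves are fine, and to sidestep the compiled extension while investigating, is a pure-Python brute-force search over the same `encodings` dict. This is a hypothetical fallback, not imagededup's implementation. It assumes the encodings map filenames to hexadecimal hash strings, as the hashing methods produce, and uses a threshold of 10 bits, assumed here to match the library's default `max_distance_threshold`.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def find_duplicates_pure_python(encoding_map, max_distance_threshold=10):
    """O(n^2) brute-force duplicate search over a {filename: hex_hash} map.

    Every pair of hashes is compared once in plain Python, so no compiled
    extension is involved. Returns {filename: [(other_filename, distance), ...]}.
    """
    names = list(encoding_map)
    duplicates = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist = hamming_distance(encoding_map[a], encoding_map[b])
            if dist <= max_distance_threshold:
                # Record the match in both directions.
                duplicates[a].append((b, dist))
                duplicates[b].append((a, dist))
    return duplicates
```

In the session above this would be called as `find_duplicates_pure_python(encodings)`. If it completes, the hashes are usable and the problem is isolated to the compiled duplicate-retrieval step. Being quadratic in pure Python, it is only practical for modest image counts like the 834 files here.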

All installed pip packages:

absl-py (0.9.0)
asn1crypto (0.24.0)
bleach (1.5.0)
cryptography (2.1.4)
cycler (0.10.0)
html5lib (0.9999999)
idna (2.6)
imagededup (0.2.2)
joblib (0.14.1)
keyring (10.6.0)
keyrings.alt (3.0)
kiwisolver (1.1.0)
Markdown (3.2.1)
matplotlib (3.2.0)
numpy (1.16.6)
Pillow (6.2.2)
pip (9.0.1)
protobuf (3.11.3)
pycrypto (2.6.1)
pygobject (3.26.1)
pyparsing (2.4.6)
python-dateutil (2.8.1)
PyWavelets (1.0.3)
pyxdg (0.25)
scikit-learn (0.22.2.post1)
scipy (1.4.1)
SecretStorage (2.3.1)
setuptools (39.0.1)
six (1.11.0)
tensorflow (1.5.0)
tensorflow-tensorboard (1.5.1)
tqdm (4.43.0)
Werkzeug (1.0.0)
wheel (0.30.0)
tanujjain commented 4 years ago

The behaviour is unexpected. Could you please share a minimal Dockerfile that reproduces the problem?

Jerey commented 4 years ago

Since I ran into a similar problem, here is a quick-and-dirty Dockerfile that reproduces it:

FROM ubuntu

RUN apt-get update && apt-get install --no-install-recommends -y \
    python3-setuptools \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/app

RUN pip3 install imagededup

COPY . .
# CMD python3 main.py

Note that the CMD line is commented out. main.py is as follows:

from pathlib import Path
import imagededup
from imagededup.methods import PHash

image_dir = Path('./data/mixed_images')

phasher = PHash()
duplicates = phasher.find_duplicates(image_dir=image_dir, scores=True)

The images in image_dir are the ones provided with the repo's tests in ./tests/data/mixed_images.

Both main.py and the data/ image folder are copied into the working directory.

Running the script produces the same output as reported above:

2020-04-03 15:54:02,097: INFO Start: Retrieving duplicates using Cython Brute force algorithm
Illegal instruction (core dumped)
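To see which Python call was executing when the process died, one option is the standard library's faulthandler module, which dumps Python tracebacks when the interpreter receives SIGILL (as well as SIGSEGV, SIGFPE, SIGABRT and SIGBUS). Running `python3 -X faulthandler main.py` or setting `PYTHONFAULTHANDLER=1` has the same effect without code changes. A sketch of enabling it at the top of a script like main.py, where `main()` is a placeholder for the reproduction:

```python
import faulthandler

# Print the Python traceback of every thread to stderr if the interpreter
# is killed by a fatal signal such as SIGILL ("Illegal instruction").
faulthandler.enable(all_threads=True)


def main():
    # Placeholder for the reproduction from main.py, e.g.:
    #   from imagededup.methods import PHash
    #   PHash().find_duplicates(image_dir=image_dir, scores=True)
    pass


if __name__ == "__main__":
    main()
```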