horia141 / sdhash

Python library for image hashing and deduplication
MIT License

encoding error #4

Open under-score opened 2 years ago

under-score commented 2 years ago

Running the sample code on what is probably a newer Python version, we get:

  File ".../lib/python3.7/site-packages/sdhash/__init__.py", line 90, in _hash_image
    hasher.update('IMAGE')
TypeError: Unicode-objects must be encoded before hashing

I tried passing a base64-encoded string instead of a PIL image, but then I get other errors...

horia141 commented 2 years ago

First --> thank you for trying out this thing 🙇‍♂️

Second -->

It should definitely work with just a PIL image and nothing else. I guess in newer Pythons the hash object doesn't work on strings, but rather on bytes that contain the encoded form of the string.

For example:

> 'IMAGE'.encode('utf-8')
b'IMAGE'

So the simple fix would be to update 'IMAGE' to 'IMAGE'.encode('utf-8'). But there's probably more that is broken.
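
A minimal sketch of that fix in isolation, for anyone who wants to see the behaviour outside the library. It assumes a `hashlib`-style hasher; `sha256` is only a stand-in, since the algorithm sdhash actually uses isn't shown in this thread, and the helper names `hash_marker` and `update_with_str_fails` are hypothetical.

```python
import hashlib


def hash_marker(marker: str = "IMAGE") -> str:
    """Feed a text marker to a hasher the Python 3 way: as bytes."""
    # sha256 is an assumption for illustration, not necessarily what sdhash uses.
    hasher = hashlib.sha256()
    # Encode the str to bytes first; passing the str directly raises TypeError.
    hasher.update(marker.encode("utf-8"))
    return hasher.hexdigest()


def update_with_str_fails() -> bool:
    """Return True if passing a plain str to update() raises TypeError."""
    try:
        hashlib.sha256().update("IMAGE")
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(update_with_str_fails())
    print(hash_marker())
```

Since 'IMAGE' is a fixed ASCII literal, writing it as b'IMAGE' would give the same bytes without the explicit encode call.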

Should be a simple fix, but I can't promise I'll get around to it today. I'll try this week and keep you posted.

under-score commented 2 years ago

Thanks so much for the quick response. It's not urgent, as I am working more with ssdeep (although a comparison with sdhash would be nice). It seems there are even more issues, and I am too new to Python to fix them by myself.