thorade opened 6 years ago
Pillow actually already has code for comparing two images, used in the test suite. I've created a PR to move this into its own method. If you have any thoughts, they would be welcome.
Thanks for the answer; great to hear this function will be usable for end users, too.
My use case is described above: finding images that are identical except that they have been resized and possibly had some color filter applied. Based on what I found on the internet, I wrote this Jupyter notebook: https://github.com/thorade/jupyterNotebooks/blob/master/Pillow/dhash_hamming.ipynb. It creates a hash from an image; similar images have similar hashes, and purely resized images have identical hashes.
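For readers who don't open the notebook, here is a minimal difference-hash (dHash) sketch using Pillow. It illustrates the general technique and is an assumption, not a copy of the notebook's code; the function name `dhash` and the 8x8 default size are hypothetical choices.

```python
from PIL import Image


def dhash(image: Image.Image, hash_size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    # Grayscale + tiny thumbnail: throws away resolution and most color
    # information. The extra column gives hash_size comparisons per row.
    small = image.convert("L").resize((hash_size + 1, hash_size), Image.LANCZOS)
    value = 0
    for y in range(hash_size):
        for x in range(hash_size):
            left = small.getpixel((x, y))
            right = small.getpixel((x + 1, y))
            # Bit is 1 where brightness increases from left to right.
            value = (value << 1) | (1 if right > left else 0)
    return value
```

Because every input is first shrunk to a 9x8 grayscale thumbnail, the original resolution largely drops out, which is why resized copies tend to produce the same hash; a strong color filter can still flip a few bits. Usage would look like `dhash(Image.open("full.jpg"))` (file name hypothetical).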
In addition to the function that computes a hash from an image, the Hamming distance function shown in the notebook is also helpful.
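A Hamming distance between two integer hashes is just the number of differing bits. A minimal sketch, assuming hashes are stored as non-negative Python ints (`hamming_distance` is a hypothetical name):

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two integer hashes differ."""
    # XOR sets a bit exactly where the two hashes disagree; count those bits.
    return bin(hash_a ^ hash_b).count("1")
```

On Python 3.10 and later, `(hash_a ^ hash_b).bit_count()` gives the same result. A distance of 0 means identical hashes; for a 64-bit hash, small distances suggest near-duplicates.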
I've hesitated to merge PR #3254 because this feature request asked for a perceptual hashing function to compare images, which, if I understand correctly, is not quite the same thing as the average difference computed in the PR.
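To make the distinction concrete, here is a rough sketch of an average-difference comparison built from `ImageChops.difference` and `ImageStat.Stat`. It is not the code from PR #3254 or from Pillow's test suite; `average_difference` is a hypothetical name. Unlike a perceptual hash, it needs both images to have the same size and mode, so a resized copy cannot be compared without resizing it first.

```python
from PIL import Image, ImageChops, ImageStat


def average_difference(a: Image.Image, b: Image.Image) -> float:
    """Mean absolute per-pixel difference, averaged over all bands (0-255)."""
    if a.mode != b.mode or a.size != b.size:
        raise ValueError("images must have the same mode and size")
    # Per-pixel |a - b| for every band.
    diff = ImageChops.difference(a, b)
    # ImageStat reports one mean per band; average them into a single score.
    means = ImageStat.Stat(diff).mean
    return sum(means) / len(means)
```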
And adding code to the API means we need to maintain it, which is fine if it's useful, but less so if not.
Having said that, it's useful to us as we use it in our tests, so I think I've just answered my own question!
What do others think?
I believe it makes sense to do this in two steps:
This allows the hashes to be stored, and it should make it easier to compare the distances between multiple images. But I fully understand if this use case/workflow is out of scope for Pillow. For my personal needs, I implemented this myself here (as linked previously; maybe it explains my ideas better): https://github.com/thorade/jupyterNotebooks/blob/master/Pillow/dhash_hamming.ipynb
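Reading the comment as (1) compute and store one hash per image and (2) compare the stored hashes, the second step could be sketched like this. The function name `find_similar_pairs`, the file names, and the default threshold of 5 bits are hypothetical; the hashes could come from any 64-bit image hash and be loaded from, say, a JSON file.

```python
from itertools import combinations


def find_similar_pairs(hashes: dict[str, int], max_distance: int = 5) -> list[tuple[str, str, int]]:
    """Compare stored hashes pairwise and return (name_a, name_b, distance)."""
    pairs = []
    # Sorting by name makes the output order deterministic.
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        # Hamming distance: number of differing bits.
        distance = bin(hash_a ^ hash_b).count("1")
        if distance <= max_distance:
            pairs.append((name_a, name_b, distance))
    return pairs
```

Since the hashes are computed once and kept, adding a new photo only costs one hash plus comparisons. Pairwise comparison is O(n²); for large collections, structures such as BK-trees are a common alternative.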
@thorade You can check out the Python package imagededup, which can find duplicates using perceptual hashing.
For finding duplicates, it would be nice if Pillow would include some perceptual hashing algorithm: https://en.wikipedia.org/wiki/Perceptual_hashing
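As an illustration of about the simplest perceptual hash, here is an average-hash (aHash) sketch: shrink to an 8x8 grayscale thumbnail and set one bit for each pixel brighter than the thumbnail's mean. This is a swapped-in example of the technique family, not the algorithm of any particular library; `average_hash` is a hypothetical name.

```python
from PIL import Image


def average_hash(image: Image.Image, hash_size: int = 8) -> int:
    """aHash: one bit per thumbnail pixel, set if brighter than the mean."""
    # Normalize away size and color before comparing brightness.
    small = image.convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = [small.getpixel((x, y)) for y in range(hash_size) for x in range(hash_size)]
    mean = sum(pixels) / len(pixels)
    value = 0
    for p in pixels:
        value = (value << 1) | (1 if p > mean else 0)
    return value
```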
The real use case for me is that in my photo collection I often have the same image in different resolutions, e.g. if I sent it via WhatsApp: once in full resolution from the camera, and once in reduced resolution in WhatsApp.
Here is a blog post describing a simple algorithm that also uses Pillow: https://www.safaribooksonline.com/blog/2013/11/26/image-hashing-with-python/. It would be more convenient, though, to have a built-in function, ideally something like `image.signature` or `image.phash`, and then some function to calculate the similarity or distance between two or more images. One library mentioned quite often in this context is phash: http://www.phash.org/. Maybe its algorithms could be reused?
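The core idea of a DCT-based perceptual hash, the kind pHash is known for, can be sketched in pure Python: shrink to a 32x32 grayscale thumbnail, take a low-frequency 8x8 block of its 2-D DCT, and set one bit per coefficient above the median. This is a simplified sketch under those assumptions, not phash.org's implementation (real implementations may add steps such as smoothing); `dct_hash` and its parameters are hypothetical.

```python
import math
import statistics

from PIL import Image


def dct_hash(image: Image.Image, size: int = 32, hash_size: int = 8) -> int:
    """Simplified DCT-based perceptual hash (pHash-style, not phash.org's code)."""
    # 1. Normalize: grayscale, fixed-size thumbnail.
    small = image.convert("L").resize((size, size), Image.LANCZOS)
    pixels = [[small.getpixel((x, y)) for x in range(size)] for y in range(size)]

    # 2. DCT-II basis values cos(pi * (2n + 1) * k / (2 * size)) for the
    #    low frequencies k = 0..hash_size that we need.
    basis = [
        [math.cos(math.pi * (2 * n + 1) * k / (2 * size)) for n in range(size)]
        for k in range(hash_size + 1)
    ]

    # 3. 2-D DCT coefficients for frequencies 1..hash_size in both directions,
    #    skipping index 0 (the DC term and terms constant along one axis).
    #    Normalization factors are omitted: they are equal for all k >= 1, so
    #    they do not change the comparison with the median.
    coeffs = []
    for v in range(1, hash_size + 1):
        for u in range(1, hash_size + 1):
            total = 0.0
            for y in range(size):
                row, weight_y = pixels[y], basis[v][y]
                for x in range(size):
                    total += row[x] * basis[u][x] * weight_y
            coeffs.append(total)

    # 4. One bit per coefficient: above the median or not.
    median = statistics.median(coeffs)
    value = 0
    for c in coeffs:
        value = (value << 1) | (1 if c > median else 0)
    return value
```

Computing only the 64 needed coefficients keeps this tolerable without NumPy; a production version would more likely use an FFT-based DCT such as SciPy's. Hashes from this sketch can be compared with the same Hamming distance as any other 64-bit hash.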
If this is out of scope for Pillow, I would just use one of the projects providing phash Python bindings.