kkamila closed this issue 7 years ago.
Hello kkamila, and sorry for the delayed response.
I couldn't find 'koala1.jpeg' from the blog, so I downloaded a similar image from the web. The phash values from the OpenImageR package and the imagehash Python library are the following:
library(OpenImageR)
res_hash = phash(image_koala, hash_size = 8, highfreq_factor = 4, MODE = 'hash', resize = 'bilinear')
res_hash   # "3bfadf09e13042c9"
import imagehash
from PIL import Image
img_hash = imagehash.phash(Image.open(image_koala))   # image_koala: path to the image file
print(img_hash)   # 3bfadfa0e13042d1
The two hashes differ in 4 hexadecimal positions because the imagehash Python library (whose defaults are hash_size = 8 and highfreq_factor = 4) resizes the image with a different method, ANTIALIAS (a high-quality downsampling filter), rather than the 'nearest' or 'bilinear' methods that the OpenImageR package offers.
Please test it and let me know.
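To see why the resize filter is the only step that matters here, below is a minimal pure-Python sketch of what imagehash's phash does after resizing: 2-D DCT-II of the grayscale matrix, keep the top-left hash_size x hash_size block of low frequencies, set each bit to whether the coefficient exceeds the median, and encode the bits as hex. It assumes the caller has already converted the image to grayscale and resized it to a square of side hash_size * highfreq_factor (32 with the defaults); `phash_from_resized` is a hypothetical name, and this is an illustration, not the code of either library. Everything after the resize is deterministic, so an ANTIALIAS-resized matrix and a bilinear-resized matrix can produce hashes that differ in a few bits. (In Pillow, ANTIALIAS is an older name for the LANCZOS filter.)

```python
import math
import statistics


def phash_from_resized(pixels, hash_size=8):
    """Hypothetical helper: perceptual hash of an already grayscale,
    already resized N x N image (list of lists), N = hash_size * highfreq_factor.
    Returns a hex string encoding hash_size**2 bits."""
    n = len(pixels)
    # DCT-II cosine basis for the hash_size lowest frequencies only.
    # The constant factor 2 of the unnormalized DCT-II is dropped: scaling every
    # coefficient by the same positive constant does not change the median test.
    basis = [[math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
             for k in range(hash_size)]
    # 2-D DCT restricted to the low-frequency block:
    # transform along axis 0 (rows, index u), then along axis 1 (columns, index v)
    low = [[sum(pixels[i][j] * basis[u][i] * basis[v][j]
                for i in range(n) for j in range(n))
            for v in range(hash_size)]
           for u in range(hash_size)]
    coeffs = [c for row in low for c in row]           # row-major flatten
    med = statistics.median(coeffs)
    bits = "".join("1" if c > med else "0" for c in coeffs)
    width = (len(bits) + 3) // 4                       # 4 bits per hex digit
    return format(int(bits, 2), "0{}x".format(width))


# Usage: a synthetic 32 x 32 "image" (hash_size 8 * highfreq_factor 4)
demo = [[(3 * i + j * j) % 256 for j in range(32)] for i in range(32)]
print(phash_from_resized(demo))
```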
Thank you. Is there any possibility that you'll add the ANTIALIAS method to your package? Or is there some way to compute it in R? My main concern is that the Hamming distance / similarity score between two phashes differs between your package and the Python one.
http://res.cloudinary.com/demo/image/upload/koala1.jpg   "4bb3b541ebd5141a"
http://res.cloudinary.com/demo/image/upload/koala2.jpg   "4bb3a541ebd614b2"
These differ in 4 of the 16 hex characters, which gives a similarity score of 0.75,
whereas the example in the blog post (http://cloudinary.com/blog/how_to_automatically_identify_similar_images_using_phash) has a similarity score of 0.96875. That is quite a big difference.
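Part of that gap may come from how the score is computed rather than from the hashes alone: 0.75 counts differing hex characters (4 of 16), while 0.96875 equals 1 - 2/64, which is what a count over the 64 hash bits would give. A minimal sketch comparing the same two hashes both ways (the helper names are hypothetical):

```python
def char_similarity(h1, h2):
    """1 - fraction of hex characters that differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return 1.0 - sum(a != b for a, b in zip(h1, h2)) / len(h1)


def bit_similarity(h1, h2):
    """1 - fraction of bits that differ, reading each hex string as a bit string."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    n_bits = 4 * len(h1)                               # 4 bits per hex digit
    differing = bin(int(h1, 16) ^ int(h2, 16)).count("1")
    return 1.0 - differing / n_bits


h1, h2 = "4bb3b541ebd5141a", "4bb3a541ebd614b2"
print(char_similarity(h1, h2))   # 0.75
print(bit_similarity(h1, h2))    # 0.90625
```

Bit by bit, the two koala hashes differ in 6 of 64 positions, so the character-level 0.75 and the bit-level scores in the blog are not directly comparable.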
kkamila,
I think that if you stick with a single programming language (Ruby, Python or R), you will get comparable results. I downloaded the following images and calculated the similarity between them in R. I don't know which parameter settings the author of the blog post used, but I used hash_size = 8, highfreq_factor = 6, MODE = 'binary' and resize = 'bilinear':
http://res.cloudinary.com/demo/image/upload/koala1.jpg
http://res.cloudinary.com/demo/image/upload/koala2.jpg
http://res.cloudinary.com/demo/image/upload/another_koala.jpg
http://res.cloudinary.com/demo/image/upload/woman1.jpg
library(OpenImageR)
image = readImage("koala1.jpg")
image2 = readImage("koala2.jpg")
image3 = readImage("another_koala.jpg")
image4 = readImage("woman1.jpg")

# fraction of positions at which two binary hashes differ
ham_dist = function(x1, x2) {
  sum(x1 != x2) / length(x1)
}

# convert to grayscale before hashing
image = rgb_2gray(image)
image2 = rgb_2gray(image2)
image3 = rgb_2gray(image3)
image4 = rgb_2gray(image4)

res_hash = phash(image, hash_size = 8, highfreq_factor = 6, MODE = 'binary', resize = 'bilinear')
res_hash2 = phash(image2, hash_size = 8, highfreq_factor = 6, MODE = 'binary', resize = 'bilinear')
res_hash3 = phash(image3, hash_size = 8, highfreq_factor = 6, MODE = 'binary', resize = 'bilinear')
res_hash4 = phash(image4, hash_size = 8, highfreq_factor = 6, MODE = 'binary', resize = 'bilinear')

similarity = 1.0 - ham_dist(as.vector(res_hash), as.vector(res_hash2))
similarity    # 0.96875
similarity1 = 1.0 - ham_dist(as.vector(res_hash), as.vector(res_hash3))
similarity1   # 0.5625
similarity2 = 1.0 - ham_dist(as.vector(res_hash), as.vector(res_hash4))
similarity2   # 0.53125
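Assuming the binary hash has 64 entries (hash_size = 8, so 8 x 8), each score above translates directly into a number of differing bits $d_H$:

```latex
\text{similarity} = 1 - \frac{d_H}{64}
\quad\Longrightarrow\quad
d_H = 64\,(1 - \text{similarity})

0.96875 \;\to\; d_H = 2, \qquad
0.5625 \;\to\; d_H = 28, \qquad
0.53125 \;\to\; d_H = 30
```

So the two koala1/koala2 hashes differ in only 2 of 64 bits, while the unrelated images differ in roughly half of them, which is what one would expect from unrelated 64-bit hashes.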
You can also consider the invariant_hash function, which additionally applies rotation, cropping and flipping to the image and returns the minimum and maximum values of either the Hamming or the Levenshtein distance:
res1 = invariant_hash(image, image2, method = "phash", mode = "binary",
                      hash_size = 8, highfreq_factor = 6, resize = "bilinear",
                      flip = TRUE, rotate = TRUE, angle_bidirectional = 10, crop = TRUE)
res1
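The reduction step behind "min and max over transformed copies" can be sketched as follows. This assumes the hashes of the flipped, rotated and cropped images have already been computed (here as hex strings); the function names are hypothetical and this is not OpenImageR's implementation, only an illustration of the two distance measures and the min/max over variants.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))                 # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution (0 if equal)
        prev = curr
    return prev[-1]


def min_max_distance(reference, variant_hashes, metric=hamming):
    """Min and max distance between one hash and the hashes of transformed variants."""
    distances = [metric(reference, h) for h in variant_hashes]
    return min(distances), max(distances)


# Usage with the two koala hashes quoted earlier in the thread
ref = "4bb3b541ebd5141a"
variants = ["4bb3a541ebd614b2", "4bb3b541ebd5141a"]
print(min_max_distance(ref, variants))                 # (0, 4)
print(min_max_distance(ref, variants, levenshtein))
```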
kkamila,
can I close this issue?
Hey, sorry for not answering for so long. In the Python package we have hash_size = 8 and highfreq_factor = 4, as one can see here: https://github.com/JohannesBuchner/imagehash/blob/master/imagehash/__init__.py
My main issue is that I want to rewrite a few scripts from Python to R, and even though I know I could still use a system call to calculate the phashes in Python, I didn't want to do that. I wanted every line to work in R.
So just answer me whether there is any possibility that you'll add the ANTIALIAS method to your package, and I'll close the issue.
kkamila,
I don't intend to implement the ANTIALIAS method in the near future.
OK, thank you for your answers.
I also want to rewrite some Python scripts fully in R and was wondering whether the ANTIALIAS method has been implemented, and whether there is a way to replicate the Python hashes in R.
@deann88,
the OpenImageR package supports only the 'nearest' and 'bilinear' methods, and I don't currently intend to implement any other method.
Now, if you want to replicate the hashes from the Python code in R on your own, I think one option is to use the reticulate package to open and resize the image the same way the Python code does.
@mlampros,
yep, I am using reticulate right now. Thank you for your answer and for the package. I would have used it from the start; however, we already have a database of hashes to compare against, so I cannot switch at the moment.
Thanks
Hi, I tried to use your library to get phashes and calculate distances between photos. Unfortunately, I get completely different hashes and distances than in this example: http://cloudinary.com/blog/how_to_automatically_identify_similar_images_using_phash (whereas with the Python library I get exactly the same values as in the example).