Closed · snoopy83101 closed this issue 4 years ago
Hi @snoopy83101
Increasing the `bits` parameter should do the trick. Please try this:
```js
imghash
  .hash(o.path, 8, "binary")
  .then((hash) => {
    resolve(hash);
  })
  .catch((e) => {
    reject(e);
  });
```
It will return a longer hash with a higher resolution. Hashes for the two images should no longer be the same either.
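For context, the hash length scales with the square of the `bits` parameter: imghash is based on the blockhash algorithm, which samples a `bits × bits` grid, so `bits: 12` yields a 144-character binary string (you can see this in the outputs quoted later in this thread). A small sketch of the relationship:

```javascript
// Sketch: how long a hash to expect for a given `bits` value.
// The blockhash grid is bits x bits, so the "binary" format has
// bits * bits characters; "hex" packs 4 bits per character.
function expectedHashLength(bits, format) {
  const totalBits = bits * bits;
  return format === "hex" ? totalBits / 4 : totalBits;
}

console.log(expectedHashLength(8, "binary"));  // 64
console.log(expectedHashLength(12, "binary")); // 144
console.log(expectedHashLength(8, "hex"));     // 16, e.g. "f884c4d8d1193c07"
```

In short, the default setting yields a 64-bit hash, while `bits: 12` yields 144 bits.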
@pwlmaciejewski Thanks for the reply. With a bit length of 8 the two pictures do indeed return different hashes. But if I have ten thousand pictures, is there still a chance that two of them get the same hash?
@snoopy83101 It depends heavily on the similarity of the pictures. If avoiding collisions is your priority, then use a very high bit length, e.g. 256 or more. Collisions are still possible since you can never rule them out entirely, but with long hashes you should be relatively safe.
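As an aside, perceptual hashes are usually compared by Hamming distance against a threshold rather than tested for exact equality, since near-duplicate images produce hashes that differ in only a few bits. A minimal sketch for the `binary` format (the 10% threshold here is an illustrative assumption, not a value from this thread):

```javascript
// Count differing bit positions between two equal-length binary hash strings.
function hammingDistance(hashA, hashB) {
  if (hashA.length !== hashB.length) {
    throw new Error("Hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    if (hashA[i] !== hashB[i]) distance++;
  }
  return distance;
}

// Treat images as "similar" when fewer than ~10% of bits differ
// (ASSUMPTION: the 10% threshold is illustrative; tune it for your data).
const isSimilar = (a, b) => hammingDistance(a, b) / a.length < 0.1;

console.log(hammingDistance("1100", "1001")); // 2
```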
@pwlmaciejewski hi, I also want to ask about the probability of repetition
```
imghash.hash(o.path, 12, "binary"): 111111100011111110000000111100000000101110010000011100000111111010000111100000000110100111111110100111000110110010000111011000000111111000001111
imghash.hash(o.path):               f884c4d8d1193c07
```
Which one has the greater probability of repetition?
Which one will consume more system resources?
@snoopy83101
Both take a similar amount of time:
```sh
# imghash.hash(o.path, 12, "binary")
time sh -c 'for i in {1..200}; do imghash -b 12 -f binary Lenna.png > /dev/null; done;'
sh -c   83,74s user 5,10s system 286% cpu 31,044 total

# imghash.hash(o.path)
time sh -c 'for i in {1..200}; do imghash Lenna.png > /dev/null; done;'
sh -c 'for i in {1..200}; do imghash Lenna.png > /dev/null; done;'  84,06s user 5,31s system 297% cpu 30,002 total
```
The amount of consumed resources depends on the images you process so both should take a similar amount as well.
As for the probability of collision, please refer to the blockhash algorithm page.
> For images in general, the algorithm generates the same blockhash value for two different images in 1% of the cases (data based on a random sampling of 100,000 images).
> For photographs, the algorithm generates practically unique blockhashes, but for icons, clipart, maps and other images, the algorithm generates less unique blockhashes. Larger areas of the same color in an image, either as a background or borders, result in hashes that collide more frequently.
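To put the two formats from the question on the same footing: the default hex hash (`f884c4d8d1193c07`) encodes 64 bits, while the `bits: 12` binary hash encodes 144 bits, so the longer hash has far more possible values and a lower chance of accidental repetition. A rough sketch under the loud assumption that the hashes were uniformly random, which perceptual hashes are not (similar images cluster), so treat these numbers as optimistic lower bounds:

```javascript
// Birthday-bound approximation: the probability that at least two of n
// uniformly random k-bit values collide is roughly n(n-1) / 2^(k+1)
// (valid when the result is much smaller than 1).
// ASSUMPTION: blockhash outputs are not uniformly random, so real
// collision rates are higher than this estimate.
function birthdayCollisionBound(n, kBits) {
  return (n * (n - 1)) / Math.pow(2, kBits + 1);
}

// 10,000 images with the default 64-bit hash:
console.log(birthdayCollisionBound(10000, 64));  // ≈ 2.7e-12
// 10,000 images with a 144-bit hash (bits = 12, "binary"):
console.log(birthdayCollisionBound(10000, 144)); // ≈ 2.2e-36
```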
http://file.geeknt.com/upload/20200612/4a5a7f5c-5417-4c53-bf1c-23dcf2b12c24.jpg http://file.geeknt.com/upload/20200612/3512a09d-4f81-4518-9391-a253e631657f.jpg
```js
imghash
  .hash(o.path, 4, "binary")
  .then((hash) => {
    resolve(hash); // 1100110011001100
  })
  .catch((e) => {
    reject(e);
  });
```
Why do these two images return the same hash?
I want the hashes to be unique, but generating them should not take up too many system resources. How can I do that?