toy / pHash

Ruby interface to pHash
http://phash.org/
GNU General Public License v3.0

Hashes changing each time you run #11

Closed deepimpactmir closed 7 years ago

deepimpactmir commented 7 years ago

puts a.compute_phash

1st time I run

2nd time I run

Not really familiar with ruby so pardon me if mistakes are made

toy commented 7 years ago

Those numbers are not hashes but Ruby object ids, a kind of memory pointer. They say nothing about the contents of the objects and are valid only for the lifetime of the Ruby process. Currently there is no public interface for viewing the phash representation.
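To illustrate the point with plain Ruby (no pHash involved): calling puts on an object that does not define its own to_s prints the class name plus an id derived from the object's identity, so two objects with identical contents still print differently. FakeHash below is a hypothetical stand-in, not a class from the gem.

```ruby
# Hypothetical stand-in for a hash object; not part of the pHash gem.
class FakeHash
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

a = FakeHash.new(0xDEADBEEF)
b = FakeHash.new(0xDEADBEEF)

# The default to_s looks like #<FakeHash:0x...>; the hex part identifies
# the object, not its contents, so it differs between a and b.
puts a
puts b

# The contents are what should be compared.
puts a.value == b.value
```

The last line prints true even though the first two lines differ, which is the same situation as the output in the original report.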

toy commented 7 years ago

I don't see your last comment here, but I've added a few commits, the last of which allows viewing the contents of hashes (you can see them by running puts a.phash). While doing this I noticed that video hashes are not working as they did previously (resized test videos are not considered similar).

Overall, unfortunately, the last release of the pHash library itself was three years ago and my attempt to submit patches got no reply. There are forks which fix a few issues, but none of them is taking over the maintenance.

deepimpactmir commented 7 years ago

Mmm. I tried running puts a.phash, but I don't think what it returns is the hash: #<Phash::VideoHash:0x00000001b901f8 @data=#<FFI::Pointer address=0x000000019d2e30>, @length=1>

Anyways, I've added a cout in the pHash library itself to output the hashes and they're not changing so I think there's no issue there.

Can you elaborate more on what you mean by "resize"?

If you don't mind, what patches have you submitted?

toy commented 7 years ago

By resized videos I mean those in spec/data: all combinations of mouse-120.mp4, mouse-150.mp4 and mouse-180.mp4 compare successfully, but comparison of jug-120.mp4 with jug-150.mp4 and jug-180.mp4 fails, although it should succeed.

The patches I sent were to apply extern "C" to the audio hash functions and to fix image hashing for RGBA images. There are more known things to fix, though.

deepimpactmir commented 7 years ago

Yea. Definitely. Ran some tests myself and got the same conclusion as you.

Is there any correlation between keyframes and the similarity score?

toy commented 7 years ago

The video hash is built from keyframes, but I did not look into why the hash currently seems to be built from only one frame.

deepimpactmir commented 7 years ago

Mmm. What do you mean one frame? As in only one frame is used to compare the other keyframes?

toy commented 7 years ago

I was trying to find time to properly check whether the algorithm works correctly, but have not yet. By one key frame I mean that those videos, each of which has more than one key frame, produce a 64 bit hash, so a hash for only one frame (or I'm missing something in later steps of the algorithm).

I also did not reply about puts a.phash, which can simplify debugging. I've added the change to the master branch, but did not release the gem. You can check it out and either add the lib folder to the Ruby load path or run gem build pHash.gemspec && gem install pHash-*.gem.
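For reference, a 64 bit hash fits in one integer, and two such hashes are commonly compared by counting the bits that differ (Hamming distance). The sketch below is plain Ruby with made-up values. It assumes a video hash is a list of 64 bit integers, one per key frame, and it does not reproduce how the pHash library itself scores videos.

```ruby
# Number of bits that differ between two 64 bit hashes.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count('1')
end

# Made-up per-key-frame hashes for two hypothetical videos.
video_a = [0xF0F0F0F0F0F0F0F0, 0x0123456789ABCDEF]
video_b = [0xF0F0F0F0F0F0F0F1, 0x0123456789ABCDEF]

# Total hash size in bits under the one-hash-per-key-frame assumption.
puts video_a.length * 64

# Distances between the frames at the same position in each video.
puts video_a.zip(video_b).map { |x, y| hamming_distance(x, y) }.inspect
```

Under that assumption two key frames give 128 bits, so a hash of length 1 for a video with several key frames would cover only one frame.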

deepimpactmir commented 7 years ago

Oh ic, so you're saying that one key frame would have a 64 bit hash? Then 2 key frames should have 128 bit hash? Pardon me if I misinterpreted you.

I didn't really look through the algorithm but from what I've seen, it's forming a matrix of DCT coefficients or something like that. Not really sure.

Will check out the new function you've implemented.

toy commented 7 years ago

@DuelToDeath What are your findings?

deepimpactmir commented 7 years ago

I think we can close this issue. I've done some experimentation with the perceptual hash. Not sure if you're interested in some of the graphs I generated.

Thanks.

toy commented 7 years ago

Why not? Nothing conclusive?

deepimpactmir commented 7 years ago

Well, I would say it works pretty well. I tested it with a database of 100 videos. The hashes generated are pretty robust.

[Image: reduced-res-1kf]

So I basically modified one of the videos, in this case by reducing its resolution, then compared it to the 100 videos. I ranked the similarity scores I obtained to see how robust the hash is, then repeated this for all 100 videos.

It did pretty well, with a median of around 1.5, so the correct video ranks quite consistently at the top.
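A rough sketch of that ranking procedure in plain Ruby, with a tiny made-up score table in place of the real data. It assumes that a higher similarity score means a closer match and that the figure of 1.5 is the median rank of the correct video; both the helper names and the numbers are illustrative only.

```ruby
# Rank (1 = best) of the correct video among all candidates,
# where a higher similarity score means a closer match.
def rank_of_correct(scores, correct_index)
  target = scores[correct_index]
  scores.count { |s| s > target } + 1
end

def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Made-up similarity scores: row i holds the scores of modified video i
# against every original video; the correct match is at index i.
scores = [
  [0.95, 0.40, 0.10, 0.20],
  [0.30, 0.60, 0.70, 0.10],
  [0.20, 0.10, 0.90, 0.30],
  [0.50, 0.20, 0.30, 0.80]
]

ranks = scores.each_with_index.map { |row, i| rank_of_correct(row, i) }
puts ranks.inspect
puts median(ranks)
```

A median rank close to 1 means the unmodified original is usually the top match for its modified copy.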

toy commented 7 years ago

Nice to get confirmation that it works. What is meant by "Percentage of pixels left", though? Different reductions of resolution?

deepimpactmir commented 7 years ago

Yep that's right

toy commented 7 years ago

I think this can then be closed