Video similarity - Githubissues

infojunkie commented 4 years ago

Tell us about your request

Add video similarity functionality to Alegre, mirroring the image similarity already in place.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

When an item is added to Check, Check tries to find similar items and link them together, reducing the amount of work fact-checkers need to do. To do so, Check sends the new item to Alegre, which has dedicated endpoints that ingest new media and return matching media. Check also sends "context" information, including the team id and media id, so that it can identify its own items among the matches Alegre returns.

Currently, Alegre only supports text and image similarity. This request is to support video media as well.

Implementation guidelines

In Alegre, add an endpoint video/similarity in a new controller video_similarity_controller that closely mirrors image_similarity_controller and its associated tests.
This controller adds a core function that, given a video URL, computes a "hash" signature for the video. Additional functions take a video URL and either add the computed video hash to a PostgreSQL table or query the PostgreSQL table against the extracted hash.
Research existing approaches to video similarity matching, for example extracting one or more keyframes and performing image matching against them, or an approach like Facebook's TMK hash. The result of such a computation is the "hash" above.
Add tests for the video similarity controller and other files you may be adding.
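
As a rough illustration of the add and query functions described above, here is a minimal Python sketch. It is not Alegre's code: the VideoHashStore class, its method names, the default threshold of 8 bits, and the use of an in-memory list in place of the PostgreSQL table are all assumptions made for the example. It stores integer-encoded hashes alongside the context Check sends, and returns stored items whose context matches and whose hash is within a Hamming-distance threshold of the query.

```python
from dataclasses import dataclass, field


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer-encoded hashes differ."""
    return bin(a ^ b).count("1")


@dataclass
class VideoHashStore:
    """Hypothetical in-memory stand-in for the PostgreSQL table of video hashes."""

    rows: list = field(default_factory=list)

    def add(self, url: str, video_hash: int, context: dict) -> None:
        # Keep the hash together with the caller's context (e.g. team id, media id).
        self.rows.append({"url": url, "hash": video_hash, "context": context})

    def search(self, video_hash: int, context: dict, threshold: int = 8) -> list:
        # Collect rows whose context contains every key/value in the query
        # context and whose hash is within `threshold` bits of the query hash.
        matches = []
        for row in self.rows:
            if any(row["context"].get(k) != v for k, v in context.items()):
                continue
            distance = hamming_distance(row["hash"], video_hash)
            if distance <= threshold:
                matches.append((distance, row))
        # Closest matches first.
        matches.sort(key=lambda pair: pair[0])
        return [dict(row, distance=distance) for distance, row in matches]


store = VideoHashStore()
store.add("https://example.com/a.mp4", 0b11110000, {"team_id": 1, "media_id": 10})
store.add("https://example.com/b.mp4", 0b11110001, {"team_id": 1, "media_id": 11})
store.add("https://example.com/c.mp4", 0b11110000, {"team_id": 2, "media_id": 12})
results = store.search(0b11110000, {"team_id": 1}, threshold=2)
```

In this example only the two team 1 items come back, with the exact match first. A real implementation would push the distance comparison into the database query rather than scanning every row in Python.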

sparkingdark commented 3 years ago

Can I use Python for it? Basically, I need to create a function that takes a video signature and returns a hash to store in PostgreSQL.

infojunkie commented 3 years ago

@sparkingdark yes, the service Alegre is written in Python, so please feel free to submit a PR against it. Thanks!

sparkingdark commented 3 years ago

Awesome, thank you. Can you tell me whether I need to use Docker Compose to install it?

techytushar commented 3 years ago

Hey @infojunkie, I would like to work on this. I have already created an endpoint for video similarity. For the video hashing part I found this repo, https://github.com/sschnug/pyVideoHash, which also calculates pHash but for videos. Should I go ahead and implement this in the code, or should I explore more options?

jossperdomo commented 3 years ago

Hello @infojunkie

I would like to work on this or any open issue in Python or JavaScript. Right now all the issues appear to be assigned :) Could you please tell me if there is an open issue I can work on?

computermacgyver commented 3 years ago

@sparkingdark and @techytushar apologies for the silence. I'll be happy to lead correspondence on this issue going forward.

"Am I need to use docker compose to install it ?"

Yes, with docker compose installed you should be able to clone the Alegre repo and then use

docker-compose build
docker-compose up --abort-on-container-exit

to build and start the service. As noted in the readme, you'll likely need to raise the virtual memory limit:

Update your virtual memory settings, e.g. by setting vm.max_map_count=262144 in /etc/sysctl.conf

Thanks for pointing out pyVideoHash, @techytushar. A big concern of ours is computational efficiency, since we need to find matches in near real-time. It looks like pyVideoHash produces a 144-bit hash of every frame in the video. Rather than doing that, I would suggest we select specific frames to hash. In the simplest case, we might just pick the 100th frame of the video and hash that. Of course, a single frame could be quite noisy, so maybe we would hash frames 90 to 110 and average them in some way.
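
One way to read "hash frames 90 to 110 and average" is sketched below in Python. This is only an illustration under several assumptions: it uses a simple average hash rather than pHash, it takes "average" to mean a bitwise majority vote across the per-frame hashes, and it expects frames that are already decoded into 2D lists of grayscale intensities, each at least hash_size pixels on a side. Decoding the video is left out, and the function names are made up for the example.

```python
def average_hash(frame, hash_size=8):
    """Average hash of a grayscale frame given as a 2D list of intensities."""
    height, width = len(frame), len(frame[0])
    # Downsample to a hash_size x hash_size grid by averaging each cell's pixels.
    cells = []
    for gy in range(hash_size):
        y0, y1 = gy * height // hash_size, (gy + 1) * height // hash_size
        for gx in range(hash_size):
            x0, x1 = gx * width // hash_size, (gx + 1) * width // hash_size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell, most significant bit first: 1 if brighter than the mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def combine_hashes(hashes, n_bits=64):
    """Bitwise majority vote: a bit is set if more than half the hashes set it."""
    combined = 0
    for bit in range(n_bits):
        ones = sum((h >> bit) & 1 for h in hashes)
        if 2 * ones > len(hashes):
            combined |= 1 << bit
    return combined


def video_hash(frames, start=90, stop=110, hash_size=8):
    """Hash frames start..stop (inclusive) and combine them into one hash."""
    window = frames[start:stop + 1]
    if not window:
        raise ValueError("video is shorter than the requested frame window")
    per_frame = [average_hash(f, hash_size) for f in window]
    return combine_hashes(per_frame, hash_size * hash_size)


# Toy input: 120 identical frames with a bright left half and a dark right half,
# plus one inverted ("noisy") frame inside the hashed window.
clean = [[255] * 4 + [0] * 4 for _ in range(8)]
noisy = [[0] * 4 + [255] * 4 for _ in range(8)]
frames = [clean] * 120
frames[100] = noisy
signature = video_hash(frames)
```

Here the single noisy frame is outvoted by the other twenty, so the combined signature equals the hash of a clean frame. This costs 21 frame hashes per video regardless of length, compared with hashing every frame.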

TMK hash from Facebook provides a full video hash that would be another option. If you are both still interested in contributing, I'd recommend one of you look at using perceptual hash for a small number of frames and the other look at TMK. (We can ultimately incorporate both into the service; so, you would not be in competition with each other).

computermacgyver commented 3 years ago

@josselineperdomo I'll be very happy to help find (or create) an issue to work on. Alegre is the service that is mostly in Python but JavaScript applies to a bunch of our services (which @danielafeitosa knows more about). Is there anything particular that interests you? One service that could work well in Alegre is OCR for images. Something along the lines of this tutorial

jossperdomo commented 3 years ago

Hello @computermacgyver, I'm studying deep learning for computer vision, and I would like to work on an issue that helps me learn more about this area. I haven't worked on OCR systems before, but I checked the tutorial and would like to work on a task related to it. Please let me know how I can contribute :)

computermacgyver commented 3 years ago

Great, @josselineperdomo. I've created #47 for this. Can you take a look and assign yourself if this is something you'd be interested in?

meedan / check

Video similarity #11