Status: Open · dennyabrain opened 6 months ago
All the popular methods use hashing:
- filecmp
- Audio similarity
- Video similarity
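For exact-match checks, the two approaches above can be contrasted with a small sketch. It assumes Python's standard library only. `filecmp.cmp(..., shallow=False)` compares one pair of files byte by byte. Hashing instead computes one digest per file, and that digest can then be compared against any number of stored files. The helper names `same_by_bytes` and `same_by_hash` are hypothetical.

```python
import filecmp
import hashlib
import os
import tempfile


def same_by_bytes(path_a: str, path_b: str) -> bool:
    # shallow=False forces a content comparison instead of trusting
    # os.stat() signatures (file type, size, mtime).
    return filecmp.cmp(path_a, path_b, shallow=False)


def same_by_hash(path_a: str, path_b: str) -> bool:
    # Compare SHA-256 digests of the full file contents.
    def digest(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    return digest(path_a) == digest(path_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b, c = (os.path.join(d, n) for n in ("a.bin", "b.bin", "c.bin"))
        for path, data in ((a, b"hello"), (b, b"hello"), (c, b"world")):
            with open(path, "wb") as f:
                f.write(data)
        print(same_by_bytes(a, b), same_by_hash(a, b))  # True True
        print(same_by_bytes(a, c), same_by_hash(a, c))  # False False
```

For a tipline that compares each incoming file against many stored ones, hashing scales better: each file is read once and the digest can be stored and looked up, whereas `filecmp` has to re-read both files for every pair.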
In the short term, hashing is the way to check whether files are the same or not. It is also the fastest way to do so.
The probability of a collision is extremely low.
We call an event "is-not-gonna-happen" if its probability is less than 1/2^100.
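A back-of-the-envelope check of that threshold follows. It models a $b$-bit hash as a uniformly random function, which is an idealization, and applies the union bound over all pairs of $n$ files:

$$
P(\text{any collision}) \le \binom{n}{2}\, 2^{-b} < \frac{n^2}{2^{b+1}}
$$

With a hypothetical $n = 2^{40}$ files (about $1.1 \times 10^{12}$) and $b = 512$:

$$
\frac{(2^{40})^2}{2^{513}} = 2^{80-513} = 2^{-433} \ll 2^{-100}
$$

Even a 256-bit hash gives $2^{80-257} = 2^{-177}$, which is still far below the $2^{-100}$ threshold.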
You can use any 512-bit cryptographic hash function, such as SHA-512, SHA3-512, or BLAKE2b, without fear of collision. You may want to look at BLAKE2b, which is quite fast compared to the alternatives, and its highly parallel successor BLAKE3.
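Here is a minimal sketch of BLAKE2b-based exact-duplicate detection using Python's `hashlib`. It reads each file in chunks so large videos are not loaded into memory at once, then groups paths by digest. The 1 MiB chunk size and the function names `blake2b_file` and `group_duplicates` are assumptions for illustration, not an existing project API.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB; an arbitrary choice for this sketch


def blake2b_file(path: Path) -> str:
    """Return the hex BLAKE2b digest (default 64 bytes = 512 bits) of a file."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        # Feed the file to the hash incrementally until read() returns b"".
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(paths) -> dict:
    """Map each digest shared by 2+ files to the list of those files."""
    groups = defaultdict(list)
    for p in paths:
        path = Path(p)
        groups[blake2b_file(path)].append(path)
    return {digest: ps for digest, ps in groups.items() if len(ps) > 1}
```

Because the digest is a fixed-length string (128 hex characters here), it can serve directly as a database key for exact-match lookups.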
> In the short term, hashing is the way to check whether files are the same or not. It is also the fastest way to do so.
Great. Then let's move on to checking whether these methods apply to our use case. Do share the various ways you test media items received on WhatsApp, so that we keep a log of what worked and what did not.
Yes, I have started working on this. Can you also see the updated comment with the Stack Overflow link that explains why SHA-256 and SHA-512 are collision resistant?
We should consider using BLAKE3 over SHA-512; it is much faster.
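Below is a rough sketch for checking the speed claim on our own hardware. It times SHA-512 and BLAKE2b from `hashlib`, and it times BLAKE3 only if the third-party `blake3` package (`pip install blake3`) is installed. That package is an optional assumption and is not part of the standard library. The buffer size, the round count, and the helper names are arbitrary choices.

```python
import hashlib
import time

try:
    from blake3 import blake3  # optional third-party package
except ImportError:
    blake3 = None


def throughput_mb_s(hash_factory, data: bytes, rounds: int = 5) -> float:
    """Best-of-N throughput in MB/s for hashing `data` once per round."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hash_factory(data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / max(best, 1e-9) / 1e6  # guard against a zero timing


def compare(size_mb: int = 32) -> dict:
    # Zero-filled buffer: the byte values do not affect hashing speed.
    data = bytes(size_mb * 1_000_000)
    candidates = {"sha512": hashlib.sha512, "blake2b": hashlib.blake2b}
    if blake3 is not None:
        candidates["blake3"] = blake3
    return {name: throughput_mb_s(f, data) for name, f in candidates.items()}


if __name__ == "__main__":
    for name, mbps in compare().items():
        print(f"{name:8s} {mbps:8.1f} MB/s")
```

Results depend on the CPU and on how Python and the packages were built, so it is worth running this on the server that will do the hashing rather than relying on published numbers.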
Time taken by BLAKE2b to hash audio and video files of different lengths and sizes:
| Media type | Length (s) | Time taken (s) |
|---|---|---|
| audio | 30 | 0.018 |
| audio | 60 | 0.027 |
| audio | 120 | 0.056 |
| audio | 300 | 0.122 |
| audio | 600 | 0.234 |
| audio | 1200 | 0.425 |
| audio | 1800 | 0.631 |
| Media type | Length (s) | Time taken (s) |
|---|---|---|
| video | 30 | 0.0081 |
| video | 60 | 0.013 |
| video | 300 | 0.022 |
| video | 600 | 0.074 |
| video | 1200 | 0.087 |
| video | 1800 | 0.148 |
| video | 3600 | 0.33 |
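Here is a possible way to collect timings like these with repeatable measurements. It is a sketch, not the script that produced the tables above. It takes the median of several runs of a chunked BLAKE2b hash. Hashing cost grows with the file's size in bytes, not its playback length, so recording the byte size next to each timing would make the audio and video tables easier to compare. Repeated runs on the same file also benefit from the OS page cache. The function name and its defaults are hypothetical.

```python
import hashlib
import statistics
import time
from pathlib import Path


def time_blake2b(path: Path, repeats: int = 5, chunk_size: int = 1 << 20) -> float:
    """Median wall-clock seconds to BLAKE2b-hash the whole file."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        h = hashlib.blake2b()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        h.digest()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Usage would look like `time_blake2b(Path("sample.mp4"))`, where `sample.mp4` stands in for a real media file, logged together with `Path("sample.mp4").stat().st_size`.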
For every media item that we receive on the tipline, we need to show users how many occurrences of this exact file exist on the server. Given our infra, the scope of this task is to