rabidsheep / tfh-webapp

Repository for the TFH Webapp
https://tfh-webapp.web.app/

Implement duplicate detection for TFHR files #44

Open Javamorris opened 3 years ago

Javamorris commented 3 years ago

"Feedback: Duplicate detection for YouTube links and possibly TFH replay files if that's not a gigantic can of fuckin worms (which it probably is). I just tested and was able to post the same set again with the exact same URL and info.

The unique video ID is present in both YT links the site hands you, so ideally it'd also be able to tell that https://www.youtube.com/watch?v=ZBhmNYm3BMY and https://youtu.be/ZBhmNYm3BMY are the same link and reject both if it's already on the site."

https://discord.com/channels/830196457632956447/882703854884511756/884914531174871050
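For the YouTube half of the request, one possible approach is to reduce every submitted link to its video ID and compare IDs rather than raw URLs. The sketch below is only an illustration: the function name `youtube_video_id` is hypothetical, it covers just the two URL shapes quoted above, and it is written in Python for brevity rather than in whatever the webapp actually uses.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if not recognized.

    Hypothetical helper: handles only youtube.com/watch?v=ID and youtu.be/ID.
    """
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Full links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None


a = youtube_video_id("https://www.youtube.com/watch?v=ZBhmNYm3BMY")
b = youtube_video_id("https://youtu.be/ZBhmNYm3BMY")
print(a == b)  # both links reduce to the same ID
```

Storing the extracted ID alongside each match would let a new submission be rejected when its ID is already present, regardless of which link form was pasted.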

JAVA NOTES: This could be done by adding a "hash" field to match documents that have TFHR files. The open question then becomes how difficult it is to compute a hash for any given TFHR file.

Javamorris commented 3 years ago

Could be important to the render farm application

Javamorris commented 3 years ago

Buttface tested the SHA-256 hash algorithm on the TFHR files and got consistent, fast hashes. This means hashing is a pretty straightforward, reasonable implementation.

"crc32 should work for speed, and even sha256 works extremely quickly for large batches if you want to use that as a backup if a collision occurs"

"time for crc32 on a batch of 44 files"

https://discord.com/channels/830196457632956447/882703854884511756/884940182061723679

[Images: CRC32, SHA256]
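The CRC32-first, SHA-256-as-backup suggestion could look something like the sketch below. It assumes both values are stored when a replay is first accepted, and it uses an in-memory dict as a stand-in for the stored match documents. The names `register` and `is_duplicate` are hypothetical.

```python
import hashlib
import zlib

# Maps a CRC32 value to the set of SHA-256 digests seen with that CRC32.
Index = dict


def register(data: bytes, index: Index) -> None:
    """Record a replay's CRC32 and SHA-256 in the index."""
    crc = zlib.crc32(data)
    index.setdefault(crc, set()).add(hashlib.sha256(data).hexdigest())


def is_duplicate(data: bytes, index: Index) -> bool:
    """Cheap CRC32 check first; SHA-256 only when the CRC32 already exists."""
    candidates = index.get(zlib.crc32(data))
    if not candidates:
        # No stored file shares this CRC32, so it cannot be a duplicate.
        return False
    # CRC32 matched: either a true duplicate or a collision. SHA-256 decides.
    return hashlib.sha256(data).hexdigest() in candidates
```

With this layout a CRC32 collision between two different files does not cause a false rejection, since the upload is only treated as a duplicate when the SHA-256 digest also matches.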