nickgaya / rededup

A web extension to hide duplicate posts on pre-redesign Reddit.
https://nickgaya.github.io/rededup/
MIT License

Reddit Deduplicator

Available as a Firefox add-on and in the Chrome Web Store.

When viewing a list of posts on reddit.com, the extension finds posts with the same URL or thumbnail and groups them together, showing only the first instance.
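The URL half of this grouping can be illustrated with a minimal Python sketch. It is not the extension's actual code: the `group_by_url` helper and the `id`/`url` post fields are hypothetical names chosen for the example, and thumbnail matching is left out.

```python
def group_by_url(posts):
    """Group posts that share a URL, keeping first-seen order."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for post in posts:
        groups.setdefault(post["url"], []).append(post)
    return list(groups.values())


posts = [
    {"id": "a", "url": "https://example.com/1"},
    {"id": "b", "url": "https://example.com/2"},
    {"id": "c", "url": "https://example.com/1"},  # duplicate of "a"
]

groups = group_by_url(posts)
# Only the first post of each group stays visible; the rest are hidden.
visible = [group[0]["id"] for group in groups]
hidden = [post["id"] for group in groups for post in group[1:]]
```

Here post `c` lands in the same group as `a`, so it would be collapsed behind a "show" link rather than listed separately.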

This is particularly useful when viewing an individual user's posts, as some users will post the same link or upload the same image to multiple subreddits. It can also be useful for viewing subscriptions or multireddits that aggregate posts from similar subreddits.

This extension is compatible with Reddit Enhancement Suite's "Never Ending Reddit" feature.

Screenshots

The extension automatically detects and hides duplicate posts.
Click the "show"/"hide" link to reveal or hide duplicates.

Technical discussion

Perceptual image hashing

To detect duplicate thumbnails, the extension uses a perceptual hash algorithm to reduce each image to a 64-bit hash. Ideally, a perceptual hash algorithm is insensitive to minor changes in an image, so that visually similar images have similar hash values. This extension implements three different hash functions.
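To make the idea of a 64-bit perceptual hash concrete, here is a minimal Python sketch of a difference hash, one widely used perceptual hash. Whether it matches any of the extension's hash functions is an assumption. The sketch also assumes the thumbnail has already been converted to grayscale and downscaled to 8 rows of 9 pixels; `difference_hash` and `hamming_distance` are illustrative names.

```python
def difference_hash(pixels):
    """Compute a 64-bit hash from an 8-row by 9-column grayscale grid.

    Each bit records whether a pixel is brighter than its right neighbour,
    giving 8 comparisons per row and 64 bits in total.
    """
    if len(pixels) != 8 or any(len(row) != 9 for row in pixels):
        raise ValueError("expected 8 rows of 9 grayscale values")
    value = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if left > right else 0)
    return value


def hamming_distance(a, b):
    """Count the bits that differ between two hash values."""
    return bin(a ^ b).count("1")


# A synthetic gradient image, darker toward the right.
base = [[(8 - col) * 20 + row for col in range(9)] for row in range(8)]
# The same image, uniformly brightened: every comparison is unchanged.
brighter = [[v + 5 for v in row] for row in base]
# One pixel changed: only the comparison involving it flips.
altered = [list(row) for row in base]
altered[0][0] = 0

same = difference_hash(base) == difference_hash(brighter)
distance = hamming_distance(difference_hash(base), difference_hash(altered))
```

The uniformly brightened image hashes to the same value as the original, while the locally altered one differs in a single bit. This is the behaviour that makes near-duplicate detection possible.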

For an interactive visualization of the different hash functions, see the perceptual hash demo.

Finding similar hashes

Although searching for exact matches works surprisingly well, we further reduce false negatives by grouping thumbnails whose hash values differ by only a few bits. To find such almost-equal hashes, we use a BK-tree, a simple data structure adapted to discrete metric spaces such as Hamming space.
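As a rough illustration of how a BK-tree answers "which stored hashes are within a few bits of this one", here is a small Python sketch. It is not the extension's implementation, and the `BKTree` class and its method names are hypothetical. Each node stores a value and keeps its children keyed by their distance to that value, which lets the search skip whole subtrees using the triangle inequality.

```python
def hamming_distance(a, b):
    """Count the bits that differ between two integers."""
    return bin(a ^ b).count("1")


class BKTree:
    """A BK-tree over a discrete metric; nodes are (value, children) pairs."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = self.distance(value, node[0])
            if d == 0:
                return  # value is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (value, {})
                return
            node = child

    def search(self, query, max_distance):
        """Return sorted (distance, value) pairs within max_distance."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.distance(query, value)
            if d <= max_distance:
                results.append((d, value))
            # Only subtrees whose edge label is within max_distance of d
            # can contain a match, by the triangle inequality.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


hashes = [0b0000, 0b0001, 0b0011, 0b1111, 0b11110000]
tree = BKTree(hamming_distance)
for h in hashes:
    tree.add(h)

near = tree.search(0b0111, 1)
```

In this example `near` contains `0b0011` and `0b1111`, each one bit away from the query, while the other stored values are pruned or rejected. Short 8-bit values are used for readability; the same code works unchanged for 64-bit hashes.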

Known limitations

Credits

Merge icon by Freepik from www.flaticon.com (modified from original).