The current archive contains many partial dupes (but no exact content-hash dupes).
Many are different versions of the same manual or doc from different sources.
This makes search results less useful.
Some form of deduping is needed. Ideas:
An internal measure of similarity like an "other versions" property array
A merge script that looks for near dupes and lets the uploader or an admin confirm, fully hiding all but the oldest dupe
A tweak to results-display which clusters by similarity + shows subcount, a la G!News
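
As a rough illustration of how the ideas above might fit together, here is a minimal sketch that is not tied to any existing archive code. It assumes each document is available as plain text with an upload date, compares documents by Jaccard similarity over 3-word shingles, and groups anything above a threshold into a cluster listed oldest first. The function names, the `docs` layout, and the 0.5 threshold are all hypothetical choices made for the example.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_dupes(docs, threshold=0.5, k=3):
    """Group near-duplicate documents.

    docs: dict mapping doc id -> (uploaded_at, text); this layout is assumed.
    Returns a list of clusters, each a list of doc ids sorted oldest first.
    """
    sets = {doc_id: shingles(text, k) for doc_id, (_, text) in docs.items()}
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pairwise comparison is O(n^2); fine for a sketch, too slow for a big archive.
    for a, b in combinations(docs, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc_id in docs:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return [sorted(members, key=lambda d: docs[d][0]) for members in clusters.values()]


docs = {
    "a": ("2003-05-01", "install the widget by running the setup script then reboot the machine"),
    "b": ("2001-02-10", "install the widget by running the setup script then restart the machine"),
    "c": ("2002-07-04", "quarterly sales figures for the northern region"),
}
for cluster in cluster_near_dupes(docs):
    oldest, others = cluster[0], cluster[1:]
    print(oldest, "other versions:", others, "subcount:", len(others))
```

The first id in each cluster could serve as the visible result, with the rest stored as its "other versions" array or shown as a subcount in the results display. A merge script could present the same clusters to the uploader or an admin for confirmation instead of hiding anything automatically. For a large archive the pairwise loop would likely need replacing with something like MinHash plus locality-sensitive hashing.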