Allow merge / cluster / deduping

There are many partial dupes in the current archive (though no exact content-hash dupes). Many are different versions of the same manual or doc from different sources, which makes search results less useful.

Some form of deduping is needed. Ideas:

An internal measure of similarity like an "other versions" property array
A merge script that looks for near dupes and lets the uploader or an admin confirm, fully hiding all but the oldest dupe
A tweak to the results display that clusters by similarity and shows a subcount, a la G!News
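
As a rough illustration of how the first two ideas might fit together, here is a minimal sketch that scores document pairs by Jaccard similarity over word shingles and groups anything above a threshold into a cluster. Everything in it is an assumption for discussion: the function names (`shingles`, `jaccard`, `cluster_near_dupes`), the 3-word shingle size, and the 0.5 threshold are hypothetical and not taken from the archive's codebase. The all-pairs comparison is also only practical for small batches; a real archive would probably need MinHash or a similar approximation.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (word n-grams) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short documents: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_near_dupes(docs, threshold=0.5):
    """Group document ids whose shingle sets overlap above the threshold.

    docs maps a document id to its extracted text (hypothetical shape).
    Returns a sorted list of clusters, each a sorted list of ids.
    """
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair and link the ones that look like near dupes.
    for a, b in combinations(docs, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for doc_id in docs:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(c) for c in clusters.values())
```

Each returned cluster could feed an "other versions" array on its members, be queued for uploader or admin confirmation before anything is hidden, or be collapsed into a single search result with a subcount.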
