Dedup - Githubissues

In the rare case of a hash collision, we could use the same strategy as a hash table:

Think of the destination as a collection rather than a singleton, then:

if collection is empty:
  add file to collection
else:
  compare file to existing members until first match or end of collection
  if something matched:
    skip adding file (i.e. we already have a copy of it)
  else:
    add file to collection
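A minimal Python sketch of the steps above, assuming the collection is an in-memory list of file contents. The name `add_to_collection` is hypothetical and not part of phorg. The empty-collection case needs no special branch, because the loop simply does not run.

```python
def add_to_collection(collection: list[bytes], data: bytes) -> bool:
    """Add data unless a byte-identical member already exists.

    Returns True if data was added, False if it was skipped as a duplicate.
    """
    for member in collection:
        if member == data:
            # First match: we already have a copy, so skip it.
            return False
    # Reached the end without a match (or the collection was empty).
    collection.append(data)
    return True
```

Calling it twice with the same bytes adds the content once and skips the second call.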

The collection could be implemented as a sequence number appended to the filename:

<date>--<time>--<digest>--<sequence>[.<extension>]
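As a rough sketch of how the sequence number could be assigned on disk, assuming the `<date>--<time>--<digest>` part is already computed and passed in as `stem`. The function name `place`, the zero-based sequence, and the use of a byte-for-byte comparison are assumptions made for illustration, not phorg's actual behaviour.

```python
import filecmp
import shutil
from pathlib import Path
from typing import Optional


def place(src: Path, dest_dir: Path, stem: str, ext: str = "") -> Optional[Path]:
    """Copy src into dest_dir as <stem>--<sequence><ext>.

    Returns the new path, or None if an identical file is already present.
    """
    seq = 0  # assumption: sequence numbers start at 0
    while True:
        candidate = dest_dir / f"{stem}--{seq}{ext}"
        if not candidate.exists():
            # End of collection reached without a match: add the file.
            shutil.copy2(src, candidate)
            return candidate
        if filecmp.cmp(src, candidate, shallow=False):
            # Same contents: we already have a copy.
            return None
        # Same digest, different contents: try the next slot.
        seq += 1
```

Two different files that share a stem end up in slots 0 and 1, in whichever order they were processed.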

This would let us confidently use fast hash functions, such as xxHash (#13), because a collision is then resolved by comparing contents instead of silently dropping a file.

The trouble is that the filename now depends on the order in which phorg processed the files, so we still need a way to make it deterministic.

A deterministic alternative is to keep escalating the strength of the hash function until no collision occurs, appending each digest used to the final filename.
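One way to picture the escalation, as a sketch only: the ladder below is an assumption (a deliberately weak toy checksum followed by SHA-256 from Python's `hashlib`), chosen so that a collision is easy to demonstrate. The names `LADDER`, `toy_sum8` and `digest_chain` are hypothetical, and the other files are passed in as in-memory contents for simplicity.

```python
import hashlib


def toy_sum8(data: bytes) -> str:
    """Deliberately weak stand-in for a fast hash: byte sum modulo 256."""
    return format(sum(data) % 256, "02x")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Assumed ladder, ordered from weakest to strongest.
LADDER = [toy_sum8, sha256_hex]


def digest_chain(data: bytes, others: list[bytes]) -> list[str]:
    """Digests of data, escalating until no differing file shares the whole chain."""
    chain = []
    # Identical contents are duplicates, not collisions, so ignore them.
    rivals = [o for o in others if o != data]
    for hash_fn in LADDER:
        digest = hash_fn(data)
        chain.append(digest)
        # Keep only the rivals that still collide at this level.
        rivals = [o for o in rivals if hash_fn(o) == digest]
        if not rivals:
            break
    return chain
```

The chain would then be joined into the filename, for example with `"--".join(chain)`. Under this sketch the chain depends on which other files are present rather than on the order they arrived in, so a file named before a colliding one shows up would need to be renamed.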

xandkar / phorg

Dedup #6