Open jtapolczai opened 8 years ago
Rabin's fingerprinting algorithm looks good: https://en.wikipedia.org/wiki/Rabin_fingerprint
Since speed is the key consideration, we don't have to guarantee that (F(X) = F(Y)) <=> X = Y.
The two classes or errors:
We should store the hashes (or at least the names) of each file.
If a file should exist at place X in tree R but already exists in place Y (X != Y), then we should avoid copying it from another tree and just move it.
In practice, a fingerprint F(X) should be able to be computed cheaply and F(X) = F(Y) should strongly imply (or guarantee) that X = Y.
We'd have to store the hashes in a map
F(X1) -> {X1,...,Xn}
s.t.F(X1) = ... = F(Xn)
. Then, when we encounter a copy action for someFi
, we instead move aFj
(i != j).