Closed sergv closed 4 months ago
In a branch of #273, I get
hash base == hash e2 = False
hash base == hash e2 = False
So AFAICS it will resolve this issue as well.
Also, we don't need to change anything in `hashable` to fix this particular problem. The issue is in the `HashFix` instance, so you can write a different one, e.g.
    instance Hashable (f (HashFix f)) => Hashable (HashFix f) where
      hashWithSalt salt x = (salt * hash x) `xor` somePrime
and it will work better.
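A minimal sketch of that idea, using only `base` so it runs without the `hashable` package. The names `Node`, `ExprF`, `mix`, `constSalt` and `negateSalt` are hypothetical stand-ins, not part of `hashable` or of the program in the issue; only the mixing formula is taken from the instance above.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)

-- Assumption: any fixed odd constant; this one is the 32-bit FNV prime.
somePrime :: Word64
somePrime = 16777619

-- Mixing step modelled on the suggested instance: (salt * hash) `xor` somePrime.
mix :: Word64 -> Word64 -> Word64
mix salt h = (salt * h) `xor` somePrime

-- One AST layer; r is the type of the already-hashed children.
data ExprF r = Const Word64 | Negate r

-- A node that caches the hash of its whole subtree.
data Node = Node { nodeHash :: Word64, nodeExpr :: ExprF Node }

-- Hypothetical per-constructor salts.
constSalt, negateSalt :: Word64
constSalt  = 17
negateSalt = 31

-- Hash one layer from the cached hashes of its children only.
hashLayer :: ExprF Node -> Word64
hashLayer (Const n)  = mix constSalt n
hashLayer (Negate c) = mix negateSalt (nodeHash c)

-- Smart constructor: compute the hash once and store it.
mkNode :: ExprF Node -> Node
mkNode e = Node (hashLayer e) e

main :: IO ()
main = do
  let base = mkNode (Const 1)
      e2   = mkNode (Negate (mkNode (Negate base)))
  print (nodeHash base == nodeHash e2)  -- False
```

Because the child's cached hash is multiplied by the salt rather than xor-ed with a salt-derived value, two identical wrapper levels no longer undo each other. This says nothing about the overall quality of the resulting hash.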
`Hashed` wasn't considered for nested hashing (nor was FNV-1 to begin with, AFAIK), so I'd say it's a missing feature rather than a concrete bug.
Something similar happens with PRNGs: most PRNGs are not splittable, and you cannot just turn an ordinary PRNG into a splittable one (although if you have a very good one, e.g. a cryptographically strong one, you can in fact do so quite easily). So here we'd need a hash algorithm that explicitly considers use in Merkle-tree-like applications. I'm aware only of cryptographic hashes being good for that; I'm unaware of "fast" hashes designed with Merkle-tree usage in mind.
As you cache the hash anyway, I'd consider using e.g. `cryptohash-sha256` (maybe with `wide-word`'s `Word256` for the digest) for the Merkle tree of your AST, and returning e.g. the lowest word for the `Hashable` instance. There are also other benefits you can reap from that: as SHA-256 is collision resistant, you can use it for fast equality, which you cannot ever do with `hashable`.
I believe this issue is related to #270 but I didn’t want to hijack the discussion there.
Problem
I wrote a program that constructs ASTs where each layer stores the hash of the underlying subtree. With explicit sharing the tree can get exponentially large, so the hash must be cached; recomputing it would take too long.
The problem is that the hashes of the trees `Const 1` and `Negate (Negate (Const 1))` are the same. This reproduces both with my hand-written wrapper that stores the hash value (`HashFix`) and with the wrapper that uses the `Hashed` type (`HashedFix`).

I've reproduced the hash computations that take place when `hashable` runs. According to my analysis, the problem comes from the `hashInt` function, which combines the hashes of subtrees to produce the hash of the whole node. It's defined as

For recomputing hashes from scratch it serves well because the salt gets passed around, but when the hash value is cached, the dependency between the salt and the hash value is broken (the hash is computed with the default salt). Through `Hashed` the hash value goes into the `bytes` argument, so successive `Hashed` applications on different AST levels lead to a computation like

If both AST levels are the same constructor, then `saltLevel1` will be equal to `saltLevel2` and they will cancel out due to `xor`.

I reproduced the detailed computations in the program; please take a look.
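The cancellation can be illustrated with a small model that uses only `base`. This is a sketch, not the `hashable` source: `fnv1Step`, `negateSalt` and `hashNegate` are hypothetical names, and the only assumption carried over is that the combining step has the FNV-1 shape (multiply the salt, then xor in the data) and that the salt depends only on the constructor.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)

-- The 32-bit FNV prime, used here for the multiply step.
fnvPrime :: Word64
fnvPrime = 16777619

-- FNV-1 shaped step: multiply the running state, then xor in the new data.
fnv1Step :: Word64 -> Word64 -> Word64
fnv1Step salt x = (salt * fnvPrime) `xor` x

-- Hypothetical salt that depends only on the constructor (Negate).
negateSalt :: Word64
negateSalt = 0x5bd1e995

-- Hash of a Negate node computed from the cached hash of its child.
hashNegate :: Word64 -> Word64
hashNegate cachedChild = fnv1Step negateSalt cachedChild

main :: IO ()
main = do
  let hConst = 12345 :: Word64  -- stand-in for the cached hash of Const 1
      hTwice = hashNegate (hashNegate hConst)
  -- (s * p) `xor` ((s * p) `xor` h) == h, for any salt s and hash h
  print (hTwice == hConst)      -- True
```

Since xor is its own inverse and the salt-derived term is identical on both levels, the two wrappers undo each other regardless of the concrete values chosen.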
The full program:
Output:
Proposed solution
Switch from FNV-1, which does this
to FNV-1a which swaps salt and bytes to be hashed:
Thus nested applications of `Hashed` won't go into the `xor` part.
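To compare the two orderings side by side, here is a hedged sketch using only `base`. The function names and the salt value are hypothetical; the two steps follow the standard FNV-1 (multiply, then xor) and FNV-1a (xor, then multiply) definitions rather than any particular `hashable` code.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)

-- The 32-bit FNV prime.
fnvPrime :: Word64
fnvPrime = 16777619

-- FNV-1: multiply the salt first, then xor in the data.
fnv1Step :: Word64 -> Word64 -> Word64
fnv1Step salt x = (salt * fnvPrime) `xor` x

-- FNV-1a: xor the data into the salt first, then multiply.
fnv1aStep :: Word64 -> Word64 -> Word64
fnv1aStep salt x = (salt `xor` x) * fnvPrime

-- Apply the same step twice with the same salt, as two nested
-- wrappers around the same constructor would.
twice :: (Word64 -> Word64 -> Word64) -> Word64 -> Word64 -> Word64
twice step salt h = step salt (step salt h)

main :: IO ()
main = do
  let salt = 2 :: Word64  -- hypothetical per-constructor salt
      h    = 1 :: Word64  -- stand-in for a cached child hash
  print (twice fnv1Step  salt h == h)  -- True: the salt terms cancel
  print (twice fnv1aStep salt h == h)  -- False: the multiply follows the xor
```

With FNV-1a the cached hash is multiplied after being xor-ed with the salt, so a second level with the same salt does not simply undo the first.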