mimblewimble / grin

Minimal implementation of the Mimblewimble protocol.
https://grin.mw/
Apache License 2.0

Caching Hash Computations #2584

Open JeremyRubin opened 5 years ago

JeremyRubin commented 5 years ago

see: https://github.com/mimblewimble/grin/pull/2567#issuecomment-464091200

Opening this issue to discuss caching computation of hashes.

For example, when we sort a list of N elements by their hashes, a comparison sort performs O(N log N) comparisons; if every comparison recomputes the hashes involved, we end up computing O(N log N) cryptographic hashes of our data instead of the N that are strictly needed.

This is inefficient, so we would like to do something better -- this issue is a space to discuss potential solutions. Here are some suggestions to kick off:

1) For critical structs that we hash often, compute the hash on construction (assuming immutability) and store it in the struct. Downside: memory usage. Upside: cache efficiency, easy to code. Network serialization/deserialization ignores this field and recomputes it from the incoming data. (A rough sketch of this follows the list.)

2) Store an Option<Box<Hash>> in the struct and only compute it when needed; it may stay None forever. Upside: don't pay for what you don't need -- an unset field costs only 1 pointer (8 bytes) vs. 32 bytes. Downside: when we do need the hash we end up storing 40 bytes and losing locality.

3) When doing std::eq, if we expect a heavily biased negative result, compare with siphash(x) == siphash(y) && x.hash() == y.hash(): the first clause can cheaply prove that x != y, and the second clause proves that x == y. Thus we save on cryptographic hashing where it's obviously not needed. (See the fingerprint sketch below.)

4) A wrapper type which holds the data together with its hash. Upside: if we don't need it, we don't pay for it. Downside: it would require reallocating to move items into a new vector, e.g. before sorting, and it makes APIs messier.

5) Careful algorithm writing -- e.g. see https://github.com/mimblewimble/grin/pull/2567. Upside: it is written such that the caching only takes up two Hashes' worth of space, which is better than caching a hash per element. Downside: it only works for particular algorithms, which become more difficult to write and review as a result.

6) Stupid simple cache: a siphash-driven LRU cache would let us look up hashes much more cheaply within a fixed memory budget. E.g. allocate 1MB for hashes of inputs and store (input, input.hash()) at siphash(input) % 1MB (perhaps using something like my cuckoocache if more performance is needed). Upside: bounded, easily tunable memory. Downside: sharing the space across hashable types requires a big enum or a cache table per hashed type, and ownership is annoying (an RwLock can help). (See the table sketch below.)

7) Permutation sorting: e.g. if we have a Vec<Input>, we also build a Vec<Hash>, and we write a custom algorithm that sorts the Vec<Input> using the Vec<Hash> as the sort key while sorting the Vec<Hash> alongside it. The Vec<Hash> is easy to drop when no longer needed, or to pass on when it is needed later. (See the permutation sketch below.)
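To make 1) concrete, here is a rough sketch of what construction-time caching could look like -- `Input`, `expensive_hash` and the field names are made up for illustration, not grin's actual types:

```rust
type Hash = [u8; 32];

/// Stand-in for the real cryptographic hash; only here so the sketch compiles.
fn expensive_hash(bytes: &[u8]) -> Hash {
    let mut h = [0u8; 32];
    for (i, b) in bytes.iter().enumerate() {
        h[i % 32] ^= b.wrapping_add(i as u8);
    }
    h
}

/// Hypothetical immutable element that we hash often.
struct Input {
    #[allow(dead_code)]
    commitment: Vec<u8>,
    /// Computed once at construction; serialization would skip this field and
    /// deserialization would recompute it from the incoming bytes.
    cached_hash: Hash,
}

impl Input {
    fn new(commitment: Vec<u8>) -> Self {
        let cached_hash = expensive_hash(&commitment);
        Input { commitment, cached_hash }
    }

    /// O(1): returns the precomputed hash instead of rehashing.
    fn hash(&self) -> Hash {
        self.cached_hash
    }
}

fn main() {
    let mut inputs: Vec<Input> = (0u8..5).map(|i| Input::new(vec![i; 8])).collect();
    // A comparison sort now performs O(N log N) comparisons but only N hash computations.
    inputs.sort_by_key(|input| input.hash());
    for input in &inputs {
        println!("{:02x?}", &input.hash()[..4]);
    }
}
```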
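Similarly for 3), the cheap pre-check could be as simple as a 64-bit fingerprint (std's DefaultHasher is SipHash-based). `cheap_fingerprint` and `hashes_equal` are made-up names, and `expensive_hash` is the stand-in from the sketch above:

```rust
use std::collections::hash_map::DefaultHasher;
// Aliased so the trait doesn't clash with the `Hash` type alias above.
use std::hash::{Hash as StdHash, Hasher};

/// Cheap 64-bit fingerprint; nowhere near collision-resistant, but fast.
fn cheap_fingerprint(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Two-stage equality: differing fingerprints cheaply prove the elements differ;
/// the expensive hashes are only computed when the fingerprints match.
fn hashes_equal(a: &[u8], b: &[u8]) -> bool {
    cheap_fingerprint(a) == cheap_fingerprint(b) && expensive_hash(a) == expensive_hash(b)
}
```

Since && short-circuits, in the common "not equal" case we never touch the cryptographic hash at all.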
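And for 6), the simplest version of the bounded cache is a flat table indexed by the fingerprint, storing the key bytes alongside the hash so a fingerprint collision can never hand back the wrong hash. `HashCache` is not an existing type, eviction here is just "overwrite the slot", and the real thing would need an RwLock (or something like cuckoocache) to be shared:

```rust
/// Fixed-size, fingerprint-indexed cache of (key bytes, cryptographic hash).
/// Reuses `Hash`, `expensive_hash` and `cheap_fingerprint` from the sketches above.
struct HashCache {
    slots: Vec<Option<(Vec<u8>, Hash)>>,
}

impl HashCache {
    fn new(capacity: usize) -> Self {
        HashCache { slots: vec![None; capacity] }
    }

    /// Returns the cached hash on a hit, otherwise computes it and overwrites the slot.
    fn hash_of(&mut self, bytes: &[u8]) -> Hash {
        let idx = (cheap_fingerprint(bytes) as usize) % self.slots.len();
        if let Some((key, hash)) = &self.slots[idx] {
            if key.as_slice() == bytes {
                return *hash; // hit: the expensive hash is skipped entirely
            }
        }
        let hash = expensive_hash(bytes);
        self.slots[idx] = Some((bytes.to_vec(), hash));
        hash
    }
}
```

Memory stays bounded by the slot count, and repeated lookups of the same input only pay for the cryptographic hash once.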
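Finally, a sketch of what 7) could look like: sort an index permutation by the precomputed hashes, then apply it to both vectors, so every element is hashed exactly once and the Vec<Hash> can be dropped or passed along afterwards. The function name and the Vec<Vec<u8>> item type are placeholders:

```rust
/// Sorts `items` by their precomputed hashes, keeping `hashes` in step.
/// Reuses the `Hash` type alias from the first sketch.
fn sort_by_precomputed_hash(items: &mut Vec<Vec<u8>>, hashes: &mut Vec<Hash>) {
    assert_eq!(items.len(), hashes.len());
    // Sort a permutation of indices rather than the items themselves.
    let mut order: Vec<usize> = (0..items.len()).collect();
    order.sort_by_key(|&i| hashes[i]);
    // Apply the permutation to both vectors.
    let sorted_items: Vec<Vec<u8>> = order.iter().map(|&i| items[i].clone()).collect();
    let sorted_hashes: Vec<Hash> = order.iter().map(|&i| hashes[i]).collect();
    *items = sorted_items;
    *hashes = sorted_hashes;
}
```

An in-place permutation (or sorting a Vec of (Hash, Input) pairs) would avoid the clones; the point is only that the hashes are computed once up front and reused for every comparison.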

antiochp commented 5 years ago

Thanks for opening an issue to track this.

ignopeverell commented 5 years ago

I'd go with 1 first, and see how KISSing it turns out.