mirleft / ocaml-nocrypto

OCaml cryptographic library
ISC License

implement Hash.dup which copies the hash state #125

Closed: hannesm closed this 7 years ago

hannesm commented 7 years ago

There are several protocols (e.g. tor) which require you to (a) finalize the hash and transmit parts of it, and (b) continue feeding the hash function for the next message.

Other applications, such as OpenPGP, also require you to hash data, take some bytes, hash some more, take more bytes, hash some more, and finally sign the hash.
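A minimal sketch of this hash-and-continue pattern, written against the immutable feed/get API the thread converges on. Only feed and get are discussed in this issue; Hash.SHA256, the init constructor and Cstruct.of_string are illustrative assumptions.

```ocaml
open Nocrypto

let interleaved_digests () =
  (* Feed a string chunk into an (assumed immutable) hash state. *)
  let ( <+ ) st s = Hash.SHA256.feed st (Cstruct.of_string s) in
  let st1 = Hash.SHA256.init () <+ "first message" in
  (* (a) finalize: get does not consume the state ... *)
  let d1 = Hash.SHA256.get st1 in
  (* (b) ... so the same history keeps absorbing the next message. *)
  let st2 = st1 <+ "second message" in
  let d2 = Hash.SHA256.get st2 in
  (d1, d2)
```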

pqwy commented 7 years ago

Is there any use case that is not covered by this representation?

I.e. is there any use-case for multi-way forking, such that both hashes of [A; B; C] and [A; B; D] are needed?
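For reference, a two-way fork of that kind would look like the sketch below with an immutable state: both hashes of [A; B; C] and [A; B; D] derived from a shared [A; B] prefix. The Hash.SHA256 module and init constructor are assumptions; the signatures mirror the API discussed later in the thread.

```ocaml
open Nocrypto

(* [a], [b], [c], [d] : Cstruct.t *)
let fork a b c d =
  let feed = Hash.SHA256.feed in
  let prefix = feed (feed (Hash.SHA256.init ()) a) b in  (* shared [A; B] *)
  let h_abc = Hash.SHA256.get (feed prefix c)            (* hash of [A; B; C] *)
  and h_abd = Hash.SHA256.get (feed prefix d) in         (* hash of [A; B; D] *)
  (h_abc, h_abd)
```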

pqwy commented 7 years ago

In fact, taming get is tons simpler. The time overhead doesn't even register on 10^7 iterations over 16-byte inputs.

Is there much raw feed usage in the wild? Making that referentially transparent too would fix something I've long hated about the API.

hannesm commented 7 years ago

thank you, I have not seen any use of multi-way forking. get looks very sensible!

pqwy commented 7 years ago

Still, I'm digging a bit through various clients. Is there significant Hash.S.feed usage in the wild, as opposed to Hash.S.digest (and digestv)? Do people manually maintain the state over long periods?

hannesm commented 7 years ago

I have (unpublished) code which uses feed, because the input data is larger than available memory (this is also what @cfcs uses in openpgp).
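A sketch of that feed-over-large-input use: hashing a file in fixed-size chunks without holding it in memory, threading the immutable state through a loop. The 64k chunk size, Hash.SHA256 and init are illustrative assumptions.

```ocaml
open Nocrypto

let digest_of_file path =
  let ic = open_in_bin path in
  let buf = Bytes.create 65536 in
  let rec go st =
    match input ic buf 0 (Bytes.length buf) with
    | 0 -> close_in ic; Hash.SHA256.get st            (* EOF: finalize *)
    | n -> go (Hash.SHA256.feed st
                 (Cstruct.of_string (Bytes.sub_string buf 0 n)))
  in
  go (Hash.SHA256.init ())
```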

pqwy commented 7 years ago

How large are the chunks, typically?

hannesm commented 7 years ago

It depends: between 512 bytes and MTU size (1500 bytes) for some protocols, but rather 4k or even 64k for a local file on disk.

pqwy commented 7 years ago

Ok, let's have the GC take one for the team.

Heads-up: Hash.S.t is now immutable. This is a breaking change -- all uses of Hash.S.feed now need to store the new state themselves.
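Roughly, the migration looks like this (a hedged before/after sketch; Hash.SHA256 and init are used illustratively):

```ocaml
open Nocrypto

(* Before: feed mutated the state in place, something like
     let st = Hash.SHA256.init () in
     List.iter (Hash.SHA256.feed st) chunks;
     Hash.SHA256.get st *)

(* After: feed returns the successor state, so callers thread it through,
   e.g. with a fold. *)
let digest_chunks (chunks : Cstruct.t list) =
  Hash.SHA256.get
    (List.fold_left Hash.SHA256.feed (Hash.SHA256.init ()) chunks)
```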

With this, hashing histories support arbitrary prefix-sharing. That, and there is one less mutable abstract type forcing me to scrub my hands again and again and again.

The cost is about 150 (88, 96, 168 or 208) bytes allocated per feed. Since the state now lives in the young generation, this pans out much better than it sounds. The speed hit barely registers; it starts floating at around 5% when feeding about 50000 150-byte chunks in a busy loop, which is an unrealistic level of fragmentation.

Still, I recommend looking at the documentation in the header. When there are multiple data chunks that can be made available during a single call, feedi/feedv and digesti/digestv can process them without incurring extra allocations, unlike repeated calls to feed. This does not apply if multiple chunks are coming from the network while using a concurrency monad, but I reckon that the rest of the protocol computation plus the sheer latency between protocol events will make this overhead invisible.
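For the batched case, a hedged sketch of digestv against digesti; the iterator argument to digesti is assumed here to be a function that calls its callback on every chunk, and Hash.SHA256 is used illustratively.

```ocaml
open Nocrypto

let digest_all (chunks : Cstruct.t list) =
  (* List form: hash all chunks in one call. *)
  let d1 = Hash.SHA256.digestv chunks in
  (* Iterator form: digesti consumes a function that feeds every chunk. *)
  let d2 = Hash.SHA256.digesti (fun f -> List.iter f chunks) in
  assert (Cstruct.equal d1 d2);
  d1
```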