Closed SUPERCILEX closed 7 months ago
If you want to hash files wouldn't Digest
be better than Hash
/Hasher
? The latter is designed for hashmaps, not for arbitrary cryptographic operations.
No? I don't want to pull in a third party lib and I'm also using the same hasher to hash structs. Hasher is the easiest way to go since a u64 output is good enough for what I need.
easieast to use does not mean it is intended for that use
Sure, but this is fixating on the example. Hasher has a method for writing a byte array and the io APIs offer a nice way to compose loading and writing byte arrays which can come from anywhere including memory.
seahash includes an adapter: https://gitlab.redox-os.org/redox-os/seahash/-/blob/master/src/impl_std.rs?ref_type=heads#L5-13 xxhash includes file hashing too: https://github.com/shepmaster/twox-hash/blob/main/src/bin/hash_file.rs
All I'm hearing is "your use case is invalid, go away" which is quite frustrating. I'm clearly not the only one who wants this and I don't see how it hurts. We'd need to figure out where to put things, but at the end of the day Hasher has a method to write bytes and so does the io Write trait, so being able to compose those two traits is a pretty obvious improvement. Since those two traits can't change, I don't see how we'd regret being able to compose them as long as the adapter is in the right place.
Hasher is designed to repeatedly hash data structures for indexing. Hashing a Read
(via io::copy) consumes it which makes the hash non-reproducible.
One of the requirements of Hash
is that it is consistent with Eq
: https://doc.rust-lang.org/nightly/std/hash/trait.Hash.html#hash-and-eq
The shape happening to be right doesn't mean it's a good choice for the purpose. E.g. Result
looks a lot like Either
and yet Either
is a separate type in the ecosystem because it's semantically different.
Hash
's prefix-collision-avoidance also makes it behaviorally quite different from Digest
. I.e. feeding data in via io::copy
can have unpredictable outputs depending on how the copying chunks the data.
Sure, but this is fixating on the example.
Any line of code added is a line of code to be maintained. Individually it may be a small cost. But it still should be justified. So the justification should at least be good and consistent with the purpose of the API.
I don't think Result
vs Either
is a good comparison because that's adding a completely new type whereas this proposal is trying to compose two existing types. I still don't see how enabling the composition is bad, but I do agree that this is abusing the API somewhat.
Ideally what I want is just a simple interface that says here are some bytes, give me a hash. That doesn't make sense to add to the stdlib because it's too vague and implementation detail dependent, hence abusing Hasher
.
I wonder if https://github.com/rust-lang/libs-team/issues/133 and this proposal should just be put in an io-adapters
crate? That feels pretty silly to me, but at the same time it avoids the issue of someone coming along and complaining that the stdlib provided an adapter for Hasher
but didn't warn them that there are prefix shenanigans or other gotyas.
I went ahead and made a crate: https://docs.rs/io-adapters/0.1.0/io_adapters/. Still a little rough around the edges, but I'll clean things up and make an announcement on the fmt adapter issue once it's ready. This issue is probably moot, so closing.
Proposal
Problem statement
It would be nice to be able to use functions like
io::copy
to write into a Hasher.Motivating examples or use cases
Solution sketch
Rough sketch similar to https://github.com/rust-lang/rust/pull/104389:
Alternatives
Everybody writes their own adapters.
Links and related work
Similar to https://github.com/rust-lang/rust/pull/104389.
This is clearly a common need as someone ran into the same issue 5 years ago: https://stackoverflow.com/questions/48533445/proper-way-to-hash-a-reader-in-rust