rust-lang / libs-team

The home of the library team
Apache License 2.0
110 stars 18 forks source link

Add std::io::Write adapter for Hashers #309

Closed SUPERCILEX closed 7 months ago

SUPERCILEX commented 7 months ago

Proposal

Problem statement

It would be nice to be able to use functions like io::copy to write into a Hasher.

Motivating examples or use cases

io::copy(
    &mut File::open(entry.path()).unwrap(),
    &mut HashWriteAdapter::new(&mut hasher),
)
.unwrap();

Solution sketch

Rough sketch similar to https://github.com/rust-lang/rust/pull/104389:

struct HashWriteAdapter<'a, H> {
    hasher: &'a mut H,
}

impl<'a, H> HashWriteAdapter<'a, H> {
    pub fn new(hasher: &'a mut H) -> Self {
        Self { hasher }
    }
}

impl<'a, H: Hasher> io::Write for HashWriteAdapter<'a, H> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.hasher.write(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

Alternatives

Everybody writes their own adapters.

Links and related work

Similar to https://github.com/rust-lang/rust/pull/104389.

This is clearly a common need as someone ran into the same issue 5 years ago: https://stackoverflow.com/questions/48533445/proper-way-to-hash-a-reader-in-rust

the8472 commented 7 months ago

If you want to hash files wouldn't Digest be better than Hash/Hasher? The latter is designed for hashmaps, not for arbitrary cryptographic operations.

SUPERCILEX commented 7 months ago

No? I don't want to pull in a third party lib and I'm also using the same hasher to hash structs. Hasher is the easiest way to go since a u64 output is good enough for what I need.

the8472 commented 7 months ago

easieast to use does not mean it is intended for that use

SUPERCILEX commented 7 months ago

Sure, but this is fixating on the example. Hasher has a method for writing a byte array and the io APIs offer a nice way to compose loading and writing byte arrays which can come from anywhere including memory.

seahash includes an adapter: https://gitlab.redox-os.org/redox-os/seahash/-/blob/master/src/impl_std.rs?ref_type=heads#L5-13 xxhash includes file hashing too: https://github.com/shepmaster/twox-hash/blob/main/src/bin/hash_file.rs


All I'm hearing is "your use case is invalid, go away" which is quite frustrating. I'm clearly not the only one who wants this and I don't see how it hurts. We'd need to figure out where to put things, but at the end of the day Hasher has a method to write bytes and so does the io Write trait, so being able to compose those two traits is a pretty obvious improvement. Since those two traits can't change, I don't see how we'd regret being able to compose them as long as the adapter is in the right place.

the8472 commented 7 months ago

Hasher is designed to repeatedly hash data structures for indexing. Hashing a Read (via io::copy) consumes it which makes the hash non-reproducible.

One of the requirements of Hash is that it is consistent with Eq: https://doc.rust-lang.org/nightly/std/hash/trait.Hash.html#hash-and-eq

The shape happening to be right doesn't mean it's a good choice for the purpose. E.g. Result looks a lot like Either and yet Either is a separate type in the ecosystem because it's semantically different.

Hash's prefix-collision-avoidance also makes it behaviorally quite different from Digest. I.e. feeding data in via io::copy can have unpredictable outputs depending on how the copying chunks the data.

Sure, but this is fixating on the example.

Any line of code added is a line of code to be maintained. Individually it may be a small cost. But it still should be justified. So the justification should at least be good and consistent with the purpose of the API.

SUPERCILEX commented 7 months ago

I don't think Result vs Either is a good comparison because that's adding a completely new type whereas this proposal is trying to compose two existing types. I still don't see how enabling the composition is bad, but I do agree that this is abusing the API somewhat.

Ideally what I want is just a simple interface that says here are some bytes, give me a hash. That doesn't make sense to add to the stdlib because it's too vague and implementation detail dependent, hence abusing Hasher.


I wonder if https://github.com/rust-lang/libs-team/issues/133 and this proposal should just be put in an io-adapters crate? That feels pretty silly to me, but at the same time it avoids the issue of someone coming along and complaining that the stdlib provided an adapter for Hasher but didn't warn them that there are prefix shenanigans or other gotyas.

SUPERCILEX commented 7 months ago

I went ahead and made a crate: https://docs.rs/io-adapters/0.1.0/io_adapters/. Still a little rough around the edges, but I'll clean things up and make an announcement on the fmt adapter issue once it's ready. This issue is probably moot, so closing.