rust-lang / rustc-stable-hash

A stable hashing algorithm used by rustc: cross-platform, deterministic, not secure
Apache License 2.0
4 stars 3 forks source link

Consider supporting multiple hash functions #5

Open michaelwoerister opened 2 weeks ago

michaelwoerister commented 2 weeks ago

The current StableHasher implementation unconditionally uses SipHash13, which is the algorithm of choice for incremental compilation fingerprinting in the Rust compiler. However, for other use cases different hash functions with stronger guarantees, such as BLAKE3, are often preferable.

StableHasher could therefore be generic of the hash algorithm used, e.g.:

/// Trait for retrieving the result of the stable hashing operation.
pub trait StableHasherResult<const W: usize>: Sized {
    fn finish(hash: [u64; W]) -> Self;
}

/// The hash algorithm used
pub trait HashImpl<const W: usize>: Hasher + Debug + Default {
    fn short_write<const LEN: usize>(&mut self, bytes: [u8; LEN]);
    fn finish(self) -> [u64; W];
}

/// When hashing something that ends up affecting properties like symbol names,
/// we want these symbol names to be calculated independently of other factors
/// like what architecture you're compiling *from*.
///
/// To that end we always convert integers to little-endian format before
/// hashing and the architecture dependent `isize` and `usize` types are
/// extended to 64 bits if needed.
#[derive(Debug)]
pub struct StableHasher<const W: usize, H: HashImpl<W>> {
    state: H,
}

See here for a quick and dirty POC impl: https://github.com/michaelwoerister/rustc-stable-hash/tree/generic_hash I've been using hash: [u64; W] just for simplicity. It might make more sense to make that hash: [u8; W] to get rid of any endianess ambiguities.

Now that StableHasher is available outside the compiler, it makes sense to provide different hash algorithms and to make it clear that SipHash13 is not a generically good choice.

Urgau commented 6 days ago

I really like the idea. StableHasher is already a somewhat generic.

@michaelwoerister What's the reason for using a const generic const W: usize and not a associated type?

I'm thinking of something like this, which is less verbose and more flexible.

/// Trait for retrieving the result of the stable hashing operation.
pub trait StableHasherResult<H: HashImpl>: Sized {
    fn finish(hash: H::Output) -> Self;
}

/// The hash algorithm used
pub trait HashImpl: Hasher + Debug + Default {
    type Output;

    fn short_write<const LEN: usize>(&mut self, bytes: [u8; LEN]);
    fn finish(&self) -> Self::Output;
}

#[derive(Debug)]
pub struct StableHasher<H: HashImpl> {
    state: H,
}
michaelwoerister commented 5 days ago

Yes, an associated type makes more sense than a generic parameter.

That StableHasherResult would refer to a specific HashImpl seems a bit unfortunate though.

Urgau commented 5 days ago

And it doesn't even need to be the case. StableHasherResult could just directly take the hash type as a generic or associated type. Which seems even cleaner.

/// Trait for retrieving the result of the stable hashing operation.
pub trait StableHasherResult<H>: Sized {
    fn finish(hash: H) -> Self;
}

/// The hash algorithm used
pub trait HashImpl: Hasher + Debug + Default {
    type Output;

    fn short_write<const LEN: usize>(&mut self, bytes: [u8; LEN]);
    fn finish(self) -> Self::Output;
}

#[derive(Debug)]
pub struct StableHasher<H: HashImpl> {
    state: H,
}

impl<H: HashImpl> StableHasher<H> {
    pub fn finish<W: StableHasherResult<H::Output>>(self) -> W {
        W::finish(self.state.finish())
    }
}