rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Secure finishing for Hasher #1768

Open Ericson2314 opened 8 years ago

Ericson2314 commented 8 years ago

With sufficiently expressive associated constants (including https://github.com/rust-lang/rust/issues/34344), it would be nice to provide scaffolding for secure hashing without allocation by extending Hasher like:

pub trait Hasher {
    const DigestSize: usize = 8;

    /// Completes a round of hashing, producing the untruncated output hash generated.
    fn untruncated_finish(&self) -> [u8; Self::DigestSize] {
        self.finish()
    }

    ...
}

[I think specialization might also be required to write that default method; I forget what syntax is needed.]
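With const generics now stable, a rough sketch of the idea is writable today as an extension trait instead of an associated constant plus specialization. Everything below is illustrative: the trait name `WideHasher` and the toy 128-bit FNV-like hasher are made up, and the toy hasher is in no way cryptographically secure.

```rust
use std::hash::Hasher;

// Hypothetical extension trait: a hasher that can expose its full,
// untruncated digest as a fixed-size byte array, without allocation.
trait WideHasher<const DIGEST_SIZE: usize>: Hasher {
    /// Completes a round of hashing, producing the untruncated digest.
    fn untruncated_finish(&self) -> [u8; DIGEST_SIZE];
}

// Toy 16-byte hasher (FNV-like, NOT secure) to show the trait is
// implementable; `finish` exposes the truncated view HashMap expects.
struct Toy128 {
    state: u128,
}

impl Hasher for Toy128 {
    fn finish(&self) -> u64 {
        self.state as u64 // low 64 bits of the wide state
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state = self
                .state
                .wrapping_mul(0x0100_0000_01b3)
                .wrapping_add(b as u128);
        }
    }
}

impl WideHasher<16> for Toy128 {
    fn untruncated_finish(&self) -> [u8; 16] {
        self.state.to_le_bytes()
    }
}

fn main() {
    let mut h = Toy128 { state: 0xcbf2_9ce4_8422_2325 };
    h.write(b"hello");
    let wide: [u8; 16] = h.untruncated_finish();
    // The truncated finish agrees with the low bytes of the wide digest.
    assert_eq!(h.finish(), u64::from_le_bytes(wide[..8].try_into().unwrap()));
    println!("wide digest: {:02x?}", wide);
}
```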

burdges commented 8 years ago

Just some thoughts :

I think finishing semantics can vary among hash functions. If I understand untruncated_finish, it might often require copying the state, which is not such a big deal, but finish should not usually call untruncated_finish. Hash functions might also have different behavior with respect to whether more input can be accepted after producing some output. And some hash functions have variable-length output, like SHAKE for SHA-3.

If you want a hash trait usable for protocols, not just local hash maps, then endianness matters. It's also unclear how, or if, a hash trait (or many writer-like traits) can really consume a usize/isize; maybe you could map them to u64/i64, assuming they never exceed those ranges. That's okay for hashing, but a general writer trait cannot do that, because the corresponding reader trait needs to go the other way.

I'd propose using separate traits for hash function input and output, so the input trait can be very general, live in core, maybe handle issues like endianness or usize/isize consistently, etc., while the output traits could diversify slightly according to the hash function, and mostly live in crates besides core or std. At least that's what we seemingly wanted in https://github.com/DaGenix/rust-crypto/pull/333

Ericson2314 commented 8 years ago

@burdges Good points.

For split input I wish Read/Write had an associated error type, so Write<Error=!> could be used. Similarly, https://github.com/BurntSushi/byteorder/blob/master/src/lib.rs takes care of endianness for any Write and could be used for hashing too. The lack of a valid usize/isize is fine with me, because values of those types have no cross-platform meaning anyway and shouldn't be used in a protocol.
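The byteorder-for-hashing idea can be approximated with std alone: an adapter that lets any Hasher act as an io::Write sink, so endianness-aware helpers written against Write also feed the hash. The adapter type `HashWriter` is a made-up name for this sketch; since hashing cannot fail, writes never error, which stands in for the hypothetical Write<Error=!>.

```rust
use std::hash::Hasher;
use std::io::{self, Write};

// Adapter so any Hasher can be used wherever an io::Write is expected.
struct HashWriter<H: Hasher>(H);

impl<H: Hasher> Write for HashWriter<H> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf); // hashing cannot fail
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    use std::collections::hash_map::DefaultHasher;

    let mut a = HashWriter(DefaultHasher::new());
    let mut b = HashWriter(DefaultHasher::new());

    // Feeding the same integer in a fixed endianness yields the same
    // byte stream on every platform, which is what a protocol needs.
    a.write_all(&42u64.to_be_bytes())?;
    b.write_all(&42u64.to_be_bytes())?;
    assert_eq!(a.0.finish(), b.0.finish());
    Ok(())
}
```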

I didn't know there was so much diversity in finalization.

burdges commented 8 years ago

I suppose SHA-3's SHAKE output mode could be handled by returning an associated reader-like type or something, so maybe one only needs the associated type. We should therefore not need to think about output traits now, except that you need an associated type, not just an associated constant.

It's true hash functions normally cannot fail, so the error type is irksome. I know a Result<T, ()> is equivalent to an Option<T>, but I don't know if a Result<T, !> is equivalent to just a wrapped T, which is what one wants here. It's not crazy to have a reader-like trait that cannot fail, though.
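On the Result<T, !> question: the never type is still unstable in this position, but its stable stand-in std::convert::Infallible shows the equivalence concretely. Since the error type is uninhabited, Ok is the only possible variant, so a Result<T, Infallible> carries exactly the information of a T. A minimal sketch (the function name is made up):

```rust
use std::convert::Infallible;

// A reader-like operation that cannot fail: its error type is
// uninhabited, so unwrapping can never panic.
fn infallible_read() -> Result<u64, Infallible> {
    Ok(0xdead_beef)
}

fn main() {
    // Matching shows the compiler knows Err is impossible:
    let value = match infallible_read() {
        Ok(v) => v,
        Err(never) => match never {}, // uninhabited: no arms needed
    };
    assert_eq!(value, 0xdead_beef);
}
```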

Argon2 would often be configured to allocate many megabytes, or even gigabytes, of memory though, so presumably one wants an Argon2 builder function that does the allocation, and can fail, but returns an object that consumes input without the possibility of failure.

I suspect there are cleaner solutions to endianness than the byteorder crate, with its slices and unwrap everywhere. I hacked up the endian-types crate here to do conversions to/from fixed-length array types [u8; n]. I'll redo it more cleanly though.

I understand why reader traits do not have endianness as a type parameter: usually they're outputting to something like JSON that does not care. I suspect hashing might benefit from making endianness a type parameter of the hash input trait.

Ericson2314 commented 8 years ago

Result<T,!> is indeed equivalent to just T. That's one of the reasons ! is so useful!

I'll look at your links in more detail tomorrow, but see https://github.com/jethrogb/rust-core_io/issues/3 for my attempt at better, maintainable Read/Write.

burdges commented 8 years ago

At least there is an explanation for my worries around fixed size arrays in https://github.com/rust-lang/rfcs/issues/1772

burdges commented 8 years ago

I suppose the byteorder crate's tendency to specify endianness with each write or read is actually incorrect. Instead, endianness should be embedded in the type of the hasher and its builder, maybe via some wrapper hasher and/or builder that sets the endianness.

Ericson2314 commented 8 years ago

That's easily done too. With Hash <=> Write<Error=!>, it would be hard to prevent the use of endianness overrides, but certainly a generic wrapper type that sets the default would be easy.

burdges commented 8 years ago

I'd think hashers should take native endianness by default, given that they're frequently used for internal data structures like HashMap. If you are hashing for a protocol, then you should use a wrapper that sets the endianness by calling .to_be/.to_le everywhere.
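Such a wrapper is straightforward to sketch on stable Rust: delegate everything to an inner Hasher, but override the integer write shortcuts to pin the byte order. The wrapper name `BeHasher` and the u64 mapping for usize are this sketch's assumptions, not anything proposed in the thread verbatim.

```rust
use std::hash::Hasher;

// Wrapper that forces big-endian encoding of every integer write,
// while delegating raw bytes and finish to any inner Hasher.
struct BeHasher<H: Hasher>(H);

impl<H: Hasher> Hasher for BeHasher<H> {
    fn finish(&self) -> u64 {
        self.0.finish()
    }
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes);
    }
    // Override the integer shortcuts so the byte order is fixed,
    // instead of inheriting the platform-native default.
    fn write_u16(&mut self, i: u16) {
        self.0.write(&i.to_be_bytes());
    }
    fn write_u32(&mut self, i: u32) {
        self.0.write(&i.to_be_bytes());
    }
    fn write_u64(&mut self, i: u64) {
        self.0.write(&i.to_be_bytes());
    }
    fn write_usize(&mut self, i: usize) {
        // Map usize to u64 so the encoding is platform-independent.
        self.0.write(&(i as u64).to_be_bytes());
    }
}

fn main() {
    use std::collections::hash_map::DefaultHasher;
    let mut fixed = BeHasher(DefaultHasher::new());
    let mut raw = BeHasher(DefaultHasher::new());
    fixed.write_u32(0x0102_0304);
    raw.write(&[1, 2, 3, 4]); // the same bytes, big-endian
    assert_eq!(fixed.finish(), raw.finish());
}
```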

burdges commented 7 years ago

Is there anything wrong with just adding an associated type?

pub trait Hasher {
    type Finish = u64; // Default usable by HashMap

    fn finish(&self) -> Self::Finish;
    fn finish_mut(&mut self) -> Self::Finish {
        self.finish()
    }
    ...
}

It might break some code around Hasher implementations, but not necessarily, if the associated type can be defaulted to u64. I'd think hash functions that need multiple return types can simply create different hasher types, one for each return type. In most cases, you must know the return type when you start hashing, so that's no biggie.

There is another issue: some hash functions cannot, or should not, copy their internal state, so you likely want this finish_mut(&mut self) -> Finish that mutably borrows the hasher and wrecks it as part of finishing. We need this because modern password hashing algorithms like Argon2 hog RAM. We could even add a type FinishMut = Finish; to make Finish = ! possible, but afaik anything like Argon2 that uses enough RAM to want that could be configured to use little enough RAM that copying remains possible. And finish could always just panic if you really should never call it.
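The copying-vs-destructive split can be sketched on stable Rust as a standalone trait, since associated-type defaults are not stable. All the names below (`FinishableHasher`, `BigState`) are hypothetical, and the XOR "hash" is a stand-in for a real memory-hard function.

```rust
// Sketch of the proposed split between a copying finish and a
// destructive finish_mut that consumes the state in place.
trait FinishableHasher {
    type Finish;

    /// Reads the state non-destructively; fine for small hashers.
    fn finish(&self) -> Self::Finish;

    /// Consumes the state in place; what a memory-hard hasher like
    /// Argon2 would prefer, since copying megabytes is costly.
    fn finish_mut(&mut self) -> Self::Finish {
        self.finish()
    }
}

struct BigState {
    buf: Vec<u8>, // stand-in for a large memory-hard state
}

impl FinishableHasher for BigState {
    type Finish = [u8; 8];

    fn finish(&self) -> [u8; 8] {
        // Toy digest: fold the buffer with XOR (not a real hash).
        let mut out = [0u8; 8];
        for (i, &b) in self.buf.iter().enumerate() {
            out[i % 8] ^= b;
        }
        out
    }

    fn finish_mut(&mut self) -> [u8; 8] {
        let out = self.finish();
        // Wreck the state: free the memory instead of keeping a copy.
        self.buf = Vec::new();
        out
    }
}

fn main() {
    let mut h = BigState { buf: vec![1, 2, 3, 4, 5, 6, 7, 8, 9] };
    let copied = h.finish();
    let destroyed = h.finish_mut();
    assert_eq!(copied, destroyed);
    assert!(h.buf.is_empty()); // state was wrecked by finish_mut
}
```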

In principle, one could maybe avoid the associated type as follows, but doing so seems worse.

pub trait HasherFinish<H> {
    fn finish(h: &H) -> Self;
    fn finish_mut(h: &mut H) -> Self;
}
pub trait Hasher {
    fn finish<T>(&self) -> T where T: HasherFinish<Self> {
        HasherFinish::finish(self)
    }
    fn finish_mut<T>(&mut self) -> T where T: HasherFinish<Self> {
        HasherFinish::finish_mut(self)
    }
    ...
}

Initially I was even thinking about finish(&self) -> impl HasherFinish<Self>, but that's forbidden in trait methods.

burdges commented 7 years ago

Just posted a little wrapper that addresses the endianness issue. Interestingly, it would not work with the input_hashable approach suggested here, so maybe that's another argument for an associated type.

burdges commented 7 years ago

Appears an associated Output type was removed in https://github.com/rust-lang/rfcs/pull/823, but only coincidentally, as part of simplifying the Hash trait. I don't know if making Output default to u64 resolves the reservations expressed there.

If the associated type is actually painful, even with the default, then an approach like

pub trait HasherIn {
    fn write(&mut self, bytes: &[u8]);

    fn write_u8(&mut self, i: u8) { ... }
    fn write_u16(&mut self, i: u16) { ... }
    fn write_u32(&mut self, i: u32) { ... }
    fn write_u64(&mut self, i: u64) { ... }
    fn write_usize(&mut self, i: usize) { ... }
    fn write_i8(&mut self, i: i8) { ... }
    fn write_i16(&mut self, i: i16) { ... }
    fn write_i32(&mut self, i: i32) { ... }
    fn write_i64(&mut self, i: i64) { ... }
    fn write_isize(&mut self, i: isize) { ... }
}

pub trait Hasher : HasherIn {
    fn finish(&self) -> u64 { ... }
}

likely restricts the breakage to the Hasher ecosystem, meaning Hashers and code like <T as Hasher>::write(..).

If nothing like that works, then crypto code can always import some hashish crate that does roughly:

pub trait Digester {
    fn write(&mut self, bytes: &[u8]);

    fn write_u8(&mut self, i: u8) { ... }
    fn write_u16(&mut self, i: u16) { ... }
    fn write_u32(&mut self, i: u32) { ... }
    fn write_u64(&mut self, i: u64) { ... }
    fn write_usize(&mut self, i: usize) { ... }
    fn write_i8(&mut self, i: i8) { ... }
    fn write_i16(&mut self, i: i16) { ... }
    fn write_i32(&mut self, i: i32) { ... }
    fn write_i64(&mut self, i: i64) { ... }
    fn write_isize(&mut self, i: isize) { ... }
}

impl<T> Hasher for T where T: Digester {
    fn finish(&self) -> u64 { panic!("Hasher::finish()"); }
    fn write(&mut self, bytes: &[u8])  { <T as Digester>::write(self,bytes); }

    fn write_u8(&mut self, i: u8) { <T as Digester>::write_u8(self,i); }
    fn write_u16(&mut self, i: u16) { <T as Digester>::write_u16(self,i); }
    fn write_u32(&mut self, i: u32) { <T as Digester>::write_u32(self,i); }
    fn write_u64(&mut self, i: u64) { <T as Digester>::write_u64(self,i); }
    fn write_usize(&mut self, i: usize) { <T as Digester>::write_usize(self,i); }
    fn write_i8(&mut self, i: i8) { <T as Digester>::write_i8(self,i); }
    fn write_i16(&mut self, i: i16) { <T as Digester>::write_i16(self,i); }
    fn write_i32(&mut self, i: i32) { <T as Digester>::write_i32(self,i); }
    fn write_i64(&mut self, i: i64) { <T as Digester>::write_i64(self,i); }
    fn write_isize(&mut self, i: isize) { <T as Digester>::write_isize(self,i); }
}

eddiew commented 5 years ago

+1. Restricting Hasher to u64 output makes it unusable for anything cryptography-related.

burdges commented 5 years ago

Just as an aside, we're unsure exactly what a secure hasher actually looks like, much less what a generic one looks like.

Almost all cryptographic hashing interfaces like digest really suck, due to being modeled on NIST's obsolete hash function competition requirements. In particular, these interfaces encourage wasting enormous amounts of time micro-optimizing the domain separation.

At the lower level, we should abandon all existing hash function and stream cipher based interfaces in favor of mapping permutations into STROBE-like interfaces, but actually doing a zero cost interface for STROBE looks painful, certainly without const generics.

Above this, merlin currently provides the nicest interface for discrete-log-based signatures and zero knowledge proofs. In essence, you write traits that map your types' data into hashing-like calls that support good domain separation. Isolating the domain separation into tiers like this prevents micro-optimizations that waste protocol designers' time.

Also STROBE-like constructions could likely "double" their rate by feeding the domain separators into hash state separately, so we should eventually outperform the legacy interfaces.

Above this, we want domain separation inferred so the interface should look more like serde, etc.
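The labeled-transcript idea behind merlin can be sketched in a few lines: every absorbed message is framed with a label and a length prefix, so concatenation ambiguities like "ab" + "c" versus "a" + "bc" cannot collide. This toy `Transcript` over DefaultHasher only illustrates the framing; merlin's real API and its STROBE-based internals are quite different, and DefaultHasher is of course not a cryptographic sponge.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy labeled transcript: length-framed (label, message) pairs
// absorbed into a running hash, with a protocol-level separator.
struct Transcript {
    h: DefaultHasher,
}

impl Transcript {
    fn new(protocol: &[u8]) -> Self {
        let mut h = DefaultHasher::new();
        h.write(protocol); // top-level domain separation
        Transcript { h }
    }
    fn append_message(&mut self, label: &[u8], msg: &[u8]) {
        // Length prefixes make the framing unambiguous.
        self.h.write(&(label.len() as u64).to_le_bytes());
        self.h.write(label);
        self.h.write(&(msg.len() as u64).to_le_bytes());
        self.h.write(msg);
    }
    fn challenge(&self) -> u64 {
        self.h.finish()
    }
}

fn main() {
    let mut t1 = Transcript::new(b"demo-protocol");
    t1.append_message(b"x", b"ab");
    t1.append_message(b"y", b"c");

    let mut t2 = Transcript::new(b"demo-protocol");
    t2.append_message(b"x", b"a");
    t2.append_message(b"y", b"bc");

    // Length framing separates the two transcripts even though the
    // concatenated message bytes are identical.
    assert_ne!(t1.challenge(), t2.challenge());
}
```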

Ericson2314 commented 5 years ago

Well regardless of the algorithm, we need to be able to get more bits of information out of it.

It's been a while since I thought of this. Are enough language features stable for this?