Open Ericson2314 opened 8 years ago
Just some thoughts:
I think finishing semantics can vary among hash functions. Hash functions might also differ in whether more input can be accepted after producing some output. And some hash functions have variable-length output, like Shake for SHA3. If I understand `untruncated_finish`, then it might often require copying the state, which is not such a big deal, but `finish` should not usually call `untruncated_finish`.
If you want a hash trait usable for protocols, not just local hash maps, then endianness matters. Also, it's unclear how/if a hash trait, or many writer-like traits, can really consume a `usize`/`isize`. Maybe you could map them to `u64`/`i64`, assuming they never exceed those ranges. That's okay for hashing, but a general writer trait cannot do that, because the corresponding reader trait needs to go the other way.
I'd propose using separate traits for hash function input and output, so the input trait can be very general, live in `core`, maybe handle issues like endianness or `usize`/`isize` consistently, etc., while the output traits could diversify slightly according to the hash function and mostly live in crates besides `core` or `std`. At least that's what we seemingly wanted in https://github.com/DaGenix/rust-crypto/pull/333
@burdges Good points.
For split input, I wish `Read`/`Write` had an associated error type, so `Write<Error=!>` could be used. Similarly, https://github.com/BurntSushi/byteorder/blob/master/src/lib.rs takes care of endianness for any `Write` and could be used for hashing too. The lack of a valid `usize`/`isize` is fine with me, because values of those types have no cross-platform meaning anyway and shouldn't be used in a protocol.
I didn't know there was so much diversity in finalization.
I suppose SHA3's Shake output mode could be handled by returning an associated reader-like type or something, so maybe one only needs the associated type. We should therefore not need to think about output traits now, except that you need an associated type, not just an associated constant.
It's true hash functions normally cannot fail, so the error type is irksome. I know a `Result<T,()>` is equivalent to an `Option<T>`, but I dunno if a `Result<T,!>` is equivalent to just a plain `T`, which is what one wants for that case. It's not crazy to have a reader-like trait that cannot fail, though.
Argon2 would often be configured to allocate many megabytes, or even gigabytes, of memory though, so presumably one wants an Argon2 builder function that does the allocation, and can fail, but returns an object that consumes input without the possibility of failure.
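A sketch of that builder shape, with illustrative names (`MemoryHardHasher` and `BuildError` are not a real Argon2 API, and the mixing is a toy): the builder does the fallible allocation up front, and the returned object consumes input without any possibility of failure:

```rust
#[derive(Debug)]
pub struct BuildError;

/// Toy stand-in for an Argon2-like memory-hard hasher.
pub struct MemoryHardHasher {
    memory: Vec<u64>, // the large work area memory-hard functions need
    state: u64,
}

impl MemoryHardHasher {
    /// Allocation happens here; this is the only step that can fail.
    pub fn with_memory_kib(kib: usize) -> Result<Self, BuildError> {
        let words = kib.checked_mul(128).ok_or(BuildError)?; // 128 u64s = 1 KiB
        let mut memory = Vec::new();
        memory.try_reserve_exact(words).map_err(|_| BuildError)?;
        memory.resize(words, 0);
        Ok(MemoryHardHasher { memory, state: 0 })
    }

    /// Consuming input is infallible once construction succeeded.
    pub fn write(&mut self, bytes: &[u8]) {
        if self.memory.is_empty() {
            return;
        }
        for &b in bytes {
            self.state = self.state.wrapping_mul(31).wrapping_add(b as u64);
            let idx = (self.state as usize) % self.memory.len();
            self.memory[idx] ^= self.state; // stand-in for real memory-hard mixing
        }
    }
}
```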
I suspect there are cleaner solutions to endianness than the byteorder crate's approach with slices and `unwrap` everywhere. I hacked up the endian-types crate here to do conversions to/from fixed-length array types `[u8; n]`. I'll redo it more cleanly, though.
I understand why reader traits do not have endianness as a type parameter, well usually they're outputting to something like JSON that does not care. I suspect hashing might benefit from making endianness a type parameter of the hash input trait.
`Result<T,!>` is indeed equivalent to `T`. That's one of the reasons `!` is so useful!
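On stable Rust this can be seen with `std::convert::Infallible` standing in for `!`: a `Result` whose error type is uninhabited can be unwrapped with no runtime failure path (a sketch; `into_ok` is an illustrative name, not a std function):

```rust
use std::convert::Infallible;

/// Unwrap a `Result` whose error type is uninhabited.
/// The `Err` arm can never execute, so no panic or error handling is needed.
fn into_ok<T>(r: Result<T, Infallible>) -> T {
    match r {
        Ok(t) => t,
        Err(e) => match e {}, // uninhabited: this arm is statically unreachable
    }
}
```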
I'll look at your links in more detail tomorrow, but see https://github.com/jethrogb/rust-core_io/issues/3 for my attempt at getting better, more maintainable `Read`/`Write`.
At least there is an explanation for my worries around fixed-size arrays in https://github.com/rust-lang/rfcs/issues/1772
I suppose the byteorder crate's tendency to specify endianness with each write or read is actually incorrect. Instead, endianness should be embedded in the type of the hasher and its builder, maybe via some wrapper hasher and/or builder that sets the endianness.
That's easily done too. With `Hash` <=> `Write<Error=!>`, it would be hard to prevent the use of endianness overrides, but certainly a generic wrapper type that sets the default would be easy.
I'd think hashers should take native endianness by default, given that they're frequently used for internal data structures like `HashMap`. If you are hashing for a protocol, then you should use a wrapper that sets the endianness by calling `.to_be`/`.to_le` everywhere.
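A sketch of such a wrapper (the name `BigEndianHasher` and the choice to override only a few methods are mine): it fixes big-endian byte order in the type and delegates raw bytes to the wrapped hasher, so protocol hashing becomes platform-independent:

```rust
use std::hash::Hasher;

/// Hypothetical wrapper that pins integer writes to big-endian encoding.
pub struct BigEndianHasher<H: Hasher>(pub H);

impl<H: Hasher> Hasher for BigEndianHasher<H> {
    fn finish(&self) -> u64 {
        self.0.finish()
    }
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes)
    }
    // Route integer writes through a big-endian byte encoding; the remaining
    // write_* methods would be overridden the same way.
    fn write_u16(&mut self, i: u16) { self.0.write(&i.to_be_bytes()) }
    fn write_u32(&mut self, i: u32) { self.0.write(&i.to_be_bytes()) }
    fn write_u64(&mut self, i: u64) { self.0.write(&i.to_be_bytes()) }
    fn write_usize(&mut self, i: usize) { self.0.write(&(i as u64).to_be_bytes()) }
}
```

Note `usize` gets widened to `u64` first, matching the earlier point that platform-sized integers need a fixed-width mapping before they can mean anything in a protocol.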
Is there anything wrong with just adding an associated type?
```rust
pub trait Hasher {
    type Finish = u64;  // Default usable by HashMap
    fn finish(&self) -> Self::Finish;
    fn finish_mut(&mut self) -> Self::Finish {
        self.finish()
    }
    ...
}
```
It might break some code around `Hasher` implementations, but not necessarily if the associated type can be defaulted to `u64`. I'd think hash functions that need multiple return types can simply create a different hasher type for each return type. In most cases, you must know the return type when you start hashing, so that's no biggie.
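A sketch of how that could look (the trait is renamed `Hasher2` here to avoid clashing with std, and `WideHasher` is a toy mixer, not a secure hash; associated type defaults like `type Finish = u64;` still require nightly, so the default is omitted):

```rust
/// Proposed shape: the output type is an associated type, not fixed to u64.
pub trait Hasher2 {
    type Finish;
    fn write(&mut self, bytes: &[u8]);
    fn finish(&self) -> Self::Finish;
}

/// Toy hasher returning 256 bits instead of a u64 (NOT cryptographically secure).
pub struct WideHasher {
    state: [u64; 4],
}

impl WideHasher {
    pub fn new() -> Self {
        WideHasher { state: [0; 4] }
    }
}

impl Hasher2 for WideHasher {
    type Finish = [u8; 32];
    fn write(&mut self, bytes: &[u8]) {
        for (i, &b) in bytes.iter().enumerate() {
            let lane = i % 4;
            self.state[lane] = self.state[lane].rotate_left(7) ^ (b as u64);
        }
    }
    fn finish(&self) -> [u8; 32] {
        let mut out = [0u8; 32];
        for (lane, chunk) in self.state.iter().zip(out.chunks_exact_mut(8)) {
            chunk.copy_from_slice(&lane.to_le_bytes());
        }
        out
    }
}
```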
There is another issue: some hash functions cannot, or should not, copy their internal state, so you likely want this `finish_mut(&mut self) -> Self::Finish` that mutably borrows the hasher and wrecks it as part of finishing. We need this because modern password hashing algorithms like Argon2 hog RAM. We could even add a `type FinishMut = Finish;` to make `Finish = !` possible, but afaik anything like Argon2 that uses enough RAM to want that could be configured to use little enough RAM that copying remains possible. And `finish` could always just panic if you really should never call it.
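A sketch of a `finish_mut` that wrecks the state (illustrative names; the mixing is a toy, not Argon2): finishing takes the work area out of the hasher rather than copying it, so the large allocation is dropped and later calls see it empty:

```rust
/// Toy hasher whose finishing step consumes its internal work area.
pub struct OneShotHasher {
    memory: Vec<u8>, // large state we'd rather not copy
}

impl OneShotHasher {
    pub fn new(size: usize) -> Self {
        OneShotHasher { memory: vec![0; size.max(1)] }
    }

    pub fn write(&mut self, bytes: &[u8]) {
        let len = self.memory.len();
        for (i, &b) in bytes.iter().enumerate() {
            self.memory[i % len] ^= b; // stand-in for real mixing
        }
    }

    /// Wrecks the state: the work area is moved out and dropped here,
    /// so the big allocation is freed as part of finishing.
    pub fn finish_mut(&mut self) -> u64 {
        let memory = std::mem::take(&mut self.memory);
        memory
            .iter()
            .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
    }
}
```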
In principle, one could maybe avoid the associated type as follows, but doing so seems worse.
```rust
pub trait HasherFinish<H: ?Sized> {
    fn finish(hasher: &H) -> Self;
    fn finish_mut(hasher: &mut H) -> Self;
}

pub trait Hasher {
    fn finish<T>(&self) -> T where T: HasherFinish<Self> {
        HasherFinish::finish(self)
    }
    fn finish_mut<T>(&mut self) -> T where T: HasherFinish<Self> {
        HasherFinish::finish_mut(self)
    }
    ...
}
```
Initially I was even thinking about `finish(&self) -> impl HasherFinish<Self>`, but that's forbidden.
Just posted a little wrapper that addresses the endianness issue. Interestingly, it would not work with the `input_hashable` approach suggested here, so maybe that's another argument for an associated type.
It appears an associated `Output` type was removed in https://github.com/rust-lang/rfcs/pull/823, but only coincidentally as part of simplifying the `Hash` trait. I dunno if making `Output` default to `u64` resolves the reservations expressed there.
If the associated type is actually painful, even with the default, then an approach like
```rust
pub trait HasherIn {
    fn write(&mut self, bytes: &[u8]);
    fn write_u8(&mut self, i: u8) { ... }
    fn write_u16(&mut self, i: u16) { ... }
    fn write_u32(&mut self, i: u32) { ... }
    fn write_u64(&mut self, i: u64) { ... }
    fn write_usize(&mut self, i: usize) { ... }
    fn write_i8(&mut self, i: i8) { ... }
    fn write_i16(&mut self, i: i16) { ... }
    fn write_i32(&mut self, i: i32) { ... }
    fn write_i64(&mut self, i: i64) { ... }
    fn write_isize(&mut self, i: isize) { ... }
}

pub trait Hasher: HasherIn {
    fn finish(&self) -> u64 { ... }
}
```
likely restricts the breakage to the `Hasher` ecosystem, meaning `Hasher` implementations and code like `<T as Hasher>::write(..)`.
If nothing like that works, then crypto code can always import some hashish crate that does roughly:
```rust
pub trait Digester {
    fn write(&mut self, bytes: &[u8]);
    fn write_u8(&mut self, i: u8) { ... }
    fn write_u16(&mut self, i: u16) { ... }
    fn write_u32(&mut self, i: u32) { ... }
    fn write_u64(&mut self, i: u64) { ... }
    fn write_usize(&mut self, i: usize) { ... }
    fn write_i8(&mut self, i: i8) { ... }
    fn write_i16(&mut self, i: i16) { ... }
    fn write_i32(&mut self, i: i32) { ... }
    fn write_i64(&mut self, i: i64) { ... }
    fn write_isize(&mut self, i: isize) { ... }
}

impl<T> Hasher for T where T: Digester {
    fn finish(&self) -> u64 { panic!("Hasher::finish()") }
    fn write(&mut self, bytes: &[u8]) { <T as Digester>::write(self, bytes) }
    fn write_u8(&mut self, i: u8) { <T as Digester>::write_u8(self, i) }
    fn write_u16(&mut self, i: u16) { <T as Digester>::write_u16(self, i) }
    fn write_u32(&mut self, i: u32) { <T as Digester>::write_u32(self, i) }
    fn write_u64(&mut self, i: u64) { <T as Digester>::write_u64(self, i) }
    fn write_usize(&mut self, i: usize) { <T as Digester>::write_usize(self, i) }
    fn write_i8(&mut self, i: i8) { <T as Digester>::write_i8(self, i) }
    fn write_i16(&mut self, i: i16) { <T as Digester>::write_i16(self, i) }
    fn write_i32(&mut self, i: i32) { <T as Digester>::write_i32(self, i) }
    fn write_i64(&mut self, i: i64) { <T as Digester>::write_i64(self, i) }
    fn write_isize(&mut self, i: isize) { <T as Digester>::write_isize(self, i) }
}
```
+1. Restricting `Hasher` to `u64` output makes it unusable for anything cryptography-related.
Just as an aside, we're unsure exactly what a secure hasher actually looks like, much less what a generic one looks like.
Almost all cryptographic hashing interfaces, like `digest`, really suck, due to being modeled on the requirements of NIST's obsolete hash function competition. In particular, these interfaces encourage wasting enormous amounts of time micro-optimizing the domain separation.
At the lower level, we should abandon all existing hash function and stream cipher based interfaces in favor of mapping permutations into STROBE-like interfaces, but actually doing a zero cost interface for STROBE looks painful, certainly without const generics.
Above this, merlin currently provides the nicest interface for discrete-log-based signatures and zero-knowledge proofs. In essence, you write traits that map your types' data into hashing-like calls that support good domain separation. Isolating the domain separation into tiers like this prevents micro-optimizations that waste protocol designers' time.
Also STROBE-like constructions could likely "double" their rate by feeding the domain separators into hash state separately, so we should eventually outperform the legacy interfaces.
Above this, we want domain separation inferred so the interface should look more like serde, etc.
Well, regardless of the algorithm, we need to be able to get more bits of information out of it.
It's been a while since I thought of this. Are enough language features stable for this?
With sufficiently expressive associated constants (including https://github.com/rust-lang/rust/issues/34344), it would be nice to provide scaffolding for secure hashing without allocation by extending `Hasher` like: [I think specialization might also be required to write that default method; I forget what syntax is needed.]