multiformats / rust-multihash

multihash implementation in Rust
https://github.com/multiformats/multihash

No elegant way to stream hash given a hash code #141

Open CBenoit opened 2 years ago

CBenoit commented 2 years ago

Hi,

In previous versions of multihash, we could compute the digest in a streaming manner using MultihashDigest::input, and it was possible to get a boxed MultihashDigest given a multihash. I currently see no way of doing the same, which is a problem in some use cases.

For example, I need to validate a digest computed from a file. Since the file can be big, I want to use the new StatefulHasher trait. However, I found no way to get a trait object.

Here’s my code:

pub fn validate_file_checksum(expected_digest: &str, file_path: &Path) -> std::io::Result<bool> {
    let (_, hash_data) = multibase::decode(expected_digest).map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    let expected_digest = Multihash::from_bytes(&hash_data).map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    let hash_code = multihash::Code::try_from(expected_digest.code()).map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;

    // FIXME: multihash new API is breaking this code for streaming hashing (checked for version 0.14)
    //
    //const BUF_SIZE: usize = 1024 * 128;
    //let file = File::open(file_path)?;
    //let mut reader = BufReader::with_capacity(BUF_SIZE, file);
    //
    //let mut hasher = todo!("get an appropriate trait object hasher given the hash code");
    //
    //loop {
    //    let length = {
    //        let buffer = reader.fill_buf()?;
    //        hasher.update(buffer);
    //        buffer.len()
    //    };
    //    if length == 0 {
    //        break;
    //    }
    //    reader.consume(length);
    //}
    //
    //let digest_found = hasher.finalize();
    //
    // So instead, we read the whole file in memory:

    // Read raw bytes: read_to_string would fail on files that are not valid UTF-8.
    let file_content = std::fs::read(file_path)?;
    let digest_found = hash_code.digest(&file_content);

    Ok(expected_digest == digest_found)
}

If I overlooked something, please let me know!

Thank you

mriise commented 2 years ago

It is a bit confusing, as both Hasher and StatefulHasher implement Default, but if you are explicit about what you want, Rust will give it to you.

let mut hasher = Identity256::default(); // then call StatefulHasher::update and StatefulHasher::finalize on it

hopefully this works for you :)

CBenoit commented 2 years ago

Hi :)

Thank you for the answer, but this is not what I’m looking for. I need to get a hasher from a hash code that I can’t know ahead of time (see my snippet above). The issue is precisely that we can’t use StatefulHasher except with a specific algorithm known at compile time, as in your example, which defeats the purpose of multihash to some extent :/ The new API is very nice when using digest is acceptable, though!

vmx commented 2 years ago

I had a look. I currently see no way of doing it with the current code. The way things currently work, you cannot return a StatefulHasher based on the Code, as the StatefulHashers depend on specific Digests (please correct me if I'm wrong).

I've one idea though. Lots of the code is generated. So perhaps we could generate a companion struct to the Code enum, which implements the StatefulHasher functionality for all the Codes. That struct would then be returned by a Code::hasher() call. I'm not sure if that would work, but it might be worth a try.
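To make the idea a bit more concrete, here is a minimal sketch of what such a generated companion type could look like. Everything in it is an assumption for illustration: the `Code` and `CodeHasher` enums, the `hasher()` method, and the `digest_reader` helper are hypothetical, and the two toy algorithms (FNV-1a and a byte sum) only stand in for real hashers so that the snippet needs nothing beyond the standard library.

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in for the generated `Code` enum. The two toy
/// algorithms are NOT real multihash codes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Code {
    Fnv1a64,
    Sum32,
}

/// Hypothetical companion type: one variant of running state per code.
pub enum CodeHasher {
    Fnv1a64(u64),
    Sum32(u32),
}

impl Code {
    /// Hypothetical `Code::hasher()`: picks the hasher at runtime.
    pub fn hasher(self) -> CodeHasher {
        match self {
            // FNV-1a 64-bit offset basis.
            Code::Fnv1a64 => CodeHasher::Fnv1a64(0xcbf29ce484222325),
            Code::Sum32 => CodeHasher::Sum32(0),
        }
    }
}

impl CodeHasher {
    /// Feeds another chunk of input into the running state.
    pub fn update(&mut self, input: &[u8]) {
        match self {
            CodeHasher::Fnv1a64(state) => {
                for &b in input {
                    *state ^= u64::from(b);
                    *state = state.wrapping_mul(0x100000001b3); // FNV prime
                }
            }
            CodeHasher::Sum32(state) => {
                for &b in input {
                    *state = state.wrapping_add(u32::from(b));
                }
            }
        }
    }

    /// Returns the digest as bytes, so callers never see a
    /// per-algorithm digest type.
    pub fn finalize(&self) -> Vec<u8> {
        match self {
            CodeHasher::Fnv1a64(state) => state.to_be_bytes().to_vec(),
            CodeHasher::Sum32(state) => state.to_be_bytes().to_vec(),
        }
    }
}

/// Streams a reader through the hasher selected by `code`, using the
/// same fill_buf/consume loop as the snippet in the first comment.
pub fn digest_reader<R: BufRead>(code: Code, mut reader: R) -> io::Result<Vec<u8>> {
    let mut hasher = code.hasher();
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            hasher.update(buffer);
            buffer.len()
        };
        if length == 0 {
            break;
        }
        reader.consume(length);
    }
    Ok(hasher.finalize())
}
```

With something along these lines, the validation function from the first comment could wrap the file in a BufReader and call `digest_reader(hash_code, reader)` instead of loading the whole file. Whether the code generator can actually emit such an enum for the real hashers is the open question.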

CBenoit commented 2 years ago

> I had a look. I currently see no way of doing it with the current code. The way things currently work, you cannot return a StatefulHasher based on the Code, as the StatefulHashers depend on specific Digests (please correct me if I'm wrong).

Exactly!

> I've one idea though. Lots of the code is generated. So perhaps we could generate a companion struct to the Code enum, which implements the StatefulHasher functionality for all the Codes. That struct would then be returned by a Code::hasher() call. I'm not sure if that would work, but it might be worth a try.

This would be really helpful! However, it might not be straightforward, because StatefulHasher has associated types, and the implementing structs use different types for them (because the digest sizes differ).
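One possible way around the associated types, sketched under assumptions: an object-safe wrapper trait that erases the digest type by returning plain bytes. The `Stateful` trait below only mimics the shape of a hasher trait with an associated digest type; it is not the real multihash definition. `DynHasher`, `hasher_for`, the toy hashers `Xor8` and `Sum32`, and the codes 0x01 and 0x02 are all made up for illustration.

```rust
/// Hypothetical stand-in for a stateful hasher trait with an associated
/// digest type. The `Default` bound alone already prevents `dyn Stateful`.
pub trait Stateful: Default {
    type Digest: AsRef<[u8]>;
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Self::Digest;
}

/// Object-safe companion trait: no associated types, digest as bytes.
pub trait DynHasher {
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Vec<u8>;
}

/// Every `Stateful` hasher gets the type-erased interface for free.
impl<H: Stateful> DynHasher for H {
    fn update(&mut self, input: &[u8]) {
        Stateful::update(self, input);
    }
    fn finalize(&self) -> Vec<u8> {
        Stateful::finalize(self).as_ref().to_vec()
    }
}

/// Toy hasher with a 1-byte digest (XOR of all input bytes).
#[derive(Default)]
pub struct Xor8(u8);

impl Stateful for Xor8 {
    type Digest = [u8; 1];
    fn update(&mut self, input: &[u8]) {
        for &b in input {
            self.0 ^= b;
        }
    }
    fn finalize(&self) -> [u8; 1] {
        [self.0]
    }
}

/// Toy hasher with a 4-byte digest (wrapping sum of all input bytes).
#[derive(Default)]
pub struct Sum32(u32);

impl Stateful for Sum32 {
    type Digest = [u8; 4];
    fn update(&mut self, input: &[u8]) {
        for &b in input {
            self.0 = self.0.wrapping_add(u32::from(b));
        }
    }
    fn finalize(&self) -> [u8; 4] {
        self.0.to_be_bytes()
    }
}

/// Picks a boxed hasher from a code known only at runtime.
/// The codes here are made up, not real multicodec values.
pub fn hasher_for(code: u64) -> Option<Box<dyn DynHasher>> {
    match code {
        0x01 => Some(Box::new(Xor8::default())),
        0x02 => Some(Box::new(Sum32::default())),
        _ => None,
    }
}
```

The trade-off in this sketch is a heap allocation for the digest and for the boxed hasher, in exchange for hiding the differing digest types behind a single interface.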

vmx commented 2 years ago

> (because different digest size).

When you derive a Multihash via #[derive(Multihash)], all digests should have the same size. So at least that part should work (others may not ;)