Are you sure you're benchmarking release mode? As in `cargo run --release`? The default `cargo run` is in debug mode, which will tank performance for Rust crates like this one. The `ring` crate has a lot of non-Rust code in it, which could explain why it's not affected to the same extent.

As an aside, any particular reason you're calling `fill_buf` and `consume` explicitly, instead of just using `BufReader::read`? Or for that matter, instead of `std::io::copy`?
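For context, `std::io::copy` drives the whole read loop for you, as long as the destination implements `Write`. Here's a minimal sketch of that approach (the `HashWriter` adapter is hypothetical, not part of blake2s_simd or ring):

```rust
use std::fs::File;
use std::io::{self, Write};

// Hypothetical adapter so std::io::copy can feed bytes into a hasher.
struct HashWriter(blake2s_simd::State);

impl Write for HashWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.update(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn hash_with_copy(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?;
    let mut writer = HashWriter(blake2s_simd::State::new());
    // io::copy uses its own internal buffer; no manual fill_buf/consume needed.
    io::copy(&mut file, &mut writer)?;
    Ok(writer.0.finalize().to_hex().to_string())
}
```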
Hi @oconnor663, `cargo run --release` makes a huge difference (double the speed) 💯:

```
Blake
6deef7c846545544274e423b7d5bdfbf653cc4ad478a1489df056ee8c84dac47
Elapsed: 158.61ms

Sha256
dca3b9746da896f05072bdec6b788513029b26ab453b82e2e9d4365e56e2c913
Elapsed: 255.38ms
```
I am using `fill_buf` and `consume` (https://doc.rust-lang.org/std/io/trait.BufRead.html#tymethod.fill_buf) with the intention of reading the file in chunks rather than loading it all into memory; from my understanding, that is one of the best ways to avoid exhausting resources. I also tested this with tokio:
```rust
use futures::stream::TryStreamExt;
use ring::digest::{Context, SHA256};
use std::error::Error;
use std::fmt::Write;
use std::time::Instant;
use tokio::fs::File;
use tokio_util::codec::{BytesCodec, FramedRead};

#[tokio::main]
async fn main() {
    let now = Instant::now();
    let checksum = blake("/tmp/wine.json").await.unwrap();
    println!("blake: {}", checksum);
    let elapsed = now.elapsed();
    println!("Elapsed: {:.2?}", elapsed);

    let now = Instant::now();
    let checksum = sha256_digest("/tmp/wine.json").await.unwrap();
    println!("sha256: {}", checksum);
    let elapsed = now.elapsed();
    println!("Elapsed: {:.2?}", elapsed);
}

async fn blake(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = File::open(file_path).await?;
    let mut stream = FramedRead::new(file, BytesCodec::new());
    let mut hasher = blake2s_simd::State::new();
    while let Some(bytes) = stream.try_next().await? {
        hasher.update(&bytes);
    }
    Ok(hasher.finalize().to_hex().to_string())
}

async fn sha256_digest(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = File::open(file_path).await?;
    let mut stream = FramedRead::new(file, BytesCodec::new());
    let mut context = Context::new(&SHA256);
    while let Some(bytes) = stream.try_next().await? {
        context.update(&bytes);
    }
    let digest = context.finish();
    Ok(write_hex_bytes(digest.as_ref()))
}

pub fn write_hex_bytes(bytes: &[u8]) -> String {
    let mut s = String::new();
    for byte in bytes {
        write!(&mut s, "{:02x}", byte).expect("Unable to write");
    }
    s
}
```
```toml
[dependencies]
tokio = { version = "0.2", features = ["full"] }
tokio-util = { version = "0.3", features = ["codec"] }
blake2s_simd = "0.5.10"
ring = "0.16.15"
```
Any advice on what I could optimize to speed up reading from the file, or on which blake2* lib/method is best for getting faster results? My goal, for now, is to get a hash of a file as fast as possible, so that I can use it as a reference in subsequent tasks.

Thanks in advance
Using a `BufReader` to avoid reading the entire file into memory is a good idea, yes. But if you look at the docs for `fill_buf`, you'll see it mention that "this function is a lower-level call." In practice, only the implementer of the `BufRead` trait needs to concern themselves with `fill_buf` and `consume`. The caller can just call `read`. Trait implementations can be notoriously hard to track down in the docs, but if you look at the implementation of `Read` for `BufReader`, you'll see that it calls `fill_buf` and `consume` for you automatically. It also includes a nice optimization to skip the buffer when the read destination is very large.
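In other words, the caller-side version of your loop can be as simple as this (a minimal sketch, with blake2s_simd standing in for any streaming hasher):

```rust
use std::fs::File;
use std::io::{BufReader, Read};

fn hash_file(path: &str) -> std::io::Result<String> {
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);
    let mut hasher = blake2s_simd::State::new();
    let mut buf = [0u8; 65536];
    loop {
        // read() calls fill_buf/consume internally, and bypasses BufReader's
        // internal buffer when the destination is at least as large as it.
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize().to_hex().to_string())
}
```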
> my goal, for now, is to get as fast as possible a hash from a file so that I could use it as a reference in subsequent tasks
If you want the fastest hash possible, you should use BLAKE3 :)
But maybe you could help me understand what you mean by a reference. One of the tricky points about using optimized hash functions (especially BLAKE3) as a performance yardstick is that they do a lot of interesting things with SIMD that lead to variable performance. Throughput will vary substantially across different machines depending on which SIMD instruction set extensions they support (SSE4.1, AVX2, AVX-512). Relatedly, throughput can also vary a lot depending on the length of the input. At the risk of overwhelming you with information, take a look at Figure 3 on page 9 of the BLAKE3 spec. There you can see that BLAKE2s and BLAKE2b are reasonably flat for anything longer than 1 KiB, but the curve for BLAKE3 doesn't really settle down until you're to the right of 64 KiB.
On the SHA-256 side of things, you'll also see massive performance variations now that the SHA extensions are finally hitting the consumer market, mainly in recent AMD chips and also in the very latest Intel stuff.
So anyway, this is all to say that if you want a hash function to be a stable performance yardstick for you across different machines, you might need to be very careful about what it is you're measuring. Without knowing your exact use case, it's hard for me to say more.
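If you want to see the length dependence on your own machine, a quick and unscientific benchmark along these lines would show it (a sketch, assuming the blake3 crate):

```rust
use std::time::Instant;

fn main() {
    // Hash inputs of increasing size, keeping total work roughly constant,
    // and print rough throughput for each input length.
    for &len in &[1usize << 10, 1 << 16, 1 << 20, 1 << 24] {
        let input = vec![0u8; len];
        let iters = (1 << 26) / len;
        let start = Instant::now();
        for _ in 0..iters {
            blake3::hash(&input);
        }
        let secs = start.elapsed().as_secs_f64();
        let mbps = (len * iters) as f64 / secs / 1e6;
        println!("{:>10} B inputs: {:.0} MB/s", len, mbps);
    }
}
```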
hi @oconnor663 many thanks. By a "reference" I just mean the hash (string) itself, nothing else. My use case is uploading multiple files (backups), and I would like to know the hash of each file, where file sizes can vary up to a maximum of 5 TB.
Cool, in that case benchmark it with BLAKE3 and see what happens :)
(Note that BLAKE3 is less than a year old, though, extremely recent by hash function standards. Production applications usually want to be more conservative than that with their crypto choices.)
hi @oconnor663 I just tested following your advice, and it's 3x faster 🥇:
```
blake2
6deef7c846545544274e423b7d5bdfbf653cc4ad478a1489df056ee8c84dac47
Elapsed: 137.20ms

sha256
dca3b9746da896f05072bdec6b788513029b26ab453b82e2e9d4365e56e2c913
Elapsed: 231.32ms

blake3
9f15a44727fcce9f1a36dbdd222d8db80ad41030ef677d7ecf3cc8f3d30b9a1c
Elapsed: 44.39ms
```
I tested with:
```rust
use std::error::Error;
use std::fs;
use std::io::{BufReader, Read};

pub fn blake3(file_path: &str) -> Result<String, Box<dyn Error>> {
    let file = fs::File::open(file_path)?;
    let mut reader = BufReader::new(file);
    let mut hasher = blake3::Hasher::new();
    let mut buf = [0u8; 8192]; // chunk size (8K, 65536, etc.)
    loop {
        // Propagate read errors instead of silently breaking out of the loop.
        let size = reader.read(&mut buf)?;
        if size == 0 {
            break;
        }
        hasher.update(&buf[..size]);
    }
    Ok(hasher.finalize().to_hex().to_string())
}
```
Many thanks for the feedback and time on this, great stuff!
If you want to go nuts, and you have a long enough file (anything > 1 MiB is good), you can also try the multithreaded implementation of BLAKE3. The `b3sum` utility uses multithreading by default, so if you notice that it's a lot faster than your own benchmarks, that's probably why. Multithreading requires a very large buffer size to be effective, at which point it makes more sense to memory map the entire file than to use a read buffer. (The time you would spend waiting on the reader thread to fill the buffer would be a bottleneck.)
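A sketch of that memory-mapped, multithreaded approach, assuming a recent blake3 (1.x with the "rayon" feature, which exposes `update_rayon`) plus the memmap2 crate; note that neither matches the 0.3.x versions used in this thread:

```rust
use std::fs::File;

// Assumed dependencies:
// blake3 = { version = "1", features = ["rayon"] }
// memmap2 = "0.9"

fn blake3_mmap(path: &str) -> std::io::Result<String> {
    let file = File::open(path)?;
    // Safety: the result is only meaningful if the file isn't modified
    // while it's mapped.
    let mmap = unsafe { memmap2::Mmap::map(&file)? };
    let mut hasher = blake3::Hasher::new();
    // update_rayon splits the input across rayon worker threads.
    hasher.update_rayon(&mmap);
    Ok(hasher.finalize().to_hex().to_string())
}
```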
hi @oconnor663 I tested `b3sum` v0.3.6 (`cargo install b3sum`) on macOS Catalina 10.15.6, but while hashing an ISO of ~4 GB it took 7 minutes to get the checksum. I am using it like this:

```
$ b3sum ~/Downloads/FreeBSD-12.1-RELEASE-amd64-dvd1.iso
f675a656a7f0cb0d709723021fb5046e7800675bfa2fb57d3c2ba4f1f301b73c
```

It seems like it reads the whole file into memory; it used a little more than 3 GB of RAM.
Any chance the file is on a spinning disk? We have a known performance issue with large files in that case: https://github.com/BLAKE3-team/BLAKE3/issues/31. If something like `b3sum --num-threads=1` or `cat $file | b3sum` performs better, the issue is probably disk thrashing.
Hi, this is the code I am using for testing: https://github.com/s3m/sandbox/blob/master/rust/blake2/src/main.rs. I don't know exactly what I could be doing wrong, but for some reason blake is running very slow.

The file I am using is https://github.com/s3m/sandbox/blob/master/dataset/wine.json (<80 MB).

Any idea what I could be doing wrong?