oconnor663 / blake2_simd

high-performance implementations of BLAKE2b/s/bp/sp in pure Rust with dynamic SIMD
MIT License

Compiling with optimisations and -Zsanitizer=address consumes excessive memory #20

Closed kirk-baird closed 4 years ago

kirk-baird commented 4 years ago

What is the issue?

When compiling with opt-level greater than 0 and -Zsanitizer=address, rustc's memory usage grows excessively. Compiling on my machine with 16 GB of RAM exhausts all of the memory and causes the OS to kill the process.

Example

I noticed this when attempting to use cargo fuzz in a repository which has this crate as a dependency.

To demonstrate the issue, check out this branch. Enter the directory blake2b and run (note this may consume all of your RAM):

$ cargo fuzz run fuzz_target_1 --release

Additional comments

I'm not sure if this is a bug in the compiler or what is causing this issue. It compiles fine with opt-level=0 or without the -Zsanitizer=address flag.

oconnor663 commented 4 years ago

Hmm, I tried it just now on my laptop with 8 GB of RAM, and I didn't see any spike in memory usage. It's still running, but I believe the build succeeded, and I now see output like this:

INFO: Seed: 2269398421
INFO: Loaded 1 modules   (4352 inline 8-bit counters): 4352 [0x5621e7d566d2, 0x5621e7d577d2), 
INFO: Loaded 1 PC tables (4352 PCs): 4352 [0x5621e7d577d8,0x5621e7d687d8), 
INFO:        0 files found in /tmp/blake2_simd/blake2b/fuzz/corpus/fuzz_target_1
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED cov: 68 ft: 69 corp: 1/1b exec/s: 0 rss: 28Mb
#3      NEW    cov: 68 ft: 70 corp: 2/2b lim: 4 exec/s: 0 rss: 28Mb L: 1/1 MS: 1 ChangeByte-
#262144 pulse  cov: 68 ft: 70 corp: 2/2b lim: 2611 exec/s: 131072 rss: 151Mb
#524288 pulse  cov: 68 ft: 70 corp: 2/2b lim: 4096 exec/s: 104857 rss: 422Mb
#1048576        pulse  cov: 68 ft: 70 corp: 2/2b lim: 4096 exec/s: 104857 rss: 502Mb
#2097152        pulse  cov: 68 ft: 70 corp: 2/2b lim: 4096 exec/s: 99864 rss: 502Mb
#4194304        pulse  cov: 68 ft: 70 corp: 2/2b lim: 4096 exec/s: 97541 rss: 502Mb
#8388608        pulse  cov: 68 ft: 70 corp: 2/2b lim: 4096 exec/s: 92182 rss: 502Mb

I did have to switch to the nightly compiler, so the exact command I ran was:

cargo +nightly fuzz run fuzz_target_1 --release

My rustc version is rustc 1.44.0-nightly (38114ff16 2020-03-21).

oconnor663 commented 4 years ago

That said, I wouldn't be surprised if some interesting compiler options could blow up this crate. The compression function implementations (like this one targeting AVX2) involve a ton of inlining, and end up producing some very large loops in the resulting binary.

kirk-baird commented 4 years ago

That is the correct output once it has compiled.

After updating to rustc 1.44 I managed to get it to compile successfully, peaking at a little over 10 GB of RAM. So I'm going to close this issue for now :)

oconnor663 commented 4 years ago

By the way, it's awesome that you're fuzzing these crates. Please let me know what you find either way.

kirk-baird commented 4 years ago

Happy to help! The fuzzing so far was on blake2b with message sizes restricted to 32 bytes, and it ran well with no panics or crashes.

oconnor663 commented 4 years ago

The most likely failures are going to be at weird buffering corner cases: multiple calls to update() of different lengths, and the *many interfaces in different permutations.
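For the update() case, a rough sketch of what such a check might look like: let the fuzz input choose the chunk lengths, feed the chunks through separate update() calls, and compare against hashing the same message in one call. Everything here is hypothetical illustration. ToyState is a small stand-in hasher with an 8-byte block buffer so the snippet builds with the standard library alone, and split_chunks and check_incremental are made-up helper names; in a real fuzz target the stand-in would be swapped for the crate's own incremental state.

```rust
/// Split `data` into chunks whose lengths come from the input itself.
/// Each length byte (0..=255) is consumed and selects how many of the
/// following bytes form the next chunk, so a fuzzer controls the pattern.
/// A length byte of 0 yields an empty chunk, which is a corner case too.
fn split_chunks(data: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let mut rest = data;
    while let Some((&len_byte, tail)) = rest.split_first() {
        let take = (len_byte as usize).min(tail.len());
        let (chunk, remaining) = tail.split_at(take);
        chunks.push(chunk);
        rest = remaining;
    }
    chunks
}

/// Hypothetical stand-in for an incremental hasher. It is NOT BLAKE2;
/// it only mimics the "buffer partial blocks, compress full ones" shape.
struct ToyState {
    buf: [u8; 8],
    buf_len: usize,
    acc: u64,
}

impl ToyState {
    fn new() -> Self {
        ToyState { buf: [0; 8], buf_len: 0, acc: 0xcbf29ce484222325 }
    }

    /// Mix one full 8-byte block into the accumulator.
    fn compress(&mut self, block: &[u8; 8]) {
        self.acc = (self.acc ^ u64::from_le_bytes(*block)).wrapping_mul(0x100000001b3);
    }

    /// Buffer input and compress whenever a full block is available.
    fn update(&mut self, mut input: &[u8]) -> &mut Self {
        while !input.is_empty() {
            let take = (8 - self.buf_len).min(input.len());
            self.buf[self.buf_len..self.buf_len + take].copy_from_slice(&input[..take]);
            self.buf_len += take;
            input = &input[take..];
            if self.buf_len == 8 {
                let block = self.buf;
                self.compress(&block);
                self.buf_len = 0;
            }
        }
        self
    }

    /// Zero-pad the leftover bytes, mix them in, and fold in their count.
    fn finalize(&self) -> u64 {
        let mut last = [0u8; 8];
        last[..self.buf_len].copy_from_slice(&self.buf[..self.buf_len]);
        (self.acc ^ u64::from_le_bytes(last)).wrapping_mul(0x100000001b3) ^ self.buf_len as u64
    }
}

/// Returns true when chunked updates agree with a single one-shot update.
fn check_incremental(data: &[u8]) -> bool {
    let chunks = split_chunks(data);
    // The message is the concatenation of the chunks (length bytes excluded).
    let message: Vec<u8> = chunks.concat();
    let one_shot = ToyState::new().update(&message).finalize();

    let mut state = ToyState::new();
    for chunk in &chunks {
        state.update(chunk);
    }
    one_shot == state.finalize()
}
```

In a fuzz target, the body would call something like check_incremental on the raw input and panic on a mismatch, so the fuzzer reports any chunking pattern that changes the result.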

kirk-baird commented 4 years ago

Thanks for the heads up, I'll have a look at those.