image-rs / jpeg-decoder

JPEG decoder written in Rust
Apache License 2.0

Introduce limits: Excessive memory allocation for small inputs #133

Closed · dbrgn closed this issue 2 years ago

dbrgn commented 4 years ago

Found this via fuzzing (see #131).

First of all, I'm not totally sure how to interpret these results.

For the following test input (1.4 KiB):

id:000004,sig:06,src:000000,op:flip32,pos:874.tar.gz

...the AFL fuzzer reported a crash. When running the reproducer on it, however, it runs fine:

$ cargo run --bin reproduce_decode out/crashes/id\:000004\,sig\:06\,src\:000000\,op\:flip32\,pos\:874
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/reproduce_decode 'out/crashes/id:000004,sig:06,src:000000,op:flip32,pos:874'`
Decoder returned an error: Format("failed to decode huffman code")
Note: Not a panic, this is fine.

$ cargo run --release --bin reproduce_decode out/crashes/id\:000004\,sig\:06\,src\:000000\,op\:flip32\,pos\:874
   Compiling fuzz-target-jpeg-decoder v0.1.0 (/home/danilo/Projects/jpeg-decoder/fuzz-afl)
    Finished release [optimized] target(s) in 2.25s
     Running `target/release/reproduce_decode 'out/crashes/id:000004,sig:06,src:000000,op:flip32,pos:874'`
Decoder returned an error: Format("failed to decode huffman code")
Note: Not a panic, this is fine.

By default, AFL uses a 50 MiB memory limit, so this was likely exceeded.

When running the target through valgrind with --pages-as-heap=no, the program quickly fills my system memory (roughly 20 GiB free) and is then killed by the OOM killer.

$ valgrind --tool=massif --pages-as-heap=no --massif-out-file=massif.out target/release/reproduce_decode out/crashes/id\:000004\,sig\:06\,src\:000000\,op\:flip32\,pos\:874
==1046640== Massif, a heap profiler
==1046640== Copyright (C) 2003-2017, and GNU GPL'd, by Nicholas Nethercote
==1046640== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1046640== Command: target/release/reproduce_decode out/crashes/id:000004,sig:06,src:000000,op:flip32,pos:874
==1046640==
Killed

When running the target through valgrind with --pages-as-heap=yes, I get the following massif report: massif.out.tar.gz. Opening it with massif-visualizer shows a peak virtual memory usage of 24 GiB.

I'm not entirely sure whether this is a false positive from valgrind, since just running the target without it returns an error without filling system memory; however, something is probably still not right with the parser. I hope this issue is still helpful, maybe someone else can figure out what's going on :slightly_smiling_face:

This SO question may be helpful: https://stackoverflow.com/questions/52532893/understanding-linux-virtual-memory-valgrinds-massif-output-shows-major-differe

kaj commented 4 years ago

The image claims to be 65531x65531 pixels, so I assume a large part of the problem is allocating a buffer for that (16 GiB, assuming 4 bytes per pixel) before checking that the file is actually decodable.
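For a quick sanity check of that number, here is the back-of-the-envelope arithmetic (assuming, as above, 4 bytes per pixel for the output buffer):

fn main() {
    // Dimensions declared in the crafted file's frame header.
    let (width, height): (u64, u64) = (65_531, 65_531);
    // 4 bytes per pixel is the assumption used above for the output buffer.
    let bytes = width * height * 4;
    println!("{} bytes = {:.1} GiB", bytes, bytes as f64 / (1u64 << 30) as f64);
    // => 17177247844 bytes = 16.0 GiB
}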

The generic ability to limit image sizes, or to work with user-provided buffers, that has been suggested would fix this. I don't see any other obvious way.

dbrgn commented 4 years ago

If we know the input file size, can we know in advance whether the declared number of pixels is realistic? If so, maybe two separate APIs could be provided: a streaming API (which accepts any R: Read, plus maybe a size limit) and a function that only accepts data with a known size?

Maybe other sanity checks could be done before preallocating the buffer?

Unfortunately I'm not familiar with the JPEG internals, so I can't really judge the feasibility of those ideas...
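To make that concrete, here is a rough sketch of the kind of pre-allocation check I have in mind. The names (decode_with_limit, parse_dimensions) are made up for this example and are not part of the jpeg-decoder API:

use std::io::Read;

/// Hypothetical entry point: decode from a reader, but refuse to allocate an
/// output buffer larger than `max_output_bytes`.
fn decode_with_limit<R: Read>(reader: R, max_output_bytes: u64) -> Result<Vec<u8>, String> {
    // First parse only the frame header to learn the declared dimensions and
    // component count; nothing large has been allocated yet.
    let (width, height, components) = parse_dimensions(reader)?;

    // Sanity check before preallocating: a 1.4 KiB input claiming to be
    // 65531x65531 pixels is rejected here instead of triggering a
    // multi-GiB allocation.
    let needed = width as u64 * height as u64 * components as u64;
    if needed > max_output_bytes {
        return Err(format!(
            "declared output size ({} bytes) exceeds limit ({} bytes)",
            needed, max_output_bytes
        ));
    }

    // ...only now allocate `needed` bytes and run the actual decode...
    unimplemented!("decode into a buffer of `needed` bytes")
}

/// Stand-in for reading width/height/components from the SOF marker.
fn parse_dimensions<R: Read>(_reader: R) -> Result<(u16, u16, u8), String> {
    unimplemented!()
}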

kaj commented 4 years ago

I think a good solution to https://github.com/image-rs/image/issues/938 is what is needed to avoid this problem.
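For context, a minimal sketch of what a caller-configurable limits mechanism in that spirit could look like. The names and defaults here are purely illustrative, not the API discussed in that issue:

/// Hypothetical resource limits a caller could attach to a decoder.
#[derive(Clone, Copy)]
struct Limits {
    max_image_width: u32,
    max_image_height: u32,
    max_alloc_bytes: u64,
}

impl Default for Limits {
    fn default() -> Self {
        // Conservative defaults; callers that really need huge images
        // would have to raise these explicitly.
        Limits {
            max_image_width: 16_384,
            max_image_height: 16_384,
            max_alloc_bytes: 512 * 1024 * 1024,
        }
    }
}

impl Limits {
    /// Returns an error if the dimensions declared in the header are outside
    /// the configured limits, before any pixel buffer is allocated.
    fn check(&self, width: u32, height: u32, bytes_per_pixel: u64) -> Result<(), String> {
        if width > self.max_image_width || height > self.max_image_height {
            return Err(format!("image {}x{} exceeds dimension limits", width, height));
        }
        let needed = width as u64 * height as u64 * bytes_per_pixel;
        if needed > self.max_alloc_bytes {
            return Err(format!("decoding would allocate {} bytes", needed));
        }
        Ok(())
    }
}

fn main() {
    let limits = Limits::default();
    // The dimensions declared by the fuzzed file would be rejected up front.
    assert!(limits.check(65_531, 65_531, 4).is_err());
}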