various testcases that crash tgunzip

jepler commented 6 years ago

The testcases are all found by afl-fuzz. I am happy to tell you more about my fuzzing process if you'd like. Due to my testing methodology, there may be duplicates among the testcases, both in the sense of exact duplicate inputs and in the sense of testcases that are not identical inputs but cause "the same" crash.

Typical crash:

$ examples/tgunzip/tgunzip tgunzip-findings/crashes/id\:000003\,sig\:11\,src\:000001\,op\:flip4\,pos\:10 /dev/null
tgunzip - example from the tiny inflate library (www.ibsensoftware.com)

Segmentation fault

Some test cases are problems that promptly result in segfaults in a normal build of the software, others are only overt failures when compiling and linking uzlib and tgunzip with clang -fsanitize=undefined -fno-sanitize-recover=undefined and, when running the fuzzer, ensuring that UBSAN_OPTIONS=abort_on_error=1 is set in the environment for them to be properly recorded as crashes. Howerver, as "undefined behavior" includes things like overwriting unrelated values in memory (think return addresses on the stack) I believe both categories deserve attention.

Typical crash that requires the undefined behavior sanitizer to be used at compile time:

examples/tgunzip/tgunzip tgunzip-findings7/crashes/id\:000001\,sig\:06\,src\:000019\,op\:havoc\,rep\:4  /dev/null
tgunzip - example from the tiny inflate library (www.ibsensoftware.com)

tinflate.c:358:39: runtime error: index 30 out of bounds for type 'const unsigned char [30]'

My interest is that uzlib should be safe for IoT devices to use on untrusted input; getting rid of all the outright crashes that can be found by automated methods is an important step towards that. I have fixed many of these crashes locally and you can expect pull requests to be forthcoming.

pfalcon commented 6 years ago

@jepler: Finally trying to look at this in more detail. The uzlib-crashes.zip contains following top-level dirs:

drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 explore-findings
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings10
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings2
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings3
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings4
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings5
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings6
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings7
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings8
drwxrwxr-x 3 pfalcon pfalcon 4096 Jul  6 21:50 tgunzip-findings9

Can you please explain what each of them signifies, and how they differ among themselves?

jepler commented 6 years ago

Each run of afl-fuzz produces a different "findings" directory; I just numbered them. "explore-findings" resulted from running afl-fuzz in a different mode which was constrained to try variations of a specific original finding (i.e., exploration of a specific fault). "findings10" is much newer than "findings1", but other than that it doesn't mean anything.

jepler commented 6 years ago

I did a short fuzzing run on your branch afl-crash-fixes-pfalcon pfalcon/uzlib@78fe45bcec887d2a3ca0495b485efc14d96a0153 and unfortunately it readily finds fresh crashes. Here is a zip of those findings.

findings-afl-crash-fixes-pfalcon.zip

pfalcon commented 6 years ago

I did a short fuzzing run on your branch afl-crash-fixes-pfalcon 78fe45b and unfortunately it readily finds fresh crashes.

Thanks, that's how I'd expect it to be - that you'd run another session with AFL trying to break it ;-).

But let's get on the same line first. In 42bd6988ebdf6e3f36af94ec41832e02a5f7f83b, I pushed contents of the original uzlib-crashes.zip and a runner script for it. With that state of the branch, everything passes for me, including when built with make CC="clang -fsanitize=undefined -fno-sanitize-recover=undefined". Can you please confirm it's the same for you?

I then look into the new batch (from a quick look at the first failure, it's "something new").

jepler commented 6 years ago

That's right, the current tip of your branch passes all the original testcases for me as well. Let's close this up when you merge afl-crash-fixes-pfalcon and move on with a fresh issue for new testcases found via fuzzing.

pfalcon commented 6 years ago

It took 2 fixes to get findings-afl-crash-fixes-pfalcon.zip right: update boundary check for length symbols (there're 29 of those, as 30th is end-of-block marker), and to fix tgunzip to not overrun its buffer on it side. Branch rebased.

I'll push addition of findings-afl-crash-fixes-pfalcon.zip to the repo later.

Feel free to give a longer run of AFL against the branch ;-).

jepler commented 6 years ago

Is it possible that you still need to add a file in your fuzzer fixes branch? I get this:

diff: afl-corpus.ref: No such file or directory
Test FAILED

pfalcon commented 6 years ago

Yup, sorry, squashed/pushed.

pfalcon commented 6 years ago

I'll push addition of findings-afl-crash-fixes-pfalcon.zip to the repo later.

Now in the branch. Looking forward for feedback (at your convenience of course).

jepler commented 6 years ago

I tested overnight (9 hours) at 403cc9b "tgunzip: Don't allow to decompress more data than specified in gzip trailer." and the fuzzer has found 0 crashes and 0 hangs so to me everything looks good from the standpoint of stability.

There's a problem with run-afl-corpus.sh due to locales(!) and I'll be sending a small PR related to this in a few moments.

pfalcon commented 6 years ago

There's a problem with run-afl-corpus.sh due to locales(!) and I'll be sending a small PR related to this in a few moments.

Besides squashing this (https://github.com/pfalcon/uzlib/pull/12), I also renamed findings-afl-crash-fixes-pfalcon/ dir to tgunzip-findings11/, and squashed it into the main corpus-introducing commit.

pfalcon commented 6 years ago

@jepler : Can you please eyeball once again https://github.com/pfalcon/uzlib/commit/93def8d64626c97c1faa3d3bf6adb55e3e94ce20 , https://github.com/pfalcon/uzlib/commit/93def8d64626c97c1faa3d3bf6adb55e3e94ce20 for any possible simplifications/improvements? For example, if you know that some testcases are duplicate, can we drop them? Should we drop READMEs? For afl-corpus.ref, maybe we should dump exit code first, followed by filename (makes it easier for humans)?

Thanks.

jepler commented 6 years ago

One methodology would be to select 1 testcase that is fixed at each ref. (or if not 1, then some other small number like 4).

The READMEs and the directory structure are not worth preserving. You could just name them after their sha1sums or something if you don't want to take on the task of giving them human readable names but want them to all be distinct names.

pfalcon commented 6 years ago

One methodology would be to select 1 testcase that is fixed at each ref. (or if not 1, then some other small number like 4).

At each git rev do you mean? Yes, good idea, but who would do that? ;-) I'm definitely not up to that, and wouldn't suggest anyone else to spend more than 15-20 mins on that ;-). (However, it may be worth doing that for the latest dir added. Or maybe not, because commits are merged in random order).

Btw, I wanted to ask if AFL allows to specify something like fixed trailer. Because https://github.com/pfalcon/uzlib/commit/35e9c235da600cb562f71b73973eadd5ff45a04f which is now in the mainline, processing of many streams will be stopped early. For example, the latest dir, (tgunzip-findings11) has two 20+K files, but they have trailing zeros, where gzip size is. So, as soon as one byte will be decompressed out of them, the processing will stop (with error, as they said they contain 0 uncompressed bytes, but actually had at least one).

pfalcon commented 6 years ago

Or tgunzip-findings/ - it contains both id:* and min:id:*. Which one of those subsets we should drop?

pfalcon commented 6 years ago

You could just name them after their sha1sums or something if you don't want to take on the task of giving them human readable names but want them to all be distinct names.

Well, I just care to not add dead weight to the repository. It's not worth bothering to rename files, especially that the original AFL names provide some info on how a file was produced.

pfalcon commented 6 years ago

So, my initial plan would be:

Drop READMEs.
Drop empty dirs (missed they're there) and renumber dirs, and use just number for the names.

pfalcon commented 6 years ago

Drop READMEs. Drop empty dirs (missed they're there) and renumber dirs, and use just number for the names.

Now done in my branch. I also figured that naming it specific to AFL doesn't make sense as further it can be extended from other sources, so renamed generically decomp-bad-inputs/.

pfalcon commented 6 years ago

There were some duplicates too. Dropped them.

pfalcon commented 6 years ago

Dropped large files from the latest dir, as they don't seem to trigger any bitstream issues, only old out buffer alloc issues in tgunzip.

pfalcon commented 6 years ago

So, I'm probably satisfied with the shape of the afl corpus, and would be ready to merge it to master soon. @jepler, can you please have a look at the latest https://github.com/pfalcon/uzlib/commits/afl-crash-fixes-pfalcon for any last-minute issues?

pfalcon commented 6 years ago

Ok, I don't want to lose momentum and risk another couple of months delay with this. So, merged into master. If any issues are found, they can be fixed later (and I'm not too shy to rebase master to fix goofs too ;-) ).

jepler commented 5 years ago

v2.9 has survived 24 hours of fuzzing with 0 crashes and 0 hangs. Around 364 million test cases were tried.

pfalcon / uzlib

various testcases that crash tgunzip #9