Fix various crashes found by afl-fuzz

jepler commented 6 years ago

This set of changes fixes all the crashes afl-fuzz has found for me.

However, without a testsuite of inputs, it's tough to say whether it introduces any regressions. I only tested on a single file, the GPL-3 license compressed by gnu gzip -9.

80d2ed30aa2c2f34cebc4cb05eac1095  GPL-3.gz
d32239bcb673463ab874e80d47fae504  GPL-3

and checked that after each commit, the decompression by tgunzip was the same as the original.

I particularly expect you will not be excited about the part that uses setjmp/longjmp. Please let me know how you'd like me to rework that portion. I tried making that code always return a fixed value (e.g., 0x00 or 0xff) but this led to decompressions going on "forever", where afl-fuzz defines "forever" as "1000ms".

jepler commented 6 years ago

the branch afl-crash-fixes has endured 12 hours of testing by afl-fuzz with zero hangs and zero crashes.

pfalcon commented 6 years ago

@jepler : Great work, thanks! One thing I can say is that the commit messages don't follow the convention of the codebase, you can compare: https://github.com/pfalcon/uzlib/commits/master . I.e. they should include some location information - module, possibly function name. Because otherwise a message like "don't traverse past end of t->table" doesn't tell much. If you could help with massaging the messaging into that form, that would be appreciated.

Another question - did you try to compare the code size before and after these fixes?

jepler commented 6 years ago

OK, I'll rework the commit messages in a different style.

Compiled with gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) using -Os optimization, before my changes:

   2668       0       0    2668     a6c tinflate.o (ex ../lib/libtinf.a)
    384       0       0     384     180 tinfgzip.o (ex ../lib/libtinf.a)
    159       0       0     159      9f tinfzlib.o (ex ../lib/libtinf.a)
    414       0       0     414     19e adler32.o (ex ../lib/libtinf.a)
    164       0       0     164      a4 crc32.o (ex ../lib/libtinf.a)
   2032       0       0    2032     7f0 defl_static.o (ex ../lib/libtinf.a)
    393       0       0     393     189 genlz77.o (ex ../lib/libtinf.a)
   6214       0       0    6214    1846 (TOTALS)

After my changes:

   text    data     bss     dec     hex filename
   3246       0       0    3246     cae tinflate.o (ex ../lib/libtinf.a)
    436       0       0     436     1b4 tinfgzip.o (ex ../lib/libtinf.a)
    159       0       0     159      9f tinfzlib.o (ex ../lib/libtinf.a)
    414       0       0     414     19e adler32.o (ex ../lib/libtinf.a)
    164       0       0     164      a4 crc32.o (ex ../lib/libtinf.a)
   2032       0       0    2032     7f0 defl_static.o (ex ../lib/libtinf.a)
    393       0       0     393     189 genlz77.o (ex ../lib/libtinf.a)
   6844       0       0    6844    1abc (TOTALS)

so it's +10%-ish on text segment size.

I figured you might have some specific ideas about how to rework the changes so that they had less inpact on code size, but I understand if you don't have the time to hold my hand on that.

jepler commented 6 years ago

.. I think I've eliminated the need for setjmp/longjmp. But I didn't push it to this PR yet and more testing is needed on my end:

Before:

   2668       0       0    2668     a6c tinflate.o (ex ../lib/libtinf.a)
    384       0       0     384     180 tinfgzip.o (ex ../lib/libtinf.a)

After:

   2906       0       0    2906     b5a tinflate.o
    384       0       0     384     180 tinfgzip.o

Basically, instead of longjmp, I set a new d->eof flag in uzlib_get_byte. Then I consult d->eof at a number of locations where I can return TINF_DATA_ERROR. It's enough to survive 3 minutes of fuzzing without a hang (run over 1000ms) so far.

pfalcon commented 6 years ago

@jepler, thanks for the detailed replies, and sorry for the delays with mine. I'm busy with real-world stuff, so apologize in advance if there're further delays.

Adding setjmp/longjmp is definitely something to avoid - the idea is that this lib is suitable for any constrained/deeply embedded environments, so should avoid depending on C library (say, beyond string.h). And setjmp is pretty heavy a machinery. So, thanks for preparing a version without it, I'd say it can be force-pushed here.

pfalcon commented 6 years ago

so it's +10%-ish on text segment size.

And 600+ bytes in absolute terms. I'd say that's pretty noticeable impact. Sure, I'd like this lib to be useful for real projects, but its aim also always has been to be truly minimal...

I figured you might have some specific ideas about how to rework the changes so that they had less inpact on code size, but I understand if you don't have the time to hold my hand on that.

So far, I'd have only the generic idea - I'd like to scrutinize your changes one by one, likely applying some before the others, and asking you to rebase, while I get better understanding of the underlying issues they solve and overall impact. Hope that works for you.

jepler commented 6 years ago

As you suggest, I've pushed the no-setjmp/longjmp version to this PR and closed up #11.

jepler commented 6 years ago

Here are some more notes about the size change at each ref, which might help you decide where to concentrate your review, or which parts you'd like to suggest as their own separate PRs..

total text code size of libtinf.a + tgunzip.o compiled with arm-none-eabi-gcc version 5.4.1 20160919 (15:5.4.1+svn241155-1) using compiler flags -Os -mcpu=cortex-m3 -mthumb with notes about size of TINF_DATA

2717650 tinflate: When using sliding dictionary, check that offset lies within it. 4552

0fd97e2 tinf_get_le_uint32: Don't invoke undefined behavior left shifting 4552 (+0)

84964fc tinf.h: remove unused 'destRemaining' field 4552 (+0) note: shrinks TINF_DATA by -4

b59b1bb tinf.h: add new fields for explicit end of source and destination buffers 4552 (+0) note: enlarges struct TINF_DATA by +2*sizeof(void*)

991ea7e tgunzip.c: explicitly give end of input and output buffers 4564 (+12)

51bb070 tinflate, tinfgzip: Don't read past the end of the input buffer 4620 (+56) note: I believe that 'eof' falls in what was a padding byte

0eb5744 TINF_PUT: don't write past the end of the destination buffer 4648 (+28)

f0159a0 tinf_decode_trees: Prepare tinf_decode_trees to return error value 4656 (+8)

87e7ddb tinf_decode_symbol: don't traverse past end of t->table 4672 (+16)

15ca6cf tinf_decode_symbol: don't traverse past end of t->trans 4676 (+4)

6345902 uzlib_uncompress: propagate error from tinf_decode_trees 4688 (+12)

f573bf7 tgunzip: don't crash on an impossibly small file 4722 (+34)

6de173f tinf_inflate_block_data: Don't read past the end of length_bits[] or dist_bits[] arrays 4722 (+0)

ce0721c tinf_inflate_block_data: Don't write outside of destination buffer 4726 (+4)

b170fcb tinf_decode_trees: Fix out of bound access to 'lengths' 4750 (+24)

5d7d08a tgunzip: properly initialize destStart 4754 (+4)

15608c0 tinf_inflate_block_data: Don't read outside of 'dest' buffer 4770 (+16)

Total code size increase +218 (from 4552 to 4770 bytes, +4.8%) Total TINF_DATA size change: +4 bytes, I think

With gcc version 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204] (GNU Tools for Arm Embedded Processors 7-2017-q4-major) the numbers are broadly similar, 4574 -> 4780 bytes.

pfalcon commented 6 years ago

@jepler , Thanks for all the data so far. I went thru the commit trying to find a couple with which a merge could start, and had trouble with that. Some I'd like to postpone until later (like dest buffer size checks), and with other, I faced a number of questions.

I decided to concentrate on "tinf_decode_trees: Fix out of bound access to 'lengths'", as representing a genuine issue. But the patch as is doesn't fit well with project goals of the minimal size. The changes seem to represent just mechanical addition of the most trivial checking code in each of the "last mile" problem spots. Of course, that's a way having the most impact on both code size and performance.

Whereas per project goals, a right solution would be to find code changes which would minimize both code size and performance impact. Of course, that requires some analysis and work. I don't call you to do that - many thanks already for pinpointing issues and providing a "first iteration" of solution. That's just again a "detailed report" why addressing all the issues may take some time.

My try at optimizing the commit above is at 65a7f41de32858069c703ffdad0cdfbe6bbd6f45. Of course, making such changes calls for regression testsuite, which is not there :-(. I at least quickly put up a smoke test in 74b2a51c21e3cd3d57954853b875134e514f48fd.

Please rebase, I'll look at other changes as time permits.

pfalcon commented 6 years ago

Quick size check:

before (unmodified code in master):

   text    data     bss     dec     hex filename
   2745       0       0    2745     ab9 tinflate.o

after:

   text    data     bss     dec     hex filename
   2719       0       0    2719     a9f tinflate.o

jepler commented 6 years ago

   text    data     bss     dec     hex filename

65a7f41 tinflate: tinf_decode_trees: Optimize for size and add overflow checks.

   1532       0       0    1532     5fc arm/src/tinflate.o

b71bb53 tinf_get_le_uint32: Don't invoke undefined behavior left shifting

   1532       0       0    1532     5fc arm/src/tinflate.o

c84694e tinf.h: remove unused 'destRemaining' field

   1532       0       0    1532     5fc arm/src/tinflate.o

56612da tinf.h: add new fields for explicit end of source and destination buffers

   1532       0       0    1532     5fc arm/src/tinflate.o

4a11b1b tgunzip.c: explicitly give end of input and output buffers

   1532       0       0    1532     5fc arm/src/tinflate.o

23434c7 tinflate, tinfgzip: Don't read past the end of the input buffer

   1592       0       0    1592     638 arm/src/tinflate.o

cb14a5f TINF_PUT: don't write past the end of the destination buffer

   1624       0       0    1624     658 arm/src/tinflate.o

fe4ceab tinf_decode_symbol: don't traverse past end of t->table

   1640       0       0    1640     668 arm/src/tinflate.o

61b0fcf tinf_decode_symbol: don't traverse past end of t->trans

   1644       0       0    1644     66c arm/src/tinflate.o

a3a6beb tgunzip: don't crash on an impossibly small file

   1644       0       0    1644     66c arm/src/tinflate.o

bb1bfbc tinf_inflate_block_data: Don't read past the end of length_bits[] or dist_bits[] arrays

   1644       0       0    1644     66c arm/src/tinflate.o

5373ad7 tinf_inflate_block_data: Don't write outside of destination buffer

   1648       0       0    1648     670 arm/src/tinflate.o

0aacb55 tgunzip: properly initialize destStart

   1648       0       0    1648     670 arm/src/tinflate.o

9519b1e tinf_inflate_block_data: Don't read outside of 'dest' buffer

   1664       0       0    1664     680 arm/src/tinflate.o

2f7d017 tinf_decode_trees: propagate error from tinf_decode_symbol

   1668       0       0    1668     684 arm/src/tinflate.o

77fcf19 tinf_decode_trees: avoid out of range access to lengths[]

   1672       0       0    1672     688 arm/src/tinflate.o

jepler commented 6 years ago

So now the big jumps are 23434c7 tinflate, tinfgzip: Don't read past the end of the input buffer (+60)

cb14a5f TINF_PUT: don't write past the end of the destination buffer (+32)

and the total delta is +140.

jepler commented 6 years ago

.. slightly rebased again (I had introduced a problem at "tinf_decode_trees: avoid out of range access to lengths"), and also i was relying on luck to zero-initialize d->eof to zero), but I didn't update the refs in the last summary of size changes.

Also, I added 183ac2b TINF_PUT: move out of line for slight code size savings

   text    data     bss     dec     hex filename
   1640       0       0    1640     668 arm/src/tinflate.o

to get back 32 bytes.

pfalcon commented 6 years ago

In the zip of crashing inputs ...

The zip is in #9. Thanks!

pfalcon commented 6 years ago

@jepler: Pushed a couple more fixes from your set. Had to refactor "tinf_decode_symbol: don't traverse past end of ..." fix. While I pushed it, as I agree that it represent a genuine issue, there're still questions to better under:

Are both t->table and t->trans can be overrun, or just one of them?
tinf_decode_symbol() can be called from 2 contexts: a) decoding a dictionary; b) decoding compressed data with dictionary. Can overrides happen only in one context (dictionary) or both of them?

I guess, I'll need to delve into your your crash corpus to find answers to such questions, and to integrate parts of it as regression tests.

Please rebase when you get a chance.

jepler commented 6 years ago

I will rebase this branch and look at what crashes remain. Thanks for your continuing work on these issues!

jepler commented 6 years ago

Are both t->table and t->trans can be overrun

I am glad you suggested I look into this. I reverted just the check vs TINF_ARRAY_SIZE(t->trans) at the tip of my branch. For the testcases I have, this did not reintroduce any failures. (The one on TINF_ARRAY_SIZE(t->table) is necessary)

I also found that some other change in my PR makes the length_bits[] and dest_bits[] overflow checks not needed too, for the testcases I have.

I've updated the branch again based on these new discoveries.

arm-none-eabi text size of tinflate.o at each ref:

text	delta	ref	log
1560		cc278f7	tinf_decode_symbol: Don't traverse past end of TINF_TREE->table and ->trans.
1560		71fd810	tinf.h: remove unused 'destRemaining' field
1560		a227e68	tinf.h: add new fields for explicit end of source and destination buffers
1560		6ec55ec	tgunzip.c: explicitly give end of input and output buffers
1616	+46	3ad2050	tinflate, tinfgzip: Don't read past the end of the input buffer
1648	+32	c6ac990	TINF_PUT: don't write past the end of the destination buffer
1648		89c9e3e	tgunzip: don't crash on an impossibly small file
1652	+4	368a5b8	tinf_inflate_block_data: Don't write outside of destination buffer
1652		b20268b	tgunzip: properly initialize destStart
1660	+8	f8385d3	tinf_inflate_block_data: Don't read outside of 'dest' buffer
1660		3bfe99c	tinf_decode_trees: propagate error from tinf_decode_symbol
1664	+4	5d39c96	tinf_decode_trees: avoid out of range access to lengths[]
1636	-28	5881c73	TINF_PUT: move out of line for slight code size savings
1632	-4	5b978e2	Partially revert "tinf_decode_symbol: Don't traverse past end of TINF_TREE->table and ->trans."

Net: +72 bytes

pfalcon commented 6 years ago

I am glad you suggested I look into this. I reverted just the check vs TINF_ARRAY_SIZE(t->trans) at the tip of my branch. For the testcases I have, this did not reintroduce any failures.

Thanks for confirming this.

But I guess the good plan would be to first integrate your fuzzcases from https://github.com/pfalcon/uzlib/issues/9 into the uzlib testsuite, make it pass, and then look into relaxing some of the checks added already.

pfalcon commented 6 years ago

Merged "tgunzip: don't crash on an impossibly small file" (with changes). As usual, please rebase when you get a chance.

pfalcon commented 6 years ago

Regarding https://github.com/pfalcon/uzlib/pull/10/commits/a227e681d8f9a405bd0839903406dfbe0bd741af "tinf.h: add new fields for explicit end of source and destination buffers": Why this pattern of using "pointers past end of buffer"? As you can see, there's already destSize/destRemaining, i.e. the codebase "prefers" dealing with sizes instead if end-pointers. Unless there're objectively measured arguments (e.g. that end-pointers lead to smaller code or something), I'd prefer to deal with sizes.

pfalcon commented 6 years ago

Otherwise, I ack introduction of "eof" flag. A more straightforward pattern would be to make uzlib_get_byte() be of type int and return -1 on EOF. But then it would require to test and propagate that error code in quite a bunch of places (including those which are hard, like tinf_get_le_uint32()). Indeed, using a sticky "eof" flag and checking it lazily (not all the time it can be set, but when fake data (0) being returned when it's set would lead to bad side effects) - can lead to code savings.

So, I'm going to use that flag, but will rework how it's handled in uzlib_get_byte() (which includes using length instead of end pointer, and handling eof case from d->readSource() callback too).

pfalcon commented 6 years ago

Well, I see, it allows avoid decrementing the length value. On the other hand, it disallows to have buffers which wrap around the end of address space ;-).

Somehow I feel that end-pointers are harder for a client to deal with, but not optimizing every bit would be contradiction to the principles I pledged for myself... So I guess, will need to use end-pointers indeed...

pfalcon commented 6 years ago

So, I'm going to use that flag, but will rework how it's handled in uzlib_get_byte()

Done in 50590fb5b64a13646c535ac8d89c2b3eae966155. Somehow I'm not sure I want to merge the rest of https://github.com/pfalcon/uzlib/pull/10/commits/3ad20504799ec9523d1d5176affe4bbe6f427256#diff-48b8b2c795d45e62702e5707c644f410L520. You seem to check d->eof very sparingly, but what if we can check even less than that? ;-)

pfalcon commented 6 years ago

Indeed, the only unconditionally necessary check for eof is in f57a277c0fab854b6927413b73d4ed88507b22f5, others yet need to be considered.

pfalcon commented 6 years ago

The most controversial addition here is edata. I always was suspicious of it, and now confirmed it's not really needed. uzlib decoding loop is already fully controlled at the top level by destSize, it won't decode beyond it. The reason why tgunzip specifically crashed is because it has bug dealing with destSize ;-). Specifically, it initializes it just once, so after first call, it will be zero, and then wrap around (because there's do-while loop).

pfalcon commented 6 years ago

The most controversial addition here is edata. I always was suspicious of it, and now confirmed it's not really needed. uzlib decoding loop is already fully controlled at the top level by destSize, it won't decode beyond it. The reason why tgunzip specifically crashed is because it has bug dealing with destSize ;-). Specifically, it initializes it just once, so after first call, it will be zero, and then wrap around (because there's do-while loop).

pfalcon commented 6 years ago

Ok, so next evolution of this matter is here: https://github.com/pfalcon/uzlib/tree/afl-crash-fixes-pfalcon . This branch contains my version of patches required to not crash on your AFL corpus. Branch is rebased! (That's why it's a separate branch, so I fancy rebasing master at the end of this stage too ;-) ).

And as of now, it no longer crashes on the corpus, but building with make CC="clang-3.6 -fsanitize=undefined -fno-sanitize-recover=undefined" gives follow array OOB errors:

tinflate.c:395:39: runtime error: index 30 out of bounds for type 'const unsigned char [30]'
tinflate.c:399:34: runtime error: index 31 out of bounds for type 'const unsigned char [30]'

All of them appear to be related to static Huffman trees (not dynamic ones). And the right way resolve them would be to fix them on the level of static Huffman trees, so using them can't lead to OOB access (because Huffman trees are used as lookup tables). Similarly, for dynamic case, we should validate the tree, rather than adding runtime overhead for each symbol decoded with possibly broken tree afterwards.

pfalcon commented 6 years ago

The reason why tgunzip specifically crashed is because it has bug dealing with destSize ;-).

I don't rush to fix it, because well, that's how you produced afl corpus. And if it no longer crashes even with the bug present, only cooler.

pfalcon commented 6 years ago

All of them appear to be related to static Huffman trees (not dynamic ones).

Well, not only static:

= afl-corpus/tgunzip-findings5/crashes/id:000011,sig:06,src:000155,op:havoc,rep:8
tgunzip - example from the tiny inflate library (www.ibsensoftware.com)

Started new block: type=2 final=0
hdist=32
huff sym: 11f
tinflate.c:396:39: runtime error: index 30 out of bounds for type 'const unsigned char [30]'

jepler commented 6 years ago

Thanks for another burst of activity.

All of the testcases in #9 pass with your branch afl-crash-fixes-pfalcon, so maybe we should just close this one up?

jepler commented 6 years ago

I d went ahead and rebased my branch even if it's not of interest in light of yours.

pfalcon commented 6 years ago

fix them on the level of static Huffman trees, so using them can't lead to OOB access (because Huffman trees are used as lookup tables). Similarly, for dynamic case, we should validate the tree

Well, of course I can't spend the entire weekend remembering Huffman trees. And from an hour of munging, it seems that it can be the case that some codes should be included in the tree, so it can be built properly, but should not appear in the compressed input. E.g. comments https://github.com/madler/zlib/blob/master/trees.c#L89 , https://github.com/madler/zlib/blob/master/trees.c#L295 in the reference impl.

So, I give up and ready to add validation of decoded symbol values.

jepler commented 6 years ago

My rebased branch is wrong, I'll fix it shortly.

pfalcon commented 6 years ago

Pushed in 55bd5b055a4fa9a7285a92eea625ec36f3bcd492

jepler commented 6 years ago

OK, my branch is updated again and some flaws with the push earlier today have been fixed. But it still probably makes more sense to merge your afl-crash-fixes-pfalcon branch and take it from there.

pfalcon commented 6 years ago

So, I give up and ready to add validation of decoded symbol values.

I've opened #15 to track/consider how reliable this on its own, and what's the best way to deal with this.

pfalcon commented 6 years ago

Going to slowly elaborate commits in afl-crash-fixes-pfalcon and merge into master. E.g. https://github.com/pfalcon/uzlib/commit/b4eb23c7c35e3cf830585c366e9b461e7dcb9e52 is an example of such elaborated commit.

pfalcon commented 5 years ago

Re: "Partially revert "tinf_decode_symbol: Don't traverse past end of TINF_TREE". I applied 52e264878e54fa8e285bcf6007fa21b2b9e51295 to address that for now.

pfalcon commented 5 years ago

@jepler : 2.9 with changes based on your work is released, thanks for driving that. It would be nice to finalize this ticket: see if there's anything left to the original issue of buffer overruns/undefined behavior, move parts not directly related to other tickets (e.g. "TINF_PUT: move out of line for slight code size savings" - needs consideration). Thanks in advance.

jepler commented 5 years ago

Thanks, I verified one last time that all my test cases from #9 are solved at the 2.9 tag.

pfalcon commented 5 years ago

Thanks, I verified one last time that all my test cases from #9 are solved at the 2.9 tag.

Yeah, those are now part of the in-tree testsuite ;-). Don't hesitate to run another AFL session I mean, when you get a chance ;-). Or wait a bit - I'm going to merge some optimizations, after which I really hope someone will run fuzzy testing...

pfalcon commented 5 years ago

Added comment about TINF_PUT uninlining to https://github.com/pfalcon/uzlib/issues/15#issuecomment-419799921

pfalcon / uzlib

Fix various crashes found by afl-fuzz #10