allocate very big memory for some invalid input

comicfans commented 5 years ago

During fuzz testing, I found that unarr may try to allocate a very large block of memory (malloc(4294967296)) for some invalid input. The code path is:

#7 0x55f8e1e98949 in malloc (/home/wangxinyu/unarr/fuz/test/fuzzer+0x132949)
#8 0x7fe3a3744dc0 in gLzma_Alloc /home/wangxinyu/unarr/fuz/../zip/uncompress-zip.c:284:78
#9 0x7fe3a375155c in LzmaDec_Allocate /home/wangxinyu/unarr/fuz/../lzmasdk/LzmaDec.c:1150:22
#10 0x7fe3a3743334 in zip_uncompress_data_lzma /home/wangxinyu/unarr/fuz/../zip/uncompress-zip.c:314:15
#11 0x7fe3a373fd35 in zip_uncompress_part /home/wangxinyu/unarr/fuz/../zip/uncompress-zip.c:529:17
#12 0x7fe3a3747105 in zip_uncompress /home/wangxinyu/unarr/fuz/../zip/zip.c:152:14
#13 0x7fe3a36d2779 in ar_entry_uncompress /home/wangxinyu/unarr/fuz/../common/unarr.c:85:12

An example input follows (base64-encoded):

AAgAAAAAAAAACv0A5PxQSwMEHBy8BA4AADpSEABSEChQAAAAAAADBBwAAAAAHAo/SwMAHFIEA7wJ
ABAoAAAAAAAcAChSTApQSwAqAHEAAPj/////r7T/CgAATApQSwUGAAIA/zJDBl07ACEAAAAAAAAA
BLwJHgADAANSEAADAAAAAAAAABJAAAAK/eQA/FBLAwQcHLwEDgAAAFIAAABS

selmf commented 5 years ago

This malloc is likely the LZMA decompression dictionary, whose size is stored in a uint32, so 4294967296 is the worst case but still within the spec. However, as the test case has an invalid LZMA header, it should not allocate the dictionary buffer at all. I'm not sure whether this is a missing sanity check in unarr or a bug in the included LZMA SDK, but the LZMA decompression provided by xz correctly rejects the stream earlier.

comicfans commented 5 years ago

I'm not familiar with the zip spec. Doesn't the zip file format provide some level of integrity check, so that we can reject such invalid input earlier?

selmf commented 5 years ago

Zip is the container; LZMA is one of the possible compression algorithms. The problem is not the zip format but the LZMA header. The xz library correctly checks this header; the LZMA SDK code (which is a different library) does not. I can implement the check myself (it is trivial), but I prefer a solution that fixes the cause of the problem instead of treating the symptoms.
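
For illustration, a check of this kind could look roughly like the sketch below. It is not the code from unarr, xz, or the LZMA SDK, and the thread does not say which conditions xz actually tests. The function name `lzma_props_look_sane` and the cap `MAX_DICT_SIZE` are hypothetical. The sketch assumes the usual 5-byte LZMA properties block: one byte encoding lc/lp/pb as (pb * 5 + lp) * 9 + lc, followed by a 4-byte little-endian dictionary size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on the dictionary size (256 MiB).
 * This value is an assumption for the sketch, not taken from any spec. */
#define MAX_DICT_SIZE ((uint32_t)1 << 28)

/* Returns true if the 5-byte LZMA properties block looks plausible.
 * props[0]    : (pb * 5 + lp) * 9 + lc, with lc < 9, lp < 5, pb < 5
 * props[1..4] : dictionary size, little-endian uint32 */
static bool lzma_props_look_sane(const uint8_t *props, size_t len)
{
    if (props == NULL || len != 5)
        return false;

    /* Largest valid value is (4 * 5 + 4) * 9 + 8 = 224. */
    if (props[0] >= 9 * 5 * 5)
        return false;

    uint32_t dict_size = (uint32_t)props[1]
                       | ((uint32_t)props[2] << 8)
                       | ((uint32_t)props[3] << 16)
                       | ((uint32_t)props[4] << 24);

    /* Refuse dictionary sizes above the (assumed) cap before allocating. */
    return dict_size <= MAX_DICT_SIZE;
}
```

A caller would run such a check on the properties bytes before handing them to the decoder's allocation step and treat a false result as a corrupt entry. Capping the dictionary size is a design choice that trades compatibility with unusually large but valid dictionaries for bounded memory use.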

comicfans commented 5 years ago

So that means the LZMA header isn't protected by the zip CRC, so it can't be verified as part of the zip container and can only be verified on its own as an LZMA header?

selmf commented 5 years ago

I think so, yes. XZ Utils has a good description of the LZMA format. There are also subtle differences between the implementations in XZ Utils and the LZMA SDK, so some very rare files will be valid for the SDK but invalid for XZ Utils, but the chance of encountering these in the wild is very slim.

Also, consider this: you are fuzzing unarr using an instrumented fuzzer. This will create test cases which pass CRC and other checks but still have invalid header data.

selmf / unarr

allocate very big memory for some invalid input #8