Open comicfans opened 5 years ago
This malloc is likely the lzma decompression dictionary, whose size is stored in a uint32, so an allocation of about 4 GiB (4294967296 bytes) is the worst case but still within the spec. However, as the test case has an invalid lzma header, it should not allocate the dictionary buffer at all. I'm not sure if this is a missing sanity check in unarr or a bug in the included lzma sdk, but the lzma decompression provided by xz correctly rejects the lzma stream earlier.
I'm not familiar with the zip spec. Doesn't the zip file format provide some level of integrity check, so we can reject such invalid input earlier?
Zip is the container, lzma is one of the possible compression algorithms. The problem is not the zip format but the lzma header. The xz library correctly checks this header; the lzma sdk code (which is a different library) does not. I can implement the check myself (it is trivial), but I prefer a solution that fixes the cause of the problem instead of treating the symptoms.
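For illustration, a check of this kind could look roughly like the sketch below. It is not the check xz performs and not code from unarr or the lzma sdk; the function name `lzma_props_look_valid` and the `MAX_DICT_SIZE` cap are hypothetical and chosen only for this example. It assumes the usual 5-byte LZMA properties block: one byte encoding lc/lp/pb, followed by the dictionary size as a little-endian uint32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound picked for this example (1 GiB); a real
 * caller would choose its own limit. Not a value taken from unarr. */
#define MAX_DICT_SIZE (UINT32_C(1) << 30)

/* Sanity-check a 5-byte LZMA properties block before allocating the
 * dictionary. Returns true if the block looks plausible. */
bool lzma_props_look_valid(const uint8_t *props, size_t len)
{
    if (props == NULL || len < 5)
        return false;

    /* props[0] encodes (pb * 5 + lp) * 9 + lc with lc <= 8, lp <= 4,
     * pb <= 4, so any value of 225 or more cannot be valid. */
    if (props[0] >= 9 * 5 * 5)
        return false;

    /* Dictionary size: 32-bit little-endian value in props[1..4]. */
    uint32_t dict_size = (uint32_t)props[1]
                       | ((uint32_t)props[2] << 8)
                       | ((uint32_t)props[3] << 16)
                       | ((uint32_t)props[4] << 24);

    /* Refuse requests above the caller-chosen cap instead of passing
     * them on to malloc. */
    if (dict_size > MAX_DICT_SIZE)
        return false;

    return true;
}
```

A caller would run this on the properties bytes before handing them to the decoder, and bail out with an error when it returns false. As said above, this only treats the symptom; it does not replace a proper header check in the decoder itself.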
So that means the lzma header isn't protected by the zip crc, so it can't be verified as part of the zip container and can only be verified as an lzma header on its own?
I think so, yes. XZ utils has a good description of the lzma format. There are also subtle differences between the implementations in the xz utils and the lzma sdk, so some very rare files will be valid for the sdk but invalid for the xz utils, but the chance of encountering these in the wild is very slim.
Also, consider this. You are fuzzing unarr using an instrumented fuzzer. This will create test cases which pass crc and other checks but still have invalid header data.
During fuzz testing, I found that unarr may try to allocate a very large block of memory (malloc(4294967296)) for some invalid input. The code path is as follows:
An example input follows (base64 encoded):