Closed D0nYu closed 3 years ago
That's interesting, especially considering that the corruption seems to take place very early in the loader (image header parsing?). I personally haven't encountered such a crash unless it was a real bug in the codec. What device/Android version/input file are you using here? Does it reproduce with all images or just a specific one?
To root cause the problem, I would start off by trying to figure out where/how the alleged corruption takes place by looking at the malloc logs provided by option -l
in the loader, and/or attaching gdb to observe how exactly the code operates on the faulty heap chunk.
In fact, two categories of bad allocator canary on free()/realloc()
happens.
C1: On a specific model of a physical device, every image(jpeg and heic) leads to a corruption of android log print. But png goes well.
C2: Several jpeg images generated during my fuzz (on qemu) can corrupt both qemu and physical devices(I tested them on Samsung, Huawei, and Google Pixel. All of them crashed).
I attached gdb to trace how the C2 happens. It seems that when the poc in given, SkDngImage::readDng
is triggered to read the image and finally raise an exception of unexpected EOF of file in dng_stream:Get. The call trace:
[main + 0x5c] --> [ProcessImage + 0x144] --> [SkCodec::MakeFromStream + 0x230] --> [SkRawCodec::MakeFromStream + 0x28c] --> [SkDngImage::NewFromStream + 0x90] --> [SkDngImage::readDng + 0xe0] --> [dng_info::Parse + 0x1b0] --> [dng_info::ParseIFD + 0xd4] --> [dng_stream::Get_uint32 + 0x2c] --> [dng_stream::Get + 0x17c]
Finally libc++.so calls cxa_end_catch(), in which cxa_eh_globals is freed, which lead to the crash. Maybe the __cxa_eh_globals is allocated before the malloc hook is setup? And I'm not sure whether the C1 is caused by a similar reason. Or maybe I should send the POC to you?
Thanks for the analysis. I have successfully reproduced the C2 case with some corrupted DNG (raw) files, and I can confirm that the false-positives are caused by throwing a C++ exception in the dng_sdk code. Some global objects in libc++.so
indeed seem to be allocated before the malloc hooks are installed in the harness, and the free/realloc hooks are confused when they see such heap chunks and abort.
I'm not sure if it's feasible to set up the hooks sufficiently early that the libc++.so initialization is also covered by them. If you have any ideas on how to achieve this, let me know. Otherwise, I can see three potential solutions:
malloc()
, and redirect the freeing of unrecognized chunks to the original libc free()
.afl_free
or standard free
) based on whether it's found on the list.Options 1 and 2 are not the most elegant, and add some extra overhead and potential false-negatives in the detection of invalid frees. I'm leaning towards the last option, because SkCodecFuzzer was originally implemented as a way of fuzzing the Samsung Qmage codec, which doesn't use C++ exceptions. On the other hand, the default image decoders used by Skia are all open-source, and it should be much more effective to fuzz libjpeg, libheif and other respective libraries natively with Sanitizers etc., instead of using qemu and the Android libraries. For example, that's how I found the dng_sdk vulnerabilities described in issues #2009 and #2010.
I dug some more into this issue and found out that the root cause was much simpler than I had originally thought - so please disregard most of the previous comment. :) The Android libc provides four allocator hooks for malloc
, realloc
, free
and memalign
. The harness didn't implement a hook for memalign
, so any memory allocated through this function would be untracked and would crash when subsequently passed to realloc
or free
. One instance where memalign
is used is indeed the C++ exception allocation code in libc++.so
, and another is in the dng_sdk library itself (in the dng_malloc_block
class). This should explain at least the C2 crashes that you were seeing.
I submitted commit 1ffd5b20121c0d6ae842de0da04d5e4d3df64f5c to address the bug. I'm closing the issue, but feel free to reopen if you still encounter any similar problems.
Thanks for the commit. The C2 crashes are solved by the memalign
hook. But there do exist some global chunks allocated before the launching of the harness, which lead to the C1 crashes. In my case (C1), the target library print some logs in the .init_array
, which allocates the tag of android_log_print
before the harness sets up the malloc hooks. A non-general way to avoid the crash is to nop these log print functions.
I am trying to run the harness on a physical Android device with LIBC_HOOKS_ENABLE=1, but got an error of AFL libdisallocator:
The harness can successfully execute in qemu or with the -d (unset LIBC_HOOKS_ENABLE) option. Any idea about that?