Libdisallocator bad allocator canary fault when running on a physical Android device.

D0nYu commented 3 years ago

I am trying to run the harness on a physical Android device with LIBC_HOOKS_ENABLE=1, but got an error of AFL libdisallocator:

[!] Running on Android, heap chunks will be automatically 8-byte aligned.
*** [AFL] bad allocator canary on realloc() ***
ASAN:SIGABRT
==15909==ERROR: AddressSanitizer: ABRT on unknown address 0x7d000003e25 (pc 0x70c1eb506c sp 0x7fe8c5c320 bp 0x7fe8c5c320 T0)
......

The harness can successfully execute in qemu or with the -d (unset LIBC_HOOKS_ENABLE) option. Any idea about that?

j00ru commented 3 years ago

That's interesting, especially considering that the corruption seems to take place very early in the loader (image header parsing?). I personally haven't encountered such a crash unless it was a real bug in the codec. What device/Android version/input file are you using here? Does it reproduce with all images or just a specific one?

To root cause the problem, I would start off by trying to figure out where/how the alleged corruption takes place by looking at the malloc logs provided by option -l in the loader, and/or attaching gdb to observe how exactly the code operates on the faulty heap chunk.

D0nYu commented 3 years ago

In fact, two categories of bad allocator canary on free()/realloc() happens.

C1: On a specific model of a physical device, every image(jpeg and heic) leads to a corruption of android log print. But png goes well.
C2: Several jpeg images generated during my fuzz (on qemu) can corrupt both qemu and physical devices(I tested them on Samsung, Huawei, and Google Pixel. All of them crashed).

I attached gdb to trace how the C2 happens. It seems that when the poc in given, SkDngImage::readDng is triggered to read the image and finally raise an exception of unexpected EOF of file in dng_stream:Get. The call trace:

 [main + 0x5c] --> [ProcessImage + 0x144] --> [SkCodec::MakeFromStream + 0x230] --> [SkRawCodec::MakeFromStream + 0x28c] --> [SkDngImage::NewFromStream + 0x90] --> [SkDngImage::readDng + 0xe0] --> [dng_info::Parse + 0x1b0] --> [dng_info::ParseIFD + 0xd4] --> [dng_stream::Get_uint32 + 0x2c] --> [dng_stream::Get + 0x17c]

Finally libc++.so calls cxa_end_catch(), in which cxa_eh_globals is freed, which lead to the crash. Maybe the __cxa_eh_globals is allocated before the malloc hook is setup? And I'm not sure whether the C1 is caused by a similar reason. Or maybe I should send the POC to you?

j00ru commented 3 years ago

Thanks for the analysis. I have successfully reproduced the C2 case with some corrupted DNG (raw) files, and I can confirm that the false-positives are caused by throwing a C++ exception in the dng_sdk code. Some global objects in libc++.so indeed seem to be allocated before the malloc hooks are installed in the harness, and the free/realloc hooks are confused when they see such heap chunks and abort.

I'm not sure if it's feasible to set up the hooks sufficiently early that the libc++.so initialization is also covered by them. If you have any ideas on how to achieve this, let me know. Otherwise, I can see three potential solutions:

Recognize which chunks are allocated through libdislocator by placing some kind of 4/8 byte marker before the chunk returned by malloc(), and redirect the freeing of unrecognized chunks to the original libc free().
Keep track of the list of active libdislocator allocations in some global container, and similarly decide how each address should be freed (afl_free or standard free) based on whether it's found on the list.
Declare this a known limitation with no available fix.

Options 1 and 2 are not the most elegant, and add some extra overhead and potential false-negatives in the detection of invalid frees. I'm leaning towards the last option, because SkCodecFuzzer was originally implemented as a way of fuzzing the Samsung Qmage codec, which doesn't use C++ exceptions. On the other hand, the default image decoders used by Skia are all open-source, and it should be much more effective to fuzz libjpeg, libheif and other respective libraries natively with Sanitizers etc., instead of using qemu and the Android libraries. For example, that's how I found the dng_sdk vulnerabilities described in issues #2009 and #2010.

j00ru commented 3 years ago

I dug some more into this issue and found out that the root cause was much simpler than I had originally thought - so please disregard most of the previous comment. :) The Android libc provides four allocator hooks for malloc, realloc, free and memalign. The harness didn't implement a hook for memalign, so any memory allocated through this function would be untracked and would crash when subsequently passed to realloc or free. One instance where memalign is used is indeed the C++ exception allocation code in libc++.so, and another is in the dng_sdk library itself (in the dng_malloc_block class). This should explain at least the C2 crashes that you were seeing.

I submitted commit 1ffd5b20121c0d6ae842de0da04d5e4d3df64f5c to address the bug. I'm closing the issue, but feel free to reopen if you still encounter any similar problems.

D0nYu commented 3 years ago

Thanks for the commit. The C2 crashes are solved by the memalign hook. But there do exist some global chunks allocated before the launching of the harness, which lead to the C1 crashes. In my case (C1), the target library print some logs in the .init_array, which allocates the tag of android_log_print before the harness sets up the malloc hooks. A non-general way to avoid the crash is to nop these log print functions.

googleprojectzero / SkCodecFuzzer

Libdisallocator bad allocator canary fault when running on a physical Android device. #6