google / syzygy

Syzygy Transformation Toolchain
Apache License 2.0

ClusterFuzz tripping DCHECKs in SyzyASAN RTL #51

Open sigurasg opened 8 years ago

sigurasg commented 8 years ago

Investigating http://crbug.com/627455, I built a debug syzyasan_rtl.dll and dropped it in. It's not easy to debug what's going on, but by running Chrome with --no-sandbox and dropping a MessageBox into

    LONG WINAPI AsanRuntime::UnhandledExceptionFilter(struct _EXCEPTION_POINTERS* exception) {

I can unwind the stack to see what's tripping.

Long story short, I see multiple threads tripping over this DCHECK:

    BlockHeapInterface* BlockHeapManager::GetHeapFromId(HeapId heap_id) {
      DCHECK_NE(reinterpret_cast<HeapId>(nullptr), heap_id);
      HeapQuarantinePair* hq = reinterpret_cast<HeapQuarantinePair*>(heap_id);
      DCHECK_NE(static_cast<BlockHeapInterface*>(nullptr), hq->first);
      return hq->first;
    }
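Because the heap_id is used directly as a pointer, any block whose trailer carries a heap_id of 0 trips the first DCHECK. Below is a stand-alone model of the lookup. It assumes, as the casts above suggest, that a HeapId is just the address of a HeapQuarantinePair whose `first` member is the heap. The types are simplified stand-ins rather than SyzyASAN's real definitions, and MakeHeapId and TryGetHeapFromId are hypothetical helpers; the latter returns nullptr where GetHeapFromId would DCHECK.

```cpp
#include <cstdint>
#include <utility>

// Simplified stand-ins (assumptions, not SyzyASAN's real types).
struct BlockHeapInterface {};
struct QuarantineStandIn {};
using HeapId = std::uintptr_t;
using HeapQuarantinePair = std::pair<BlockHeapInterface*, QuarantineStandIn*>;

// Hypothetical helper: the opaque id is simply the address of the pair.
HeapId MakeHeapId(HeapQuarantinePair* hq) {
  return reinterpret_cast<HeapId>(hq);
}

// Hypothetical non-asserting variant of GetHeapFromId.
BlockHeapInterface* TryGetHeapFromId(HeapId heap_id) {
  if (heap_id == 0)
    return nullptr;  // GetHeapFromId's first DCHECK_NE fires in this case.
  auto* hq = reinterpret_cast<HeapQuarantinePair*>(heap_id);
  return hq->first;  // A nullptr here is what the second DCHECK_NE catches.
}
```

A non-zero but garbage heap_id would still be dereferenced, so a check like this only catches the all-zero case.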

One level up, I have the block_info that the heap_id comes from:

    Local var @ 0x65ada88 Type agent::asan::BlockInfo*
    0x065adf14
       +0x000 block_size               : 0x288bfe70
       +0x004 header                   : 0x7fff0020 agent::asan::BlockHeader
          +0x000 magic                      : 0y1100101010000000 (0xca80)
          +0x000 checksum                   : 0y0010110101110 (0x5ae)
          +0x000 is_nested                  : 0y0
          +0x000 has_header_padding         : 0y0
          +0x000 has_excess_trailer_padding : 0y0
          +0x004 state                      : 0y00
          +0x004 body_size                  : 0y101000100010111111111001001100 (0x288bfe4c)
          +0x008 alloc_stack                : 0x09b1a24c agent::common::StackCapture
             +0x000 __VFN_table : 0x00fe655c
             =00fe34f4 agent::common::StackCapture::kMaxNumFrames : 0x3e
             =00ffdcf4 agent::common::StackCapture::kMaxRefCount : 0xffff
             =0106bc88 agent::common::StackCapture::bottom_frames_to_skip_ : 0
             +0x004 absolute_stack_id_ : 0x206af057
             +0x008 relative_stack_id_ : 0xba916fac
             +0x00c num_frames_        : 0x23 '#'
             +0x00d max_num_frames_    : 0x23 '#'
             +0x00e ref_count_         : 1
             +0x010 frames_            : [62] 0x00c1ae49 Void
          +0x00c free_stack                 : (null)
       +0x008 header_padding           : 0x7fff0030 agent::asan::BlockHeaderPadding
       +0x00c header_padding_size      : 0
       +0x010 body                     : 0x7fff0030 agent::asan::BlockBody
       +0x014 body_size                : 0x288bfe4c
       +0x018 trailer_padding          : 0xa88afe7c agent::asan::BlockTrailerPadding
       +0x01c trailer_padding_size     : 0
       +0x020 trailer                  : 0xa88afe7c agent::asan::BlockTrailer
          +0x000 alloc_tid                  : 0
          +0x004 free_tid                   : 0
          +0x008 alloc_ticks                : 0
          +0x00c free_ticks                 : 0
          +0x010 heap_id                    : 0
       +0x024 block_pages              : 0x7fff1000  ""
       +0x028 block_pages_size         : 0x288be000
       +0x02c left_redzone_pages       : (null)
       +0x030 left_redzone_pages_size  : 0
       +0x034 right_redzone_pages      : (null)
       +0x038 right_redzone_pages_size : 0
       +0x03c is_nested                : 0

It looks like the trailer has either not been initialized or been overwritten: every BlockTrailer field, including heap_id, reads 0.
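The BlockInfo pointers in the dump can also be cross-checked with 32-bit arithmetic. The sketch below plugs in the dumped values, assuming (since trailer_padding_size is 0) that the trailer should start at body + body_size and that the block should end at header + block_size. DumpedBlockInfo, LayoutCheck and CheckLayout are illustrative names only.

```cpp
#include <cstdint>

// Values copied from the BlockInfo dump above (32-bit addresses and sizes).
struct DumpedBlockInfo {
  std::uint32_t header = 0x7fff0020;
  std::uint32_t block_size = 0x288bfe70;
  std::uint32_t body = 0x7fff0030;
  std::uint32_t body_size = 0x288bfe4c;
  std::uint32_t trailer = 0xa88afe7c;
};

struct LayoutCheck {
  std::uint32_t expected_trailer;  // body + body_size (no trailer padding)
  std::uint32_t block_end;         // header + block_size
  std::uint32_t header_bytes;      // body - header
  std::uint32_t trailer_bytes;     // block_end - trailer
};

// Derives where the trailer and block end should be from the dumped fields.
LayoutCheck CheckLayout(const DumpedBlockInfo& b) {
  LayoutCheck c;
  c.expected_trailer = b.body + b.body_size;
  c.block_end = b.header + b.block_size;
  c.header_bytes = b.body - b.header;
  c.trailer_bytes = c.block_end - b.trailer;
  return c;
}
```

With these values the trailer sits exactly at body + body_size, the header occupies 16 bytes, and the 20 bytes left at the end match the five 4-byte BlockTrailer fields. So the BlockInfo bookkeeping looks self-consistent, and only the trailer's contents read as zero. The body_size of 0x288bfe4c (680,263,244 bytes, roughly 649 MiB) also shows that this was a very large allocation.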

-- example stack --

    0:007> kv
      *** Stack trace for last set context - .thread/.cxr resets it
    ChildEBP RetAddr  Args to Child
    065ad1b4 00c9bbd3 065ad8dc 065adf6c 00000001 syzyasan_rtl!base::debug::BreakDebugger+0x16 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\src\syzygy\src\base\debug\debugger_win.cc @ 21]
    065ad73c 00c20a3f 065ada80 cccccccc cccccccc syzyasan_rtl!logging::LogMessage::~LogMessage+0x2c3 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\src\syzygy\src\base\logging.cc @ 742]
    065ad8dc 00c1fa21 00000000 065adc1c 065adf6c syzyasan_rtl!agent::asan::heap_managers::BlockHeapManager::GetHeapFromId+0xaf (FPO: [Non-Fpo]) (CONV: cdecl) [c:\src\syzygy\src\syzygy\agent\asan\heap_managers\block_heap_manager.cc @ 556]
    065ada80 00c1f209 065adf14 065adf5c 00000000 syzyasan_rtl!agent::asan::heap_managers::BlockHeapManager::FreePristineBlock+0x181 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\src\syzygy\src\syzygy\agent\asan\heap_managers\block_heap_manager.cc @ 802]
    065adc1c 00c1ead7 065adf14 065ae03c 065adf6c syzyasan_rtl!agent::asan::heap_managers::BlockHeapManager::FreeCorruptBlock+0x189 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\src\syzygy\src\syzygy\agent\asan\heap_managers\block_heap_manager.cc @ 797]
    065adf5c 00c37a33 007b8938 7fff0030 065ae118 syzyasan_rtl!agent::asan::heap_managers::BlockHeapManager::Free+0x267 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\src\syzygy\src\syzygy\agent\asan\heap_managers\block_heap_manager.cc @ 274]
    065ae03c 00c41458 007b8938 00000000 7fff0030 syzyasan_rtl!agent::asan::WindowsHeapAdapter::HeapFree+0xd3 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\src\syzygy\src\syzygy\agent\asan\windows_heap_adapter.cc @ 105]
    065ae118 11bf9c40 007b8938 00000000 7fff0030 syzyasan_rtl!asan_HeapFree+0x108 (FPO: [Non-Fpo]) (CONV: stdcall) [c:\src\syzygy\src\syzygy\agent\asan\rtl_impl.cc @ 124]
    065ae12c 0fcea1f4 7fff0030 065ae14c 0fcec6b7 chrome_child!_free_base+0x1c (FPO: [Non-Fpo]) (CONV: cdecl) [d:\th\minkernel\crts\ucrt\src\appcrt\heap\free_base.cpp @ 107]
    065ae138 0fcec6b7 7fff0030 00000000 077748cc chrome_child!sk_free_releaseproc+0xb (FPO: [Non-Fpo]) (CONV: cdecl) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skdata.cpp @ 95]
    065ae14c 1127540f 00000001 0fcc4291 0778fac0 chrome_child!SkMallocPixelRef::`scalar deleting destructor'+0x36 (FPO: [Non-Fpo]) (CONV: thiscall)
    065ae154 0fcc4291 0778fac0 077748b0 0fcd0e54 chrome_child!ui::AXNode::Destroy+0xa (FPO: [0,0,0]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\ui\accessibility\ax_node.cc @ 35]
    065ae160 0fcd0e54 0778dd18 065ae180 1127540f chrome_child!SkBitmap::~SkBitmap+0x2d (FPO: [0,0,0]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skbitmap.cpp @ 46]
    065ae16c 1127540f 00000001 0fcd0e05 0778dd18 chrome_child!SkNoPixelsBitmapDevice::`scalar deleting destructor'+0xe (FPO: [Non-Fpo]) (CONV: thiscall)
    065ae174 0fcd0e05 0778dd18 065ae1a4 0fcd5dfa chrome_child!ui::AXNode::Destroy+0xa (FPO: [0,0,0]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\ui\accessibility\ax_node.cc @ 35]
    065ae180 0fcd5dfa 00000001 0000000a 0778fac0 chrome_child!DeviceCM::`scalar deleting destructor'+0x29 (FPO: [Non-Fpo]) (CONV: thiscall)
    065ae190 0fce07fe 076336e8 0fe626df 00000000 chrome_child!SkCanvas::internalRestore+0x92 (FPO: [0,0,0]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skcanvas.cpp @ 1348]
    065ae198 0fe626df 00000000 065ae220 0fe634be chrome_child!SkCanvas::restore+0x2f (FPO: [0,0,4]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skcanvas.cpp @ 1040]
    065ae1a4 0fe634be 065ae1e4 0778fac0 075dd2a0 chrome_child!SkRecord::Record::visit<SkRecords::Draw &>+0x2d (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skrecord.h @ 170]
    065ae220 0fe5d8a3 0769b540 0778fac0 00000000 chrome_child!SkRecordDraw+0xeb (FPO: [Non-Fpo]) (CONV: cdecl) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skrecorddraw.cpp @ 36]
    065ae274 0fcdc605 0778fac0 00000000 075dd2a0 chrome_child!SkBigPicture::playback+0xb5 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skbigpicture.cpp @ 44]
    065ae2ac 0fcd3471 075dd2a0 00000000 00000000 chrome_child!SkCanvas::onDrawPicture+0xcb (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skcanvas.cpp @ 2972]
    065ae308 11e23c0f 075dd2a0 00000000 00000000 chrome_child!SkCanvas::drawPicture+0x107 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\third_party\skia\src\core\skcanvas.cpp @ 2944]
    065ae328 11f09f13 0778fac0 00000000 065ae35c chrome_child!cc::DisplayItemList::Raster+0xe0 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\playback\display_item_list.cc @ 144]
    065ae398 11f098a3 0778fac0 00000000 0746a054 chrome_child!cc::RasterSource::RasterCommon+0x10f (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\playback\raster_source.cc @ 204]
    065af500 11f24e4b 0778fac0 0746a054 065af61c chrome_child!cc::RasterSource::PlaybackToCanvas+0x17c (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\playback\raster_source.cc @ 98]
    065af5e4 11ee0ba0 05900000 00000002 0772f850 chrome_child!cc::RasterBufferProvider::PlaybackToMemory+0x159 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\raster\raster_buffer_provider.cc @ 84]
    065af634 11ee0950 0772f850 0760339c 07694070 chrome_child!cc::OneCopyRasterBufferProvider::PlaybackToStagingBuffer+0xf2 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\raster\one_copy_raster_buffer_provider.cc @ 228]
    065af678 11ee08c6 0772f850 07780ca0 07780d60 chrome_child!cc::OneCopyRasterBufferProvider::PlaybackAndCopyOnWorkerThread+0x54 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\raster\one_copy_raster_buffer_provider.cc @ 176]
    065af6dc 11eb41d0 07694070 0746a054 0746a064 chrome_child!cc::OneCopyRasterBufferProvider::RasterBufferImpl::Playback+0xd7 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\raster\one_copy_raster_buffer_provider.cc @ 62]
    065af738 1145c3f4 03fc3918 05b7555c 05b7555a chrome_child!cc::`anonymous namespace'::RasterTaskImpl::RunOnWorkerThread+0xd6 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\cc\tiles\tile_manager.cc @ 94]
    065af788 1145c299 00000001 05b91844 05b91828 chrome_child!content::CategorizedWorkerPool::RunTaskInCategoryWithLockAcquired+0xc4 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\content\renderer\categorized_worker_pool.cc @ 363]
    065af7a8 1145c2d2 05b91878 03fc3980 0fb37a1e chrome_child!content::CategorizedWorkerPool::Run+0x16e (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\content\renderer\categorized_worker_pool.cc @ 232]
    065af7b4 0fb37a1e 05b79f38 76691430 30323436 chrome_child!content::`anonymous namespace'::CategorizedWorkerPoolThread::Run+0xf (FPO: [0,0,0]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\content\renderer\categorized_worker_pool.cc @ 35]
    065af7e0 0fb2a30b 00000000 00000000 05b79f38 chrome_child!base::SimpleThread::ThreadMain+0x72 (FPO: [Non-Fpo]) (CONV: thiscall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\base\threading\simple_thread.cc @ 76]
    065af7fc 7669338a 000002c8 065af848 77349902 chrome_child!base::`anonymous namespace'::ThreadFunc+0x82 (FPO: [Non-Fpo]) (CONV: stdcall) [c:\b\build\slave\win_syzyasan_lkgr\build\src\base\threading\platform_thread_win.cc @ 86]
    065af808 77349902 05b79f38 eb0104d5 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
    065af848 773498d5 0fb2a261 05b79f38 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
    065af860 00000000 0fb2a261 05b79f38 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])

sigurasg commented 8 years ago

As we discussed, part of the problem is that SyzyASAN uses the CRT directly rather than one of the allocator shims. As a result it doesn't supply malloc_unchecked, so Skia triggers OOM crashes where it should be handling the failure and falling back (somehow). I don't see any reason why SyzyASAN couldn't use the generic heap shim (https://cs.chromium.org/chromium/src/base/allocator/allocator_shim_default_dispatch_to_winheap.cc?dr=CSs). Because we turf HeapWalk (https://github.com/google/syzygy/blob/dca3e808834c999234c9b70f8679ce99626684e0/syzygy/agent/asan/windows_heap_adapter.cc#L139), we likely break mem-infra in some respects. Still, isn't that a lesser evil than tripping spurious OOM crashes?
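The distinction here is between an allocator that reports failure and one that treats failure as fatal: a caller such as Skia can only fall back if it gets nullptr back. A minimal sketch of that contract follows. Backend, CheckedAlloc, UncheckedAlloc and AllocateWithFallback are hypothetical names, not Chromium's or Skia's API (UncheckedAlloc plays the role the thread assigns to malloc_unchecked), and the fatal OOM path throws std::bad_alloc instead of terminating the process purely so that it can be tested.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocation backend: returns nullptr when it cannot allocate.
using Backend = void* (*)(std::size_t);

// Stand-in for a process-wide "OOM is fatal" policy. A browser would
// terminate here; this sketch throws std::bad_alloc so the path is testable.
[[noreturn]] void OnOutOfMemory() { throw std::bad_alloc(); }

// Checked allocation: the caller never sees nullptr, failure is fatal.
void* CheckedAlloc(Backend backend, std::size_t size) {
  void* p = backend(size);
  if (p == nullptr) OnOutOfMemory();
  return p;
}

// Unchecked allocation: failure is reported to the caller as nullptr.
void* UncheckedAlloc(Backend backend, std::size_t size) {
  return backend(size);
}

// Caller-side fallback: retry with half the size until an allocation
// succeeds or the request drops below min_size. Stores the size obtained
// (0 on failure) in *got.
void* AllocateWithFallback(Backend backend, std::size_t size,
                           std::size_t min_size, std::size_t* got) {
  for (std::size_t s = size; s > 0 && s >= min_size; s /= 2) {
    if (void* p = UncheckedAlloc(backend, s)) {
      *got = s;
      return p;
    }
  }
  *got = 0;
  return nullptr;
}
```

Halving the request is just one possible fallback policy. The point is that only the caller knows whether a smaller buffer is acceptable, and it never gets to decide if the allocator terminates on the first failure.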

chhamilton commented 8 years ago

Okay, thanks for the detailed info here. It seems pretty obvious what's going on:

chhamilton commented 8 years ago

Yeah, I think we should either support the heap shim or ensure that a definition of malloc_unchecked is provided in SyzyASAN builds (both are relatively easy to do). Heap walking is broken no matter which way we go, so I'm ambivalent.

(Replying by email to Sigurður Ásgeirsson's comment of Thu, 29 Sep 2016 at 10:48.)
