theRockLiu / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer
1 stars 0 forks source link

LeakSanitizer spends a lot of time in StackDepotGet() #213

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The leak checking phase in LSan takes way too long, to the point where 
ASan+LSan runs of unit_tests in Chrome take ~10% more time than HeapChecker 
runs.

From perf:

65.51%       unit_tests  unit_tests                         [.] 
__sanitizer::StackDepotGet(unsigned int, unsigned long*) 

Zoom and enhance (unit_tests DSO, main thread):

 99.58%  unit_tests  __sanitizer::StackDepotGet(unsigned int, unsigned long*)

It appears that LSan issues about 130k queries to StackDepot in 
ProcessPlatformSpecificAllocations. The stack depot was not built for that kind 
of use, and StackDepotGet is basically linear search in a linked list. 
Pre-fetching StackDepot data into some kind of structure with sublinear access 
time should improve the performance dramatically.

Original issue reported on code.google.com by earth...@google.com on 22 Aug 2013 at 12:18

GoogleCodeExporter commented 9 years ago
Also, PPSA is not supposed to be making so many queries in the first place. 
It's supposed to only check those blocks which are either leaked, ignored or 
reachable only via dynamic TLS, so 130k seems too high. If the number is real, 
we could try introducing extra ways to filter out irrelevant blocks (e.g., 
we're only interested in blocks allocated through __libc_memalign, so we could 
track them separately). Or perhaps find another way to reach the dynamic TLS 
and get rid of this entire function.

Original comment by earth...@google.com on 22 Aug 2013 at 12:27

GoogleCodeExporter commented 9 years ago
I have landed a patch (r189217) to prefetch the StackDepot data into a sorted 
array. This change alone dropped the leak checking time on unit_tests from ~3 
minutes to several seconds. The question is whether we can get rid of 
ProcessPlatformSpecificAllocations() completely.

The purpose of this function is to mark as live all allocations made from 
ld.so. We need it to handle this call to __libc_memalign:

http://bazaar.launchpad.net/~vcs-imports/glibc/master/view/head:/elf/dl-tls.c#L5
29

Here libc allocates TLS space for a loaded module. The pointer to that chunk of 
memory is stored in the DTV (array of TLS pointers), which in turn is pointed 
to by the thread descriptor. We know where the thread descriptor is located, so 
we could theoretically reach the dynamically allocated TLS chunk through the 
DTV. Unfortunately, the initial DTV for the main thread is allocated very 
early, at a time when LSan (or libc, even) has not been loaded yet; ld.so 
actually uses an internal malloc implementation for that. So the DTV 
effectively becomes invisible to LSan because it wasn't allocated through our 
allocator. We can't special-case it either, because we don't know the size of 
the DTV (it depends on the number of modules present at startup). We also can't 
put an annotation around those __libc_memalign calls, because TLS space is 
allocated lazily on first access (rather than during dlopen(), for instance).

Kostya suggests making __libc_memalign interceptor check the caller's PC 
against the ld.so range obtained from /proc/self/maps, rather than deferring 
this check until the  leak checking phase. Theoretically this should not 
introduce a huge slowdown because __libc_memalign is rarely used (and because 
comparison is cheap, anyway - and we only have to obtain the range once). 
However, this could still mean that we're no longer able to claim "zero 
slowdown" compared to plain ASan. 

Original comment by earth...@google.com on 26 Aug 2013 at 2:08

GoogleCodeExporter commented 9 years ago
Interestingly, if I disable ProcessPlatformSpecificAllocations(), this calloc 
also starts leaking:

http://bazaar.launchpad.net/~vcs-imports/glibc/master/view/head:/elf/dl-tls.c#L2
96

called from allocate_stack:

http://bazaar.launchpad.net/~vcs-imports/glibc/master/view/head:/nptl/allocatest
ack.c#L580

called from pthread_create. This is all kinds of weird, because we're supposed 
to be able to reach that through the thread descriptor.

It seems that there's also sometimes a malloc() leak from do_dlopen (I 
reproduced it on unit_tests with GTEST_FILTER="Ab*"). 

Original comment by earth...@google.com on 26 Aug 2013 at 2:27

GoogleCodeExporter commented 9 years ago
BTW here's the perf report after the recent change (the leak checking phase 
took 5 seconds):

 29.28%  unit_tests  unit_tests         [.] void __sanitizer::InternalSort<__sanitizer::InternalMmapVector<__sanitizer::StackDepotReverseMap::IdDescPair>, bool (*)(__sanitizer::StackDepotReverseMap::IdDesc...
 24.63%  unit_tests  unit_tests         [.] __sanitizer::StackDepotReverseMap::StackDepotReverseMap()                                                                                                        
 23.85%  unit_tests  unit_tests         [.] __sanitizer::StackDepotReverseMap::IdDescPair::IdComparator(__sanitizer::StackDepotReverseMap::IdDescPair const&, __sanitizer::StackDepotReverseMap::IdDescPair c...
 13.44%  unit_tests  unit_tests         [.] __lsan::GetUserBegin(unsigned long)                                                                                                                              
  1.20%  unit_tests  unit_tests         [.] __lsan::LsanMetadata::allocated() const

Original comment by earth...@google.com on 26 Aug 2013 at 3:52

GoogleCodeExporter commented 9 years ago
So, we are spending *all* the time traversing stack depot... 

Original comment by konstant...@gmail.com on 28 Aug 2013 at 9:50

GoogleCodeExporter commented 9 years ago
We will not need ProcessPlatformSpecificAllocations once we start using
the new __tls_get_addr interceptor, right? 

Original comment by konstant...@gmail.com on 29 Jan 2014 at 1:23

GoogleCodeExporter commented 9 years ago
Theoretically yes. In practice, however, on one occasion I think I've seen a 
leak report (with use_tls=false) coming from ld.so which was not a dynamic tls 
block. Will have to test and see.

Original comment by earth...@chromium.org on 29 Jan 2014 at 4:33