theRockLiu / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer

allocator: configure the mmap threshold at run-time #163

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The new allocator makes some of the SPEC benchmarks
1.5x slower than they were with the old allocator.
All of the extra time is spent in the kernel.
E.g. on 403.gcc:

    15.85%      gcc  [kernel.kallsyms]   [k] clear_page_c
     9.46%      gcc  libc-2.15.so        [.] __memset_sse2
     7.07%      gcc  gcc                 [.] clear_table
     5.54%      gcc  [kernel.kallsyms]   [k] __alloc_pages_nodemask
     4.08%      gcc  [kernel.kallsyms]   [k] page_fault
     2.59%      gcc  gcc                 [.] canon_rtx
     2.53%      gcc  gcc                 [.] bitmap_operation
     1.85%      gcc  [kernel.kallsyms]   [k] T.1284
     1.74%      gcc  gcc                 [.] compute_transp

Investigating.

Original issue reported on code.google.com by konstant...@gmail.com on 26 Feb 2013 at 3:14

GoogleCodeExporter commented 9 years ago
At least while running 403.gcc (cp-decl.in) I see lots of huge malloc calls,
which go into our LargeMmapAllocator:

    Stats: LargeMmapAllocator: allocated 180843 times, remains 29 (12596 K) max 463 M; by size logs: 17:34278; 18:62644; 19:83564; 20:274; 21:45; 22:10; 23:19; 24:5; 26:4;
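
To read the "by size logs" histogram above, here is a minimal sketch that assumes each key is the floor of log2 of the allocation size in bytes. That reading is an assumption, not something stated in this thread, and `sketch_size_log` is a hypothetical helper name. Under that assumption, bucket 17 covers sizes from 128K up to (but not including) 256K, and buckets 17 to 19 together account for 180486 of the 180843 allocations.

```c
#include <stddef.h>

/* Hypothetical helper: floor(log2(size)), i.e. the index of the highest set
 * bit. Assumed here to be how the "size logs" keys in the stats are derived.
 * Returns 0 for size == 0 and size == 1. */
static unsigned sketch_size_log(size_t size) {
  unsigned log = 0;
  while (size >>= 1) {
    ++log;
  }
  return log;
}
```

For example, `sketch_size_log(128 * 1024)` falls in bucket 17, which is right at the 128K mmap threshold discussed in this issue.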

Original comment by konstant...@gmail.com on 26 Feb 2013 at 3:34

GoogleCodeExporter commented 9 years ago
This actually affects only one of the inputs of 403.gcc,
where the code calls calloc with a huge size in a tight loop.

http://llvm.org/viewvc/llvm-project?rev=176185&view=rev
(don't memset on calloc if we've just mmaped the memory)
partially improves the situation. I've remeasured all of SPEC at r176185
and everything except 400.perlbench looks ok.

|| BENCHMARK            ||  O2       || O2+asan || slowdown||
||      400.perlbench   ||    346    ||  1116   ||     3.23||
||          401.bzip2   ||    487    ||   837   ||     1.72||
||            403.gcc   ||    321    ||   586   ||     1.82||
||            429.mcf   ||    315    ||   581   ||     1.84||
||          445.gobmk   ||    409    ||   799   ||     1.95||
||          456.hmmer   ||    604    ||  1276   ||     2.11||
||          458.sjeng   ||    455    ||   854   ||     1.88||
||     462.libquantum   ||    482    ||   536   ||     1.11||
||        464.h264ref   ||    547    ||  1199   ||     2.19||
||        471.omnetpp   ||    309    ||   551   ||     1.78||
||          473.astar   ||    402    ||   647   ||     1.61||
||      483.xalancbmk   ||    221    ||   439   ||     1.99||
||           433.milc   ||    404    ||   668   ||     1.65||
||           444.namd   ||    368    ||   594   ||     1.61||
||         447.dealII   ||    323    ||   581   ||     1.80||
||         450.soplex   ||    234    ||   366   ||     1.56||
||         453.povray   ||    187    ||   395   ||     2.11||
||            470.lbm   ||    310    ||   400   ||     1.29||
||        482.sphinx3   ||    500    ||   911   ||     1.82||

The slowdown on 400.perlbench is not related to the allocator; it looks like a 
code gen issue (issue 164). 
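
The idea behind r176185 (skip the memset in calloc when the memory has just been mmaped) can be illustrated with a small sketch. This is not the asan implementation: `sketch_calloc`, `sketch_free` and `SKETCH_MMAP_THRESHOLD` are hypothetical names, and the small-allocation path simply falls back to malloc. It relies on the fact that anonymous private mappings on Linux are zero-filled by the kernel, so zeroing them again in user space only touches every page a second time.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical threshold, set to the 128K figure discussed in this issue. */
#define SKETCH_MMAP_THRESHOLD ((size_t)128 * 1024)

/* Zeroed allocation. *skipped_memset reports whether the memset was skipped
 * because the memory came straight from an anonymous mmap. */
static void *sketch_calloc(size_t nmemb, size_t size, bool *skipped_memset) {
  *skipped_memset = false;
  /* Reject zero sizes and multiplication overflow. */
  if (nmemb == 0 || size == 0 || nmemb > SIZE_MAX / size) {
    return NULL;
  }
  size_t total = nmemb * size;

  if (total > SKETCH_MMAP_THRESHOLD) {
    /* Fresh anonymous pages are already zero-filled by the kernel. */
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
      return NULL;
    }
    *skipped_memset = true;
    return p;
  }

  /* Small path: memory may be recycled, so it has to be cleared. */
  void *p = malloc(total);
  if (p != NULL) {
    memset(p, 0, total);
  }
  return p;
}

/* The caller passes the total size so the sketch knows which path was used. */
static void sketch_free(void *p, size_t total) {
  if (p == NULL) {
    return;
  }
  if (total > SKETCH_MMAP_THRESHOLD) {
    munmap(p, total);
  } else {
    free(p);
  }
}
```

The kernel still has to clear each page on first touch (the `clear_page_c` entry in the profile), which is consistent with the change only partially improving the situation.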

The fact that our allocator does mmap/munmap for all allocations > 128K 
is not a bug, but a feature.
It allows us to be more memory efficient in the common case where small 
allocations are prevalent. 
Later we may want to make the threshold configurable via a run-time flag 
(ASAN_OPTIONS=..),
the same way mallopt does it (M_MMAP_THRESHOLD).
Btw, the current asan default threshold is the same as malloc's on Ubuntu 
12.04, according to the man page:

======
       M_MMAP_THRESHOLD
              When an allocation request larger than the given value cannot be
              satisfied by an existing free chunk, the memory is guaranteed to
              be obtained with mmap().  Smaller requests might be allocated
              with either of mmap() or sbrk().  mmap()-allocated memory can be
              immediately returned to the OS when it is freed, but this is not
              true for all memory allocated with sbrk(); however, memory
              allocated by mmap() and later freed is neither joined nor
              reused, so the overhead is greater.  Default: 128*1024.
======
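
For comparison, this is roughly how the glibc knob quoted above is set from code. It is a minimal sketch for glibc's malloc only and does nothing to the asan allocator. `set_glibc_mmap_threshold` is a hypothetical wrapper name; `mallopt` and `M_MMAP_THRESHOLD` come from glibc's `<malloc.h>`.

```c
#include <limits.h>
#include <malloc.h> /* glibc-specific: mallopt(), M_MMAP_THRESHOLD */
#include <stddef.h>

/* Hypothetical wrapper around mallopt(M_MMAP_THRESHOLD, ...).
 * Returns 1 on success and 0 on failure, following mallopt's convention. */
static int set_glibc_mmap_threshold(size_t bytes) {
  /* mallopt takes an int value, so reject sizes that do not fit. */
  if (bytes > (size_t)INT_MAX) {
    return 0;
  }
  return mallopt(M_MMAP_THRESHOLD, (int)bytes);
}
```

Calling `set_glibc_mmap_threshold(256 * 1024)` early in `main` would raise the glibc threshold from the 128K default to 256K. An asan equivalent would presumably read the value from ASAN_OPTIONS instead, but no such flag is named in this issue.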

Original comment by konstant...@gmail.com on 28 Feb 2013 at 6:10

GoogleCodeExporter commented 9 years ago
Not going to work on this any time soon; the current situation is rather good.

Original comment by konstant...@gmail.com on 27 Dec 2013 at 10:51