gopalshankar / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer
0 stars 0 forks source link

[patch] decrease resident memory usage. #256

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

Here is a patch that helps decreasing ASan's memory usage. On my use case, this 
decrease memory consumption by 20% with no noticeable performance impact (in 
fact, it seems the resulting process is marginally faster).

The idea here is to clear memory using a fixed mmap instead of memset since 
both will result in zeroed memory but mmap memory is not resident before 
someone writes something in it. This also could result in faster code for large 
clear since memset scans the memory and mmap only change the virtual address 
map, however while memset runs in userland, mmap implies performing a syscall 
that will take a lock to modify a shared (between threads) resource.

Original issue reported on code.google.com by florent....@intersec.com on 29 Dec 2013 at 11:28

Attachments:

GoogleCodeExporter commented 9 years ago
>> On my use case
What is your case? 
I suspect it is doing huge allocations and not using most of the allocated 
memory.
Not the very common case. 

Please, provide a benchmark.
Very likely this patch will slowdown the common case of relatively small 
allocations.

Original comment by konstant...@gmail.com on 9 Jan 2014 at 4:57

GoogleCodeExporter commented 9 years ago
My use case is the binary that contains all the unit tests for the base 
library, it contains a lot of small and some huge allocations. The huge 
allocations (a few mmap of 256MiB) are quite rare but enough to have an impact 
on the amount of resident memory consumed by Asan's shadow memory.

I ran the attached benchmark (a binary tree with resizable arrays in the nodes) 
on a Linux desktop (Linux 3.8). This gave the following result (best occurrence 
given for both):
* without my patch:
        User time (seconds): 77.39
        System time (seconds): 0.26
        Maximum resident set size (kbytes): 500176

* with my patch:
        User time (seconds): 77.42
        System time (seconds): 0.30
        Maximum resident set size (kbytes): 499168

AFAICT, the difference in term of performance is well under the measure error 
threshold, so I tend to conclude that the patch has no noticeable impact on 
performance (or at most a very tiny slowdown).

In both cases, the program was compiles with clang 3.4 with the following 
command line:

 clang -O2 -g -fsanitize=address -o blah blah.c

Let me know if you have a better idea for a meaningful benchmark.

Original comment by florent....@polytechnique.org on 9 Jan 2014 at 8:38

Attachments:

GoogleCodeExporter commented 9 years ago
The difference in resident set size is 1Mb, or 0.2% of the overall size.
This is not a convincing gain. 

Also, a benchmarks should
  - have less unrelated logic (no trees, nodes, etc. Just malloc/free)
  - have multiple threads, all malloc-intensive
  - have small allocations as well as large allocations
  - have most of the malloc-ed memory actually used

Note also, that once we free the large malloc-ed block we call 
FlushUnneededASanShadowMemory (madvise((void*)addr, size, MADV_DONTNEED);)
so if the large blocks are actually freed, the shadow should be fully freed as 
well.

Original comment by konstant...@gmail.com on 9 Jan 2014 at 8:54

GoogleCodeExporter commented 9 years ago
I didn't expect the resident set size to differ in this bench since my patch 
only affects large allocations, not small ones.

AFAICT, the FlushUneededShadowMemory() is called for unmaps only. My patch 
catches every single unpoising.

Original comment by florent....@intersec.com on 9 Jan 2014 at 9:19

GoogleCodeExporter commented 9 years ago
>> My patch catches every single unpoising.
Exactly. mmaps are expensive. 
Unless I see a proof that the proposed approach does not slowdown the common 
use-case,
I am not willing to accept this patch. 
Closing now, please reopen once you have convincing benchmarks. 

Original comment by konstant...@gmail.com on 29 Jan 2014 at 1:16

GoogleCodeExporter commented 9 years ago
>> My patch catches every single unpoising.
> Exactly. mmaps are expensive.

memsetting 4k of memory is also expensive.

> Unless I see a proof that the proposed approach does not slowdown the common 
use-case,

I'll provide the requested benchmark as soon as possible (but I have 
professional constraint and cannot spend too much time on it for now). That 
said, you should seriously consider adding a reference benchmark in the 
repository if you expect contributions, because you can hardly ask for a 
benchmark to every single person who propose a patch.

Original comment by florent....@intersec.com on 1 Feb 2014 at 4:31

GoogleCodeExporter commented 9 years ago
I've applied something based on your patch in r201400, but we start using mmap 
instead of memset at a larger region size (64K by default).

Original comment by euge...@google.com on 14 Feb 2014 at 11:48

GoogleCodeExporter commented 9 years ago
thx

Original comment by florent....@intersec.com on 14 Feb 2014 at 3:58