steleman / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer

asan does not always detect access beyond mmaped page #348

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Compile attached test program with either gcc or clang:

(test program essentially is):

    ---- 8< ----
    page1 = mmap(NULL, PAGE_SIZE, R | W)
    page2 = mmap(NULL, PAGE_SIZE, R | W)

    // access beyond page1 (caught by gcc/asan, dies on clang/asan)
    // page1[PAGE_SIZE] = 99

    page2[PAGE_SIZE] = 199   // <- access beyond page2 (not caught by asan)
    ---- 8< ----
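
For reference, here is a minimal self-contained version of what the test
program does (a reconstruction from the sketch above and the patch quoted
later in this thread, not the attached file itself; the exact `xabort()`
helper is an assumption):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static void xabort(const char *msg) { perror(msg); abort(); }

    int main(void)
    {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);

        char *page1 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page1 == MAP_FAILED)
            xabort("mmap1");

        char *page2 = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page2 == MAP_FAILED)
            xabort("mmap2");

        /* access beyond page1 (caught by gcc/asan, dies on clang/asan) */
        /* page1[pagesize] = 99; */

        page2[pagesize] = 199;   /* access beyond page2 (not caught by asan) */
        return 0;
    }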

$ gcc -g -Wall -fsanitize=address page-access-beyond.c -o a.gcc
$ clang-3.6 -g -Wall -fsanitize=address page-access-beyond.c -o a.clang

2. Run it

$ ./a.gcc
$ ./a.clang

What is the expected output? What do you see instead?

Expected: ASan reports a bad access at the `page2[PAGE_SIZE]` write.

I see: nothing

What version of the product are you using? On what operating system?

$ gcc --version
gcc (Debian 4.9.1-15) 4.9.1

$ clang-3.6 --version
Debian clang version 3.6.0-svn218446-1 (trunk) (based on LLVM 3.6.0)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ uname -a
Linux teco 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux

Please provide any additional information below.

If the `page1[PAGE_SIZE]` write is uncommented, a.clang just dies with
"Segmentation fault", while a.gcc reports the problem correctly:

==20907==ERROR: AddressSanitizer: unknown-crash on address 0x7f452c447000 at pc 
0x400aa7 bp 0x7fffd9135b40 sp 0x7fffd9135b38
WRITE of size 1 at 0x7f452c447000 thread T0
    #0 0x400aa6 in main /home/kirr/tmp/trashme/page-access-beyond.c:35
    #1 0x7f452afd1b44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
    #2 0x400878 (/home/kirr/tmp/trashme/a.gcc+0x400878)

AddressSanitizer can not describe address in more detail (wild memory access 
suspected).
SUMMARY: AddressSanitizer: unknown-crash 
/home/kirr/tmp/trashme/page-access-beyond.c:35 main
Shadow bytes around the buggy address:
  0x0fe925880db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe925880dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe925880dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe925880de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe925880df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0fe925880e00:[fe]fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fe925880e10: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fe925880e20: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fe925880e30: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fe925880e40: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
  0x0fe925880e50: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==20907==ABORTING

Valgrind 3.10, conversely, catches the page2 write-beyond-end but does not
catch the page1 write-beyond-end.

Thanks beforehand,
Kirill

Original issue reported on code.google.com by kirill.s...@gmail.com on 29 Sep 2014 at 8:24

GoogleCodeExporter commented 9 years ago
Note that ASan does not instrument mmap, so catching the signal and aborting
is the best it can do here. Clang's segfault is unexpected, of course.

Original comment by tetra2...@gmail.com on 29 Sep 2014 at 10:52

GoogleCodeExporter commented 9 years ago
Thanks for replying.

I've noticed there is an mmap interceptor in compiler-rt/lib/tsan/ which
could probably be shared with ASan; that way this kind of error could be
detected more robustly.

Thanks again,
Kirill

Original comment by kirill.s...@gmail.com on 29 Sep 2014 at 11:01

GoogleCodeExporter commented 9 years ago
Can you print the page addresses to make sure the second access doesn't fall 
into the first page (or any other mapped page)?

Regarding the mmap() interceptor, I think this is a half-measure that won't
help, for example, when mmap() is called from libc via a local call, or when
the raw syscall is invoked.
IMO the best thing to do is to intercept __NR_mmap (together with other
syscalls) using seccomp-bpf.

Original comment by ramosian.glider@gmail.com on 29 Sep 2014 at 11:30

GoogleCodeExporter commented 9 years ago
Yes, the pages are adjacent to each other, because the OS kernel tries to
merge mappings when possible, so that internally it is only one VMA:

(print patch)
--- a/page-access-beyond.c.kirr
+++ b/page-access-beyond.c
@@ -27,6 +27,9 @@ int main()
     if (page2 == MAP_FAILED)
         xabort("mmap2");

+    printf("p1: %p\n", page1);
+    printf("p2: %p\n", page2);
+
     page1[0]            = 0;
     page1[pagesize-1]   = 1;

(output)
p1: 0x7f05a5dd8000
p2: 0x7f05a5dd7000

and that's why ASan does not complain.

But the point here is that it is still an error, because the OS could put
page1 and page2 anywhere: both mmaps specify addr=NULL, which means "map
anywhere".

Merging them is just a kernel optimisation that does not happen 100% of the
time (e.g. if page1 were allocated in an address-space hole of only one page,
there would be no chance for page2 to be allocated adjacent to it).

I think the solution here would be to intercept mmap one way or another. In
the interceptor, first reserve an address-space region two pages larger than
requested:

    mmap(NULL, len=orig_len + 2*pagesize, PROT_NONE, MAP_NORESERVE)

and then perform the original mmap inside this region with MAP_FIXED.

This way there will be protective pages before and after the valid mapping,
and detecting accesses beyond it would work, as sketched below.
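
A minimal sketch of that idea for anonymous private mappings
(`mmap_with_redzones` is a hypothetical name for illustration, not an actual
ASan interceptor):

    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Reserve len + 2 guard pages of address space, then map the real
     * pages inside it with MAP_FIXED; the first and last page stay
     * PROT_NONE and act as red zones. */
    static void *mmap_with_redzones(size_t len, int prot)
    {
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);

        /* Address-space reservation only: PROT_NONE + MAP_NORESERVE. */
        char *region = mmap(NULL, len + 2 * pagesize, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                            -1, 0);
        if (region == MAP_FAILED)
            return MAP_FAILED;

        /* The real mapping goes inside the reservation with MAP_FIXED. */
        void *p = mmap(region + pagesize, len, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED) {
            munmap(region, len + 2 * pagesize);
            return MAP_FAILED;
        }
        return p;
    }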

Thanks again, Kirill


Original comment by kirill.s...@gmail.com on 29 Sep 2014 at 11:56

GoogleCodeExporter commented 9 years ago
Tampering with mmap is risky.
If a user mmaps 32K small regions, we will create 32K (or 64K?) red-zone
pages and will exhaust the kernel's limit on the number of mappings.

This can be done under a flag though -- a patch is welcome.

Most of the code we care about does not use mmap directly so this
has never been a priority for us. 

(glider, seccomp-bpf is a completely different story, let's not mix it here)

Original comment by konstant...@gmail.com on 29 Sep 2014 at 9:15

GoogleCodeExporter commented 9 years ago
Yeah, the limit on the number of mappings with different permissions appears to 
be quite strict: 
http://lxr.free-electrons.com/source/include/linux/sched/sysctl.h#L34 (that's 
2^16-5).

IMO we'd better not change the behavior of mmap(), because there are a number
of corner cases where the users don't want that. If we do that solely for libc
(i.e. if __NR_mmap remains unintercepted) we'll have all sorts of mixed
{mmap,munmap}_{with,without}_redzone() calls for the same pages that'll end up
corrupting something.

Original comment by ramosian.glider@gmail.com on 30 Sep 2014 at 9:53

GoogleCodeExporter commented 9 years ago
Thanks for commenting. I understand about the priorities etc. Just some notes
about the limits and related points:

I've measured my running firefox:

  $ cat /proc/`pidof iceweasel`/maps | wc -l
  852

for 850 mappings, adding 2 red-zone pages each would be 1700 pages = < 7MB of
address space and 1700 additional VMAs. I think that is rather low on 64-bit,
given that it is only address space, not memory. It also fits comfortably
within the default per-process VMA limit.

Even for all running processes _combined_ on my machine (x86_64 on Debian
testing, a usual desktop), if such a combined process could run:

  # cat /proc/*/maps |wc -l
  15053

that would result in only < 120MB of address space for red-zone pages and
~30000 additional VMAs, which to me qualifies as low overhead on 64-bit and
also fits within the default per-process VMA limit.

So if a user writes a special program which creates a lot of mappings, he/she
should probably already be aware of the

    /proc/sys/vm/max_map_count

knob, and _that_ is the setting to tune when running such a program with
additional mmap sanitization.

So in ASan I'd do it unconditionally.

> IMO we'd better not change the behavior of mmap(), because there are a number of corner cases where the users don't want that.

Could you please provide an example?

To me it looks like this: if mmap is called without MAP_FIXED, the kernel is
free to place the mapping anywhere, so discovering corner cases where clients
incorrectly rely on the placement would only be a good thing.

And if the original mmap was with MAP_FIXED, just don't add red zones around it.

> If we do that solely for libc (i.e. if __NR_mmap remains unintercepted) we'll have all sorts of mixed {mmap,munmap}_{with,without}_redzone() calls for the same pages that'll end up corrupting something.

On munmap, look up which region the range belongs to; if that region (which
could later have been changed by a MAP_FIXED mapping into it, but anyway) was
originally mmaped with red zones, and we are unmapping a part adjacent to a
red zone (on either side), just adjust the red zone.

And if the kernel __NR_mmap is called directly with correct args, bypassing
the interceptor, it will just unmap some region inside the original larger
mapping with red zones; the red zones will stay and, since they are mapped
PROT_NONE, will catch later incorrect read/write accesses to them. The only
problem here is that the red-zone mappings leak if unmap is called directly.
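
A rough sketch of that munmap-side bookkeeping (`rz_lookup` and the registry
are hypothetical names, and only the right-red-zone case is shown): if the
range being unmapped abuts a red zone of a tracked region, turn it into
PROT_NONE instead of punching a hole, so the red zone simply grows.

    #include <stddef.h>
    #include <sys/mman.h>

    /* Hypothetical registry entry for a mapping created with red zones. */
    struct rz_region {
        char  *start;   /* first user-visible byte (after the left red zone) */
        size_t len;     /* user-visible length */
    };

    /* Hypothetical lookup in the interceptor's mapping registry. */
    struct rz_region *rz_lookup(void *addr);

    int munmap_with_redzones(void *addr, size_t len)
    {
        struct rz_region *r = rz_lookup(addr);
        if (r && (char *)addr + len == r->start + r->len) {
            /* Tail unmap adjacent to the right red zone: make the pages
             * PROT_NONE so the red zone grows to cover them, and shrink
             * the recorded user-visible region accordingly. */
            if (mprotect(addr, len, PROT_NONE) != 0)
                return -1;
            r->len -= len;
            return 0;
        }
        return munmap(addr, len);   /* untracked range: plain munmap */
    }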

Wouldn't that work?


Original comment by kirill.s...@gmail.com on 30 Sep 2014 at 11:41

GoogleCodeExporter commented 9 years ago
> I've measured my running firefox:
>  $ cat /proc/`pidof iceweasel`/maps | wc -l

I think this stat is low exactly because the kernel consolidates adjacent 
mappings. For example, Chrome calls mmap() 600+ times during the first seconds 
of startup, and many of those calls would've been merged into a single region.

https://bugzilla.kernel.org/show_bug.cgi?id=67651 gives an idea of how the 
number of mappings changes when adjacent mappings with similar permissions 
aren't combined.

> So if a user writes a special program which creates a lot of mappings,
> he/she should probably already be aware of the
>    /proc/sys/vm/max_map_count
> knob, and _that_ is the setting to tune when running such a program with
> additional mmap sanitization.

Thanks, I wasn't aware of this. Indeed, sysctl vm.max_map_count=16777216 lets 
the program create far more mappings (although this probably affects the 
performance).

> Could you please provide an example?
MAP_FIXED was my first thought. I was also thinking about doing something like:
  ptr = mmap(0, size, ...)
  ptr2 = mmap(ptr + size, ...)

but this isn't guaranteed to work (i.e. it is one of the cases you want to
look for), so probably MAP_FIXED is the only interesting case.

> On munmap, look up which region the range belongs to; if that region (which
> could later have been changed by a MAP_FIXED mapping into it, but anyway)
> was originally mmaped with red zones, and we are unmapping a part adjacent
> to a red zone (on either side), just adjust the red zone.
Sounds good

> And if the kernel __NR_mmap is called directly with correct args, bypassing
> the interceptor, it will just unmap some region inside the original larger
> mapping with red zones; the red zones will stay and, since they are mapped
> PROT_NONE, will catch later incorrect read/write accesses to them. The only
> problem here is that the red-zone mappings leak if unmap is called directly.
We'll also need to remove the left part of the original larger mapping from 
ASan's mappings registry so that further munmap() calls do not attempt to unmap 
that part's redzone (I'm talking about the right redzone, although we can 
possibly create both).

I now agree mmap() interception won't break much, although it may increase
lookup time in the kernel structures handling the mappings.

Original comment by ramosian.glider@gmail.com on 30 Sep 2014 at 12:29

GoogleCodeExporter commented 9 years ago
> >I've measured my running firefox:
> >  $ cat /proc/`pidof iceweasel`/maps | wc -l
>
> I think this stat is low exactly because the kernel consolidates adjacent
> mappings. For example, Chrome calls mmap() 600+ times during the first
> seconds of startup, and many of those calls would've been merged into a
> single region.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=67651 gives an idea of how the
> number of mappings changes when adjacent mappings with similar permissions
> aren't combined.

Thanks for the pointers. Yes, merging VMAs is important for performance,
because the kernel does a binary-tree lookup `addr -> vma` on every page
fault, and minimizing the number of VMAs minimizes that tree and the lookup
time.

I can't speak for Chrome, but for Firefox and Chromium the number of mmap
_calls_ looks like this:

for Firefox:

    $ mmaps() {
    >     grep mmap $1 |grep -v MAP_FIXED |grep -v "mmap resumed"
    > }

    $ strace -tt -f -o x.log iceweasel
    # go to yandex.ru, google.ru, lwn.net; exit

    $ mmaps x.log |wc -l
    744

for Chromium:

    # as root
    $ strace -tt -f -o y.log chromium --user-data-dir=`pwd`/userdata
    # go to yandex.ru, google.ru, lwn.net; exit

    $ mmaps y.log |wc -l
    3523

it looks like Chromium does more mmap calls, but let's look deeper:

    # Chromium: N(mmap calls) distribution with child pids
    $ mmaps y.log |awk '{print $1}' |sort |uniq -c | sort -nr | head -20
       1115 14292
        909 14297
        323 14317
        287 14324
        131 14365
        118 14346
         88 14321
         83 14342
         72 14389
         39 14349
         37 14401
         35 14309
         29 14313
         26 14327
         17 14316
         17 14306
         14 14415
         14 14398
         14 14376
          9 14407

A-ha, N(mmap calls) per process is not that high. Judging by pid, there are 1
or 2 processes which are maybe some kind of managers (the first two pids) and
the others are workers with lower N(mmap calls) counts.

Anyway,

    N(mmap calls) is limited to about 1000 per process

I think this is rather low and of the same order as the resulting number of
mappings. I.e.

    My point about the low mapping count also stands for the low mmap call count.

For completeness, here is the same distribution for Firefox:

    # Firefox: N(mmaps) distribution with child pids
    $ mmaps x.log |awk '{print $1}' |sort |uniq -c | sort -nr | head -20
        517 18401
         38 18436
         31 18406
         31 18403
         29 18453
         29 18452
         11 18421
          9 18425
          7 18446
          7 18419
          6 18467
          5 18424
          4 18431
          3 18443
          3 18437
          3 18428
          3 18422
          2 18450
          2 18402
          1 18427

> >So if a user writes a special program which creates a lot of mappings,
> >he/she should probably already be aware of the
> >    /proc/sys/vm/max_map_count
> >knob, and _that_ is the setting to tune when running such a program with
> >additional mmap sanitization.
>
> Thanks, I wasn't aware of this. Indeed, sysctl vm.max_map_count=16777216
> lets the program create far more mappings (although this probably affects
> the performance).

Yes, as said above, a larger N(mmaps) increases the pagefault lookup time and
maybe something else.

Please also note that if we create "before" and "after" red zones for each
region, then for two such larger regions with red zones, chances are high the
kernel will allocate them adjacent to each other, i.e.

    red-before1 region1 red-after1  red-before2 region2 red-after2

so adjacent {after,before} pairs will be merged into one PROT_NONE
mapping by the kernel.

That means the overhead will be ~1 PROT_NONE mapping per real mmap without
MAP_FIXED, not 2 PROT_NONE mappings per mmap as I mistakenly thought before.

> >Could you please provide an example?
> MAP_FIXED was my first thought. I was also thinking about doing something
> like:
>   ptr = mmap(0, size, ...)
>   ptr2 = mmap(ptr + size, ...)
>
> but this isn't guaranteed to work (i.e. it is one of the cases you want to
> look for), so probably MAP_FIXED is the only interesting case.

And we do not add red-zones to MAP_FIXED for this reason.

> >On munmap, look up which region the range belongs to; if that region
> >(which could later have been changed by a MAP_FIXED mapping into it, but
> >anyway) was originally mmaped with red zones, and we are unmapping a part
> >adjacent to a red zone (on either side), just adjust the red zone.
> Sounds good
>
> >And if the kernel __NR_mmap is called directly with correct args,
> >bypassing the interceptor, it will just unmap some region inside the
> >original larger mapping with red zones; the red zones will stay and, since
> >they are mapped PROT_NONE, will catch later incorrect read/write accesses
> >to them. The only problem here is that the red-zone mappings leak if unmap
> >is called directly.
> We'll also need to remove the left part of the original larger mapping from
> ASan's mappings registry so that further munmap() calls do not attempt to
> unmap that part's redzone (I'm talking about the right redzone, although we
> can possibly create both).
>
> I now agree mmap() interception won't break much, although it may increase
> lookup time in the kernel structures handling the mappings.

That was my point. And the increase, if it is 1 protective VMA per real
mapping, should IMHO not be high.

Thanks,
Kirill

Original comment by kirill.s...@gmail.com on 30 Sep 2014 at 3:08

GoogleCodeExporter commented 9 years ago
I'll be glad to review a patch that adds redzones around pages mapped with 
mmap(NULL, ...) under a flag. We can decide whether it's worth having it by 
default once we try it on some big apps.

Original comment by ramosian.glider@gmail.com on 30 Sep 2014 at 3:35

GoogleCodeExporter commented 9 years ago
I see, thanks.

Original comment by kirill.s...@gmail.com on 30 Sep 2014 at 6:40