viralcode / memory-sanitizer

Automatically exported from code.google.com/p/memory-sanitizer

Ultra-compatible memory mapping #76

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Inspired by http://reviews.llvm.org/D6387 and earlier changes to TSan memory 
layout, here is a plan to change MSan memory layout to
 * support gdb (i.e. non-ASLR case, with code at 0x555...),
 * support non-PIE (with main executable at 0x400000),
 * and even make origin mapping a little bit cheaper!

#define MEM_TO_SHADOW(x) ((x) ^ 0x200000000000ULL)
#define SHADOW_TO_ORIGIN(x) ((x) ^ 0x100000000000ULL)

I.e., the mem-to-shadow mapping flips the second-highest address bit, and 
shadow-to-origin flips the third-highest. Composing the two, mem-to-origin is a 
single XOR (with 0x300000000000), instead of OR+ADD as in the current scheme.

addr 000000000000 shadow 200000000000 origin 300000000000
addr 0fffffffffff shadow 2fffffffffff origin 3fffffffffff
addr 500000000000 shadow 700000000000 origin 600000000000
addr 5d0000000000 shadow 7d0000000000 origin 6d0000000000
addr 7d0000000000 shadow 5d0000000000 origin 4d0000000000
addr 7fffffffffff shadow 5fffffffffff origin 4fffffffffff

000000000000 - 0fffffffffff app-1
100000000000 - 1fffffffffff invalid
200000000000 - 2fffffffffff shadow-1
300000000000 - 3fffffffffff origin-1
400000000000 - 4cffffffffff invalid
4d0000000000 - 4fffffffffff origin-3
500000000000 - 5cffffffffff app-2
5d0000000000 - 5fffffffffff shadow-3
600000000000 - 6cffffffffff origin-2
6d0000000000 - 6fffffffffff invalid
700000000000 - 7cffffffffff shadow-2
7d0000000000 - 7fffffffffff app-3

Looks a bit crazy, but I don't see why it would not work.

Original issue reported on code.google.com by euge...@google.com on 27 Nov 2014 at 9:23

GoogleCodeExporter commented 9 years ago
That's Linux-x86_64 of course.
Heap goes to 0x7d..0x7e, as in TSan.

Original comment by euge...@google.com on 27 Nov 2014 at 9:24

GoogleCodeExporter commented 9 years ago
There is a weird code size (and maybe performance) regression due to llvm doing 
something inefficient with xor(xor(x)) on X86:
http://llvm.org/bugs/show_bug.cgi?id=21749

We will probably win more by disabling PIE, so I'm inclined to go ahead with 
this anyway and optimize it later. This problem should be fixable with some 
kind of peephole optimization in the backend.

Original comment by euge...@google.com on 5 Dec 2014 at 12:59

GoogleCodeExporter commented 9 years ago
Without disabling PIE, the performance measurements are inconclusive.
470.lbm speeds up by ~10% with the new mapping for some reason.
The rest are within +-3%.
Code size grows by ~2.5%, but on one of the SPEC benchmarks by close to 10%.

Original comment by euge...@google.com on 12 Dec 2014 at 4:32

GoogleCodeExporter commented 9 years ago
Here is another idea with all the same properties.

#define MEM_TO_SHADOW(mem) (((uptr)(mem)) ^ 0x500000000000ULL)
#define SHADOW_TO_ORIGIN(mem) (((uptr)(mem)) + 0x100000000000ULL)

Here, shadow-to-origin is addition of a constant, which can be folded into the 
x86 addressing mode without using an extra register.

Memory layout:

000000000000 - 050000000000 app-1
050000000000 - 100000000000 shadow-2
100000000000 - 150000000000 invalid
150000000000 - 200000000000 origin-2
200000000000 - 300000000000 shadow-3
300000000000 - 400000000000 origin-3
400000000000 - 500000000000 invalid
500000000000 - 550000000000 shadow-1
550000000000 - 600000000000 app-2
600000000000 - 650000000000 origin-1
650000000000 - 700000000000 invalid
700000000000 - 800000000000 app-3

Performance is the same or better (old, new, new/old ratio):
       400.perlbench,      2822.00,      2849.00,         1.01
           401.bzip2,      1975.00,      1978.00,         1.00
             429.mcf,      1665.00,      1619.00,         0.97
           445.gobmk,      2257.00,      2224.00,         0.99
           456.hmmer,      2954.00,      2977.00,         1.01
           458.sjeng,      3613.00,      3567.00,         0.99
      462.libquantum,      1381.00,      1380.00,         1.00
         464.h264ref,      4348.00,      4366.00,         1.00
           473.astar,      1312.00,      1311.00,         1.00
       483.xalancbmk,      1287.00,      1227.00,         0.95
            433.milc,      1404.00,      1405.00,         1.00
            444.namd,      1684.00,      1690.00,         1.00
          447.dealII,      1309.00,      1305.00,         1.00
          450.soplex,       996.00,       983.00,         0.99
          453.povray,      1505.00,      1519.00,         1.01
             470.lbm,      1325.00,      1329.00,         1.00
         482.sphinx3,      2593.00,      2590.00,         1.00

Code size:
- instrumented code is smaller by 0.5% on average
- the run-time library is larger by 40 KB, mostly because the MEM_IS_APP(x) 
check is more complex, and it is inlined in multiple places (e.g. all 
interceptors, as part of __msan_poison()). A lot of these are redundant and can 
be removed later.

Original comment by euge...@google.com on 26 Jan 2015 at 8:52

GoogleCodeExporter commented 9 years ago
Attaching the current patchset, just in case.

Original comment by euge...@google.com on 27 Jan 2015 at 3:16

Attachments: