gopalshankar / address-sanitizer

Automatically exported from code.google.com/p/address-sanitizer

ASan breaks dead stripping (-ffunction-sections/-Wl,--gc-sections on Linux, -dead_strip on OSX) #260

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Test case by Ryan Govostes:
================================================
int undefined();

int (*unused)() = undefined;

int main() {
        return 0;
}
================================================

$ clang++ t.cc -o t -Wl,--gc-sections -ffunction-sections -fsanitize=address -mllvm -asan-globals=0
$ clang++ t.cc -o t -Wl,--gc-sections -ffunction-sections -fsanitize=address -mllvm -asan-globals=1
/tmp/t-1710b5.o:(.data+0x0): undefined reference to `undefined()'
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)

This happens because ASan creates a per-module global array that references 
every global in the module irrespective of its usage.

Original issue reported on code.google.com by ramosian.glider@gmail.com on 31 Jan 2014 at 2:54

GoogleCodeExporter commented 9 years ago
<historical note>
In the early versions of ASan we called __asan_register_global on every 
instrumented global separately, so there was no need to put all the globals 
into an array and this problem did not exist. We were forced to replace N 
calls to __asan_register_global(g) with a single call to 
__asan_register_globals(g, N) for compile-time and run-time performance reasons. 
</historical note>

Original comment by konstant...@gmail.com on 4 Feb 2014 at 11:26

GoogleCodeExporter commented 9 years ago
For a proper test we also need to use -fdata-sections, otherwise 
the following test will not link even without ASan:

% cat sec.cc 
int undefined();
int defined() { return 1; }
void *AAA = (void*)&defined;
void *BBB = (void*)&undefined;
int main() {
  return AAA != 0;
}

% clang++ sec.cc  -Wl,--gc-sections -ffunction-sections
/tmp/sec-e95f7f.o:(.data+0x8): undefined reference to `undefined()'
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)

% clang++ sec.cc  -Wl,--gc-sections -ffunction-sections -fdata-sections
% 

Original comment by konstant...@gmail.com on 4 Feb 2014 at 11:38

GoogleCodeExporter commented 9 years ago
It looks like the linkers on Linux and OSX are clever enough to emit symbols 
for the start and end of a given section (see the attached example). We can 
make the compiler put the per-variable global descriptors into a special data 
section and iterate over it using these two symbols. This should allow the 
linker to discard the unused globals, since they won't be transitively 
referenced by the global constructors array.

Original comment by ramosian.glider@gmail.com on 4 Feb 2014 at 1:52

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by ramosian.glider@gmail.com on 19 Jun 2014 at 11:12

GoogleCodeExporter commented 9 years ago
Progress report.
I have an almost-working implementation of globals instrumentation on Linux. 
The main problem with the approach suggested above (keeping the descriptors in 
a single data section) is that it still doesn't work with --gc-sections, 
because that flag only removes whole dead sections and can't carve a single 
descriptor pointing to a dead global out of a live section. E.g. for the 
example given in #2 the data section .data.BBB won't be removed, because it's 
referenced by the live section containing all the global descriptors.

To deal with this we need to make the following changes:
1. Emit the descriptor for each global foo into its own _asan_globals.foo 
section.
2. Put a pointer to that descriptor at the end of the global's redzone.
3. Link with a linker script that merges all the _asan_globals.* sections into 
a single _asan_globals one.

The second step is required because otherwise nothing references the 
per-global descriptor sections and the linker will garbage collect the 
descriptors of all globals. The drawback of this approach is that the redzone 
now holds a non-zero pointer, so all zero-initialized globals move from .bss to 
.data, where they'll occupy actual disk space.

Original comment by ramosian.glider@gmail.com on 23 Jun 2014 at 1:52

GoogleCodeExporter commented 9 years ago
Yesterday we discussed the possibility of making the reference between the 
global descriptor and the global weak, analogous to a weak symbol (if the 
global is deleted, the pointer becomes 0). However, this is impossible if the 
global and the descriptor are in the same object module (we can't change the 
global's linkage to be extern_weak).

Original comment by ramosian.glider@gmail.com on 24 Jun 2014 at 12:46

GoogleCodeExporter commented 9 years ago
Another idea, suggested by Evgeniy, to avoid bloating the zero-initialized 
globals:

1. For each global, create a descriptor referencing that global.
2. For each global in the .data section:
  a) put a pointer to that global's descriptor into its redzone;
  b) for each zero-initialized global from the .bss section of the same module 
     that is referenced by this global, add a pointer to that global's 
     descriptor to the parent global's redzone.
3. For each function referencing a global, add a reference to that global's 
descriptor to that function.

The only problem here is that we can't easily reference anything from a 
function.

Original comment by ramosian.glider@gmail.com on 24 Jun 2014 at 1:08

GoogleCodeExporter commented 9 years ago
Yet another idea, from Dima Polukhin: make the references from the descriptor 
array to the instrumented globals weak, so that when a global is dead-stripped 
the descriptor is retained.
Not sure whether this is supported on Linux (maybe) or OSX (probably not).

Original comment by ramosian.glider@gmail.com on 25 Aug 2014 at 3:44