Support re-exec of sanitized executable with preloading libasan on Linux and Android

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
Currently, if we want to run a sanitized shared library inside an unsanitized 
executable, we have to run the executable with LD_PRELOAD=libasan.so, which may 
be inconvenient for users (they have to modify startup scripts, etc.). The ASan 
Mac backend implements a MaybeReexec() function, which re-executes the program 
with the runtime preloaded (via DYLD_INSERT_LIBRARIES, the Mac counterpart of 
LD_PRELOAD). The same could be supported on Linux and Android.
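
A minimal sketch of what such a re-exec could look like on Linux, just to make 
the idea concrete. This is not the MaybeReexec() code from the ASan runtime: the 
helper names (preload_contains, maybe_reexec_with_preload) are made up for 
illustration, the library name is passed in by the caller, and re-executing via 
/proc/self/exe is an assumption that only holds on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns true if `lib` is the file name of one of the
   entries in an LD_PRELOAD-style list (entries separated by ':' or ' '). */
bool preload_contains(const char *preload, const char *lib) {
    assert(lib != NULL);
    if (preload == NULL) return false;
    size_t lib_len = strlen(lib);
    const char *p = preload;
    while (*p != '\0') {
        size_t len = strcspn(p, ": ");           /* length of this entry */
        const char *base = p;                     /* file name part of it */
        for (size_t i = 0; i < len; i++)
            if (p[i] == '/') base = p + i + 1;
        size_t base_len = (size_t)(p + len - base);
        if (base_len == lib_len && strncmp(base, lib, lib_len) == 0)
            return true;
        p += len;
        if (*p != '\0') p++;                      /* skip the separator */
    }
    return false;
}

/* Hypothetical helper: if `lib` is not preloaded yet, prepend it to
   LD_PRELOAD and re-execute the current binary with the same arguments. */
void maybe_reexec_with_preload(char **argv, const char *lib) {
    assert(argv != NULL && lib != NULL);
    const char *old = getenv("LD_PRELOAD");
    if (preload_contains(old, lib)) return;       /* nothing to do */

    char buf[4096];
    int n = (old != NULL && *old != '\0')
                ? snprintf(buf, sizeof buf, "%s:%s", lib, old)
                : snprintf(buf, sizeof buf, "%s", lib);
    if (n < 0 || (size_t)n >= sizeof buf) return; /* value too long */
    if (setenv("LD_PRELOAD", buf, 1) != 0) return;

    execv("/proc/self/exe", argv);                /* returns only on failure */
    perror("execv");
}
```

Because the check looks at LD_PRELOAD itself, the re-executed process sees the 
library in the list and does not re-exec again, so this cannot loop. It only 
makes sense when called before the program has done any work.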

What version of the product are you using? On what operating system?
gcc 4.8, Linux

Please provide any additional information below.
I am going to work on such a patch. Please let me know if it's already in 
development.

Original issue reported on code.google.com by mguse...@gmail.com on 6 Aug 2014 at 11:34

GoogleCodeExporter commented 9 years ago

The reason we have MaybeReexec() on Mac is that even when the instrumented main 
executable depends on the shared ASan runtime library, we have to preload that 
runtime library into the executable in order for the interceptors to work 
correctly (this is Mac-specific). We try our best to do the re-execution very 
early on Mac, before the program starts doing anything.

If only a single shared library in the testing environment is instrumented (and 
depends on the ASan runtime library), and the main executable is not, 
__asan_init() is going to be called only at the moment that library is loaded 
(including the case when we use dlopen() to load it). At that moment the main 
executable might have done a fair amount of work which we can't simply replay 
upon reexec().

I believe that unless the main executable depends on the ASan shared runtime 
the users must explicitly preload the runtime in order to test any pieces of 
code that might initialize late.

Original comment by ramosian.glider@gmail.com on 27 Oct 2014 at 7:10

GoogleCodeExporter commented 9 years ago

> If only a single shared library in the testing environment is instrumented
> (and depends on the ASan runtime library), and the main executable is not,
> __asan_init() is going to be called only at the moment that library is loaded
> (including the case when we use dlopen() to load it).

Right, explicitly banning the dlopen case would be nice but I'm not sure how to 
achieve this.

> At that moment the main
> executable might have done a fair amount of work which we can't simply replay
> upon reexec().

If the main executable depends on the library (which is really the case we are 
interested in), then in the worst case some library initializers might have 
been executed.

> I believe that unless the main executable depends on the ASan shared runtime
> the users must explicitly preload the runtime in order to test any pieces of
> code that might initialize late.

This may be hard to do on some systems. Finding every place where the 
executables that depend on a library get started in a large autobuilt 
distribution is challenging.

Original comment by tetra2...@gmail.com on 27 Oct 2014 at 9:44

GoogleCodeExporter commented 9 years ago

> Right, explicitly banning the dlopen case would be nice but I'm not sure how
> to achieve this.
Why ban this case? Doesn't it work with LD_PRELOAD?

> If main executable depends on the library (which is really the case we are
> interested in) then worst-case some library initializers might have been
> executed.
Can you please remind me why GCC doesn't use the static runtime library?

Original comment by ramosian.glider@gmail.com on 27 Oct 2014 at 9:50

GoogleCodeExporter commented 9 years ago

> Why ban this case? Doesn't it work with LD_PRELOAD?

No, dlopen may be called in the middle of a running program, after some files 
have already been written, so re-execution would change the semantics 
unpredictably.

> Can you please remind why GCC doesn't use the static runtime library?

Well, both GCC and Clang support both static and dynamic runtimes, it's just 
that the default choice in GCC is different (for historical reasons). One good 
thing about AsanDSO is that it allows running a sanitized .so with unsanitized 
executables.

Original comment by tetra2...@gmail.com on 27 Oct 2014 at 9:55

GoogleCodeExporter commented 9 years ago

> No, dlopen may be executed in the middle of a working program when some files
> already got written so reexecution would change the semantics unpredictably.

I mean, in the current setup preloading the library lets you test both 
instrumented executables, and instrumented libraries with uninstrumented 
executables. Re-exec works only for the former case, but that does not mean we 
should ban the latter one just to make re-exec work (if I'm understanding 
correctly what you want).

Original comment by ramosian.glider@gmail.com on 27 Oct 2014 at 10:16

GoogleCodeExporter commented 9 years ago

> I mean, in the current setup preloading the library
> lets you test both instrumented executables, and instrumented libraries
> with uninstrumented executables. Re-exec works only for the former case,
> but that does not mean we should ban the latter one
> just to make re-exec work (if I'm understanding correctly what you want).

Ah, sure, manual LD_PRELOAD would work in this case. I just meant that 
ASAN_OPTIONS=maybe_reexec=1 wouldn't.

Original comment by tetra2...@gmail.com on 27 Oct 2014 at 10:18

GoogleCodeExporter commented 9 years ago

Here's a link to the original discussion of reexec porting: 
https://groups.google.com/forum/#!searchin/address-sanitizer/reexec/address-sanitizer/Xav2pArPJ3E/tXZRsX6S7LoJ

Original comment by tetra20...@gmail.com on 28 Oct 2014 at 8:29

GoogleCodeExporter commented 9 years ago

I hate the idea of reexec (even though we have it for other use cases).
This is too fragile and too complex. 
Maybe you can get away with manual LD_PRELOAD and un-setting LD_PRELOAD for 
children?
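
A rough sketch of the "un-set for children" part, assuming it is done by the 
process itself right before it spawns anything. The helper names 
(strip_preload_entry, drop_preload_for_children) are hypothetical and nothing 
like this exists in the ASan runtime as far as this thread shows. Other 
LD_PRELOAD entries are kept and only the one matching `lib` is dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the LD_PRELOAD-style list `preload` into `out`,
   dropping every entry whose file name equals `lib`. Kept entries are joined
   with ':'. Returns 0 on success, -1 if `out` is too small. */
int strip_preload_entry(const char *preload, const char *lib,
                        char *out, size_t out_size) {
    assert(preload != NULL && lib != NULL && out != NULL);
    if (out_size == 0) return -1;
    size_t lib_len = strlen(lib);
    size_t used = 0;
    out[0] = '\0';
    for (const char *p = preload; *p != '\0'; ) {
        size_t len = strcspn(p, ": ");            /* length of this entry */
        const char *base = p;                      /* file name part of it */
        for (size_t i = 0; i < len; i++)
            if (p[i] == '/') base = p + i + 1;
        size_t base_len = (size_t)(p + len - base);
        int is_lib = base_len == lib_len && strncmp(base, lib, lib_len) == 0;
        if (len > 0 && !is_lib) {
            size_t need = len + (used > 0 ? 1 : 0);
            if (used + need + 1 > out_size) return -1;
            if (used > 0) out[used++] = ':';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        p += len;
        if (*p != '\0') p++;                       /* skip the separator */
    }
    return 0;
}

/* Hypothetical helper: rewrites LD_PRELOAD in the current environment so that
   children started afterwards no longer get `lib` preloaded. */
void drop_preload_for_children(const char *lib) {
    const char *old = getenv("LD_PRELOAD");
    char buf[4096];
    if (old == NULL) return;
    if (strip_preload_entry(old, lib, buf, sizeof buf) != 0) return;
    if (buf[0] == '\0')
        unsetenv("LD_PRELOAD");
    else
        setenv("LD_PRELOAD", buf, 1);
}
```

Changing the environment this way does not affect the current process, since 
the dynamic loader has already read LD_PRELOAD by then. It only changes what 
fork/exec'ed children inherit.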

Original comment by konstant...@gmail.com on 31 Oct 2014 at 10:36

GoogleCodeExporter commented 9 years ago

> I hate the idea of reexec (even though we have it for other use cases).
> This is too fragile and too complex.

It surely is. On the other hand, it improves usability in a very common 
situation (instrumenting parts of a large distribution) and we already have it 
on other platforms.

> Maybe you can get away with manual LD_PRELOAD and un-setting LD_PRELOAD for
> children?

In many cases that would mean modifying source code to set/unset LD_PRELOAD 
which would be a big blocker.

Original comment by tetra2...@gmail.com on 1 Nov 2014 at 5:21

GoogleCodeExporter commented 9 years ago

Maybe we should try the same hack as we do on Android? 
Everything is running with asan, but in deactivated mode. 
As soon as some module calls __asan_init we activate asan.

Original comment by konstant...@gmail.com on 5 Nov 2014 at 1:20

GoogleCodeExporter commented 9 years ago

> Maybe we should try the same hack as we do on Android?
> Everything is running with asan, but in deactivated mode.
> As soon as some module calls __asan_init we activate asan.

Konstantin,
If I understood it correctly, with your proposal ASan will be activated for any 
module as soon as an allocation is done or some intercepted function is called, 
which in practice means very soon for all active processes. That's not what we 
wanted. The intention was to minimize overhead by preloading the ASan runtime 
only for the executables that need it.

Can you specify what exactly you don't like about the reexec approach? It's 
fair that in the dlopen case we can't rely on it, so we probably need to handle 
that case separately. But in the case of run-time init the executable has not 
started yet and reexec shouldn't be an issue.

Original comment by mguse...@gmail.com on 5 Nov 2014 at 12:59

GoogleCodeExporter commented 9 years ago

> It's fair that in the dlopen case we can't rely on it.
> So probably we need to handle such case separately.

We could unwind stack and check for dlopen. Or just intercept dlopen.

Original comment by tetra20...@gmail.com on 5 Nov 2014 at 1:06

GoogleCodeExporter commented 9 years ago

mguseva2: please see how ASAN_OPTIONS=start_deactivated=1 works
(e.g. in test/asan/TestCases/Posix/start-deactivated.cc)
asan will get activated only once an instrumented module is loaded, 
i.e. a binary that does not have asan instrumentation will not activate asan.
This *may* be the solution you need for your use case.

Original comment by konstant...@gmail.com on 5 Nov 2014 at 6:58

GoogleCodeExporter commented 9 years ago

Thank you, Konstantin, I see. Currently on Linux libasan.so calls __asan_init 
itself, but that seems to be redundant and it should be fine to change it to an 
internal init without activation. So we can try the deactivated approach. We 
still need to check the overhead it produces because of interceptors and heap 
redzones.

Original comment by mguse...@gmail.com on 7 Nov 2014 at 12:18

GoogleCodeExporter commented 9 years ago

Redzone size is zero prior to activation.
Interceptors are supposed to have very low overhead (and they should not do any 
poisoning/unpoisoning while deactivated).
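
To illustrate why a deactivated interceptor is cheap, here is a toy model of 
the idea. It is not code from the ASan runtime: every name in it (toy_activate, 
toy_memcpy, toy_check_range, toy_checks_done) is hypothetical, and the "check" 
is only a counter standing in for the real shadow-memory work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy state: false until the first instrumented module shows up. */
static bool toy_active = false;
/* Counts how many times the expensive path was taken. */
static size_t toy_checks = 0;

/* Stand-in for an instrumented module calling its init hook. */
void toy_activate(void) { toy_active = true; }

/* Stand-in for a range check / poisoning step. */
static void toy_check_range(const void *p, size_t n) {
    (void)p;
    (void)n;
    toy_checks++;
}

/* Interceptor-style wrapper: while deactivated it costs one branch and then
   forwards to the real function; once activated it also checks both ranges. */
void *toy_memcpy(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (toy_active) {
        toy_check_range(src, n);
        toy_check_range(dst, n);
    }
    return memcpy(dst, src, n);
}

size_t toy_checks_done(void) { return toy_checks; }
```

In this model a process that never loads an instrumented module never calls 
toy_activate() and pays only for the extra branch per intercepted call.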

Original comment by euge...@google.com on 7 Nov 2014 at 12:21

GoogleCodeExporter commented 9 years ago

After some modification the start_deactivated flag works in our case. I've 
submitted the changes I made for review: http://reviews.llvm.org/D6265.
Regarding redzones: as I see in asan_rtl.cc and asan_activation.cc, the redzone 
and max_redzone values are set to 16 in deactivated mode. Does that mean small 
redzones are still allocated for heap memory?

I still wonder about Reexec. In the current design the MaybeReexec part of the 
ASan runtime is not Mac-specific, but it is implemented only for Mac. I think 
ReExec may be a useful feature on Linux as well if we fix the dlopen case. What 
do you think?

Original comment by mguse...@gmail.com on 14 Nov 2014 at 10:56

GoogleCodeExporter commented 9 years ago

As for the redzones, I think 16 bytes is the minimum, as they are used to store 
some meta information about the allocation.

Original comment by euge...@google.com on 14 Nov 2014 at 1:35

GoogleCodeExporter commented 9 years ago

Ping. 
> I still wonder about Reexec. In current design MaybeReexec part of Asan
> runtime is not Mac-specific. But it is implemented only for Mac. I think
> ReExec may be a useful feature on Linux as well if we fix dlopen case.
> What do you think?

So at least it's strange that MaybeReexec is designed as common code while it's 
really used only on Mac.

Original comment by mguse...@gmail.com on 4 Dec 2014 at 9:05

GoogleCodeExporter commented 9 years ago

Original comment by ramosian.glider@gmail.com on 30 Jul 2015 at 9:05

Added labels: ProjectAddressSanitizer

GoogleCodeExporter commented 9 years ago

Adding Project:AddressSanitizer as part of GitHub migration.

Original comment by ramosian.glider@gmail.com on 30 Jul 2015 at 9:06

ramosian-glider / address-sanitizer

Support re-exec of sanitized executable with preloading libasan on Linux and Android #330