Open jcowgill opened 7 years ago
It is possible, but at a code size and execution time cost that we are not willing to pay. Any chance of getting the kernel to cooperate?
This would not be the first time a kernel change has broken the sanitizers.
The last significant one was by H.J. Lu, when he changed the base from 0x7.... to 0x555....
It caused lots of trouble for us in msan and tsan.
What we really need here is to tell at link time where the shadow is. AFAICT, there is no such capability currently.
I always wondered if it would be possible to express the shadow mapping as an ELF program header. That would be the ultimate way to communicate shadow memory needs to the kernel.
I'm not sure - I'm just a user who happened to stumble across the bug. You might be able to get them to change where the executable gets mapped, but they could argue that PIE executables should be prepared to be loaded at any address.
What we really need here is to tell at link time where the shadow is.
I don't see how that is possible with PIE / ASLR. The entire point is that you don't know where the executable will be loaded, so you can't know what bits of memory will be free until runtime.
@dvyukov @xairy @ramosian-glider FYI
We could have a program header that means "please reserve the first N bytes of the address space for the application". Then the kernel can use that as a minimum for ELF_ET_DYN_BASE.
@dvyukov can you confirm that the fresh kernel breaks the sanitizers?
I think I am hitting this bug:
$ ./loadaddr
==16572==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==16572==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==16572==Process memory map follows:
0x04daa6e91000-0x04daa6fc6000 /tmp/loadaddr
0x04daa71c6000-0x04daa71c7000 /tmp/loadaddr
0x04daa71c7000-0x04daa71ca000 /tmp/loadaddr
0x04daa71ca000-0x04daa7e2f000
0x7b742c072000-0x7b742c3c4000
0x7b742c3c4000-0x7b742c559000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c559000-0x7b742c759000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c759000-0x7b742c75d000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c75d000-0x7b742c75f000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c75f000-0x7b742c763000
0x7b742c763000-0x7b742c779000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libgcc_s.so.1
0x7b742c779000-0x7b742c978000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libgcc_s.so.1
0x7b742c978000-0x7b742c979000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libgcc_s.so.1
0x7b742c979000-0x7b742c97c000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742c97c000-0x7b742cb7b000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742cb7b000-0x7b742cb7c000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742cb7c000-0x7b742cb7d000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742cb7d000-0x7b742cc8e000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742cc8e000-0x7b742ce8e000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742ce8e000-0x7b742ce8f000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742ce8f000-0x7b742ce90000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742ce90000-0x7b742ce97000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742ce97000-0x7b742d096000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742d096000-0x7b742d097000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742d097000-0x7b742d098000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742d098000-0x7b742d0b1000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d0b1000-0x7b742d2b0000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d2b0000-0x7b742d2b1000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d2b1000-0x7b742d2b2000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d2b2000-0x7b742d2b6000
0x7b742d2b6000-0x7b742d2d9000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/ld-2.25.so
0x7b742d4a8000-0x7b742d4bc000
0x7b742d4c0000-0x7b742d4d9000
0x7b742d4d9000-0x7b742d4da000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/ld-2.25.so
0x7b742d4da000-0x7b742d4db000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/ld-2.25.so
0x7b742d4db000-0x7b742d4dc000
0x7fff06e24000-0x7fff06e46000 [stack]
0x7fff06ef0000-0x7fff06ef2000 [vvar]
0x7fff06ef2000-0x7fff06ef4000 [vdso]
0xffffffffff600000-0xffffffffff601000 [vsyscall]
==16572==End of process memory map.
c-cube:/tmp uname -a
Linux c-cube 4.9.39 #1-NixOS SMP Fri Jul 21 05:42:36 UTC 2017 x86_64 GNU/Linux
(Just compiled a trivial Hello world with -fsanitize=address)
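To make the abort easier to read, the reported shadow range can be compared against a few of the mappings in the log. The Python sketch below is illustrative only: the helper names `overlaps` and `conflicting` are made up here, and the addresses are copied from the output above.

```python
# Shadow range reported by ASan in the log above (inclusive bounds).
SHADOW_START = 0x00007FFF7000
SHADOW_END = 0x10007FFF7FFF

# A few mappings copied from the same log: (start, end, label).
MAPPINGS = [
    (0x04DAA6E91000, 0x04DAA6FC6000, "/tmp/loadaddr"),
    (0x7B742C3C4000, 0x7B742C559000, "libc-2.25.so"),
    (0x7FFF06E24000, 0x7FFF06E46000, "[stack]"),
]


def overlaps(start, end):
    """True if the half-open mapping [start, end) intersects the shadow range."""
    return start <= SHADOW_END and end > SHADOW_START


def conflicting(mappings):
    """Labels of all mappings that collide with the shadow range."""
    return [label for start, end, label in mappings if overlaps(start, end)]


if __name__ == "__main__":
    for label in conflicting(MAPPINGS):
        print("collides with shadow:", label)
```

Of the three mappings listed, only the PIE executable itself falls inside the range; the libraries and the stack sit above it.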
A possible workaround seems to be the following:
$ .../ld-2.25.so ./loadaddr
That way, loadaddr will be loaded by ld.so, which uses mmap, so loadaddr ends up in the mmap region, which is way higher than the PIE base.
(Yes, my ld.so is in a weird path, that's just NixOS things :)
I independently bisected this in the kernel and opened a bug there: https://bugzilla.kernel.org/show_bug.cgi?id=196537 but didn't have a lot of knowledge about the underlying issues.
Bringing this over from twitter (https://twitter.com/kayseesee/status/894594085608013825), my basic view is that this is a bug in the ASAN library code. Assuming you can use a particular virtual address range is not valid (it could already be in use for some reason, as you're now seeing), and even if it were valid, it's not safe for something that can be used in deployment; it exposes potentially sensitive information at an attacker-known address. ASAN simply needs to pay the cost of using a variable address chosen at runtime.
@richfelker ASAN has been using fixed addresses since 2011. I know the kernel does not guarantee anything like this, but it worked, and it provided performance and code size benefits over using a dynamic shadow base (which we also have now, as an option, off by default on Linux).
ASAN simply needs to pay the cost of using a variable address chosen at runtime.
That's one way to look at it. But a much better resolution would be to have a kernel<=>userspace interface that allows using a fixed address. And in the meantime, revert the change that broke ASAN.
safe for something that can be used in deployment
If you want to discuss this topic, please open a separate issue, let's not mix too many things in a single place.
Like I said on the initial Twitter thread, I don't think I have much of value to say beyond "I think what you're doing is badly wrong" and "it happened to work before is not a good argument to do it (or for changing the kernel)". If we disagree then we disagree...
@kcc: You mentioned a dynamic shadow base. Could you please elaborate on that.
Is that available in the current stable release of LLVM? And if yes, can you point me to some documentation please.
I think that information would be useful for downstream projects that find the runtime overhead of a dynamic shadow base is acceptable.
And in the meantime, revert the change that broke ASAN.
@kcc I don't think this is good advice. Pretty sure that the change fixes some security issue, so you shouldn't revert that.
I agree strongly with @bennofs. Address assignment/ASLR for production systems should not be tiptoeing around (and possibly impacting security) for the sake of a tool that's only suitable in debugging situations and not production. I'd like ASAN to be usable in production (which is why I mentioned that above) but at present it's not.
One more discussion thread is here: http://marc.info/?t=149973272100048&r=1&w=2
@kcc: You mentioned a dynamic shadow base. Could you please elaborate on that.
In clang there is -mllvm -asan-force-dynamic-shadow=1, which is the default on Windows.
I don't think this has been implemented in GCC.
This is currently an implementation detail (on windows), not documented.
should not be tiptoeing around
All these arguments are perfectly valid, but who is going to pay for the increased CPU usage and code size? Or, if we end up supporting both configurations on linux (dynamic and static) who is going to pay for the extra maintenance overhead?
We really need to come up with a solution where the application requests a fixed address range at startup and the kernel can't refuse.
@kcc: Forcing the dynamic shadow doesn't work on my system! (Archlinux x86_64 with clang 4.0.1)
@FSMaxB please open a separate bug with details. But please note: this flag is not officially supported.
Requesting a fixed address range at startup is non-PIE. Normal non-PIE ELF already has a way to do that: PT_LOAD segments (e.g. with PROT_NONE or just BSS you can MAP_FIXED over later). The whole point of an executable being PIE is that it doesn't demand specific addresses.
Being that current kernels don't, and future kernels probably won't, support the invalid usage of assuming a particular fixed address range is free, the fixed address mode should just be removed and dynamic always used. This will simplify the amount of code that needs to be maintained anyway (since Windows already needs dynamic). Performance is not likely to be significantly worse, but ASAN already performs badly and is intended and understood as a costly (but less so than some other approaches) tool for debugging (and possibly in the future, for hardening).
Asan's shadow being at a fixed offset does not really contradict PIE -- the rest of the addresses could be anywhere they want to (except for the shadow region).
BTW, I am trying to get the fresh perf numbers on spec for static vs dynamic shadow.
The view I'm putting forward, which you're free to disagree with but I think is worthwhile, is that the definition of PIE is "no fixed mappings", not "some non-fixed mappings". In this definition, PIE ELF programs can even be loaded in rather esoteric environments like a shared address space (multiple programs in the same process) or a nommu system (where all processes share an address space). There are very good reasons to consider any fixed mappings a design bug; in places where they've been used recently, they've repeatedly come back to bite the designers and users. The Linux/glibc x86_64 "vsyscall" mess, ARM kuserhelper page, etc. come to mind.
BTW my view of these matters is somewhat broader than "Linux" because I'm thinking of/interested in the usage case of non-Linux implementations loading and executing programs using the Linux user-kernel ABI. This sort of generality is part of why I disagree with the view that the kernel is obligated to lay out memory the same way past versions did.
@kcc
BTW, I am trying to get the fresh perf numbers on spec for static vs dynamic shadow.
It may make sense to measure sanitized DSOs (where __asan_shadow_memory_dynamic_address is GOT-relocated), rather than sanitized executables.
@richfelker
I'd like ASAN to be usable in production (which is why I mentioned that above) but at present it's not.
Relevant discussion in oss-security
I've done an overnight run of SPEC2006 on my machine. The results are surprisingly close. But the run-to-run variation is too high, I'll need to find a less noisy machine.
benchmark, static, dynamic, dynamic/static
400.perlbench, 1605.00, 1647.00, 1.03 << dynamic is 3% slower
401.bzip2, 779.00, 797.00, 1.02
403.gcc, 660.00, 686.00, 1.04
429.mcf, 593.00, 503.00, 0.85 << very noisy test
445.gobmk, 960.00, 956.00, 1.00
456.hmmer, 809.00, 812.00, 1.00
458.sjeng, 1214.00, 1227.00, 1.01
462.libquantum, 435.00, 442.00, 1.02
464.h264ref, 1193.00, 1207.00, 1.01
471.omnetpp, 881.00, 904.00, 1.03
473.astar, 704.00, 672.00, 0.95 << dynamic is 5% faster!
483.xalancbmk, 1252.00, 1216.00, 0.97
433.milc, 860.00, 837.00, 0.97
444.namd, 583.00, 590.00, 1.01
447.dealII, 1659.00, 1627.00, 0.98
450.soplex, 454.00, 476.00, 1.05
453.povray, 648.00, 630.00, 0.97
470.lbm, 478.00, 460.00, 0.96
482.sphinx3, 811.00, 798.00, 0.98
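Since the per-benchmark numbers are noisy, one way to summarize them is a geometric mean of the dynamic/static ratios. The sketch below is not part of the original measurement: the values are copied from the table above (units are not stated in the thread), and the helper names `ratios` and `geometric_mean` are chosen here for illustration.

```python
import math

# (benchmark, static, dynamic), copied from the table above.
RESULTS = [
    ("400.perlbench", 1605.0, 1647.0),
    ("401.bzip2", 779.0, 797.0),
    ("403.gcc", 660.0, 686.0),
    ("429.mcf", 593.0, 503.0),
    ("445.gobmk", 960.0, 956.0),
    ("456.hmmer", 809.0, 812.0),
    ("458.sjeng", 1214.0, 1227.0),
    ("462.libquantum", 435.0, 442.0),
    ("464.h264ref", 1193.0, 1207.0),
    ("471.omnetpp", 881.0, 904.0),
    ("473.astar", 704.0, 672.0),
    ("483.xalancbmk", 1252.0, 1216.0),
    ("433.milc", 860.0, 837.0),
    ("444.namd", 583.0, 590.0),
    ("447.dealII", 1659.0, 1627.0),
    ("450.soplex", 454.0, 476.0),
    ("453.povray", 648.0, 630.0),
    ("470.lbm", 478.0, 460.0),
    ("482.sphinx3", 811.0, 798.0),
]


def ratios(results):
    """Map each benchmark name to dynamic/static (above 1.0 means dynamic is slower)."""
    return {name: dynamic / static for name, static, dynamic in results}


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    per_benchmark = ratios(RESULTS)
    for name, ratio in per_benchmark.items():
        print(f"{name:16s} {ratio:.2f}")
    print(f"{'geometric mean':16s} {geometric_mean(per_benchmark.values()):.3f}")
```

Because 429.mcf is flagged as very noisy, it may be worth computing the mean both with and without that row before drawing conclusions.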
I was also surprised to see that the code size with dynamic shadow is actually better (~0.3%). Well, looking at the objdump it makes sense:
Dynamic:
9a8a66: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
Static:
41fd36: 80 b8 00 80 ff 7f 00 cmpb $0x0,0x7fff8000(%rax)
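The size difference can be read straight off the instruction bytes. The following Python sketch only decodes the two byte sequences quoted above; the field layout (opcode, ModRM, 32-bit displacement, 8-bit immediate) is assumed for this one static instruction, and `embedded_disp32` is a name invented for the example.

```python
# Instruction bytes copied from the objdump lines above.
DYNAMIC = bytes.fromhex("80 3c 01 00")          # cmpb $0x0,(%rcx,%rax,1)
STATIC = bytes.fromhex("80 b8 00 80 ff 7f 00")  # cmpb $0x0,0x7fff8000(%rax)


def embedded_disp32(insn):
    """Extract the little-endian 32-bit displacement from the static form.

    Assumed layout for this instruction: opcode, ModRM, disp32, imm8.
    """
    return int.from_bytes(insn[2:6], "little")


if __name__ == "__main__":
    print("dynamic check:", len(DYNAMIC), "bytes")
    print("static check: ", len(STATIC), "bytes")
    print("embedded shadow offset:", hex(embedded_disp32(STATIC)))
```

The static form carries the shadow offset inside every check, while the dynamic form keeps the base in a register, so each check is shorter but the function has to load the base first.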
Next steps:
The difference between regular executables and PIE:
Regular:
4e7f74: 4c 8b 35 9d 2c 44 00 mov 0x442c9d(%rip),%r14 # 92ac18 <__asan_shadow_memory_dynamic_address>
PIE (or -shared-libasan):
e9504: 48 8d 05 0d 27 44 00 lea 0x44270d(%rip),%rax # 52bc18 <__asan_shadow_memory_dynamic_address>
e950b: 4c 8b 30 mov (%rax),%r14
It looks like the linker is applying relocation relaxation in the PIE/-shared-libasan case, so we end up with a single indirection in the final executable. If you look at the object files you should see two mov instructions.
Are you sure you are linking against the libasan DSO when you build with -shared-libasan? I'd expect to see two movs in the executable unless libasan is being linked statically.
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 && objdump -d a.out | grep "<main>:" -A 6
4e7f74: 4c 8b 35 9d 2c 44 00 mov 0x442c9d(%rip),%r14 # 92ac18 <__asan_shadow_memory_dynamic_address>
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -shared-libasan && objdump -d a.out | grep "<main>:" -A 6
4007a4: 4c 8b 35 b5 08 20 00 mov 0x2008b5(%rip),%r14 # 601060 <__TMC_END__>
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -fPIE -pie && objdump -d a.out | grep "<main>:" -A 6
e9504: 48 8d 05 0d 27 44 00 lea 0x44270d(%rip),%rax # 52bc18 <__asan_shadow_memory_dynamic_address>
e950b: 4c 8b 30 mov (%rax),%r14
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -fPIE -pie -shared-libasan && objdump -d a.out | grep "<main>:" -A 6
984: 48 8b 05 6d 06 20 00 mov 0x20066d(%rip),%rax # 200ff8 <_DYNAMIC+0x258>
98b: 4c 8b 30 mov (%rax),%r14
% ldd a.out | grep asan
libclang_rt.asan-x86_64.so => not found
So, -fPIE -pie -shared-libasan gives us two loads.
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -fPIC -shared && objdump -d a.out | grep "<main>:" -A 6
874: 48 8b 05 7d 07 20 00 mov 0x20077d(%rip),%rax # 200ff8 <_DYNAMIC+0x218>
87b: 4c 8b 30 mov (%rax),%r14
If we care about ELF + dynamic shadow base, we should duplicate the shadow base global into every DSO. We could add a hidden visibility comdat global with the shadow base to every object file and let the linker merge them. A high priority initializer would set it. This is similar to what we do on Windows.
That seems workable, but before pulling in heavy machinery like that there should be some justification, i.e. a measurement that shows it makes a significant difference. The whole reason we have this problem to begin with is because somebody decided to do a premature optimization with a fixed shadow base address that apparently made virtually no performance difference...
somebody decided
That was me in 2011, and I've made measurements at that time and they were in favor of my decision. Looks like not any more (not 100% confident though, independent evaluation is welcome)
Do we/does it make sense/possible to mark the global with some special attributes so that compiler knows that it never changes in generated code under any circumstances, so that it can freely cache it in a register across functions/calls/loops?
@dvyukov Right now the dynamic shadow base is only loaded once per function call. The load (or two loads for DSOs) happens in the prologue, and that value is typically allocated to a register live across the whole function. Unfortunately, I think LLVM's rematerialization is primitive. It mostly rematerializes constants.
Do we/does it make sense/possible to mark the global with some special attributes so that compiler knows that it never changes in generated code under any circumstances, so that it can freely cache it in a register across functions/calls/loops?
At least in LLVM you can: global declarations can be marked constant for pretty much this purpose. An excerpt from the LLVM LangRef:
LLVM explicitly allows declarations of global variables to be marked constant, even if the final definition of the global is not. This capability can be used to enable slightly better optimization of the program, but requires the language definition to guarantee that optimizations based on the ‘constantness’ are valid for the translation units that do not include the definition.
That was me in 2011, and I've made measurements at that time and they were in favor of my decision. Looks like not any more (not 100% confident though, independent evaluation is welcome)
I've never benchmarked ASAN but I've benchmarked thoroughly various shadow-memory systems (taint tracking, etc.) and I can confirm that a constant shadow location is a small but significant optimization. I can search to see if I have any charts handy.
The biggest win IIRC was that a constant address let you be clever about selecting your shadow memory range such that mapping program pointers to their shadow location could be done in fewer instructions (how many depended on the "density" of the mapping, 1:1 or are you bit-packing?).
I was also inlining the runtime, not sure what ASAN does in this regard.
(there are multiple papers about the efficient engineering of these things, FWIW)
ASan's mapping is 8=>1 (no bit packing though, details here: https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#mapping)
When I last checked a few years ago, the big difference was between using 0, 0x7fff8000 and something like (1ULL << 43).
'0' is the fastest and provides the smallest code but does not work with non-PIE binaries on Linux (we use 0 base on Android).
(1ULL << 43) or some such was used for a while, but then Jakub Jelinek suggested 0x7fff8000 as a compromise between 0 and (1ULL << 43). 0x7fff8000 on x86_64 gave us most of the code size and most of the performance of 0 with much greater compatibility.
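One plausible reading of the code-size argument, offered here as an assumption rather than something stated in the thread, is that x86-64 addressing modes can carry at most a signed 32-bit displacement, so an offset below 2^31 can be folded into the memory operand while a larger one cannot. The sketch below applies the 8-to-1 mapping to the three candidate offsets; `mem_to_shadow` and `fits_disp32` are illustrative names.

```python
SHADOW_SCALE = 3  # 8 application bytes map to 1 shadow byte

# The three offsets discussed above.
CANDIDATE_OFFSETS = {
    "0": 0,
    "0x7fff8000": 0x7FFF8000,
    "1ULL << 43": 1 << 43,
}


def mem_to_shadow(addr, offset):
    """Shadow address for an application address under the 8-to-1 mapping."""
    return (addr >> SHADOW_SCALE) + offset


def fits_disp32(offset):
    """True if the offset can be encoded as a signed 32-bit displacement."""
    return -(1 << 31) <= offset < (1 << 31)


if __name__ == "__main__":
    top_of_user_space = 0x7FFFFFFFFFFF  # highest 47-bit user address
    for name, offset in CANDIDATE_OFFSETS.items():
        print(
            f"{name:12s} fits disp32: {fits_disp32(offset)!s:5s} "
            f"highest shadow byte: {hex(mem_to_shadow(top_of_user_space, offset))}"
        )
```

With the 0x7fff8000 offset, the highest shadow byte comes out as 0x10007fff7fff, which is the upper bound printed in the abort message earlier in the thread.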
Forcing the dynamic shadow doesn't work on my system! (Archlinux x86_64 with clang 4.0.1)
AFAIK dynamic shadow isn't supported in the ASan runtime for Linux (FindAvailableMemoryRange contains UNREACHABLE), so that's expected. A possible implementation would be just to mmap a large chunk for shadow, probably with some hint, in this routine.
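The "mmap a large chunk with a hint" idea can be sketched as follows. This is not the ASan runtime's code (which would be C++ inside the sanitizer libraries); it is a small Python/ctypes illustration of the system calls involved, assuming a 64-bit Unix-like system. `reserve_shadow` and `release_shadow` are hypothetical names, and the size used in the demo is far smaller than a real shadow region.

```python
import ctypes
import mmap

PROT_NONE = 0  # reserve address space only; no access until re-protected

# Bind the C library's mmap/munmap directly so the chosen address is visible.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value


def reserve_shadow(size, hint=0):
    """Reserve `size` bytes of address space and return the chosen address.

    `hint` is only a suggestion: without MAP_FIXED the kernel is free to
    pick a different address, which is what makes the base "dynamic".
    """
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    addr = libc.mmap(hint, size, PROT_NONE, flags, -1, 0)
    if addr is None or addr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return addr


def release_shadow(addr, size):
    """Unmap a region previously returned by reserve_shadow."""
    if libc.munmap(addr, size) != 0:
        raise OSError(ctypes.get_errno(), "munmap failed")


if __name__ == "__main__":
    size = 1 << 30  # demo size only
    base = reserve_shadow(size, hint=0x7FFF8000)
    print("dynamic shadow base:", hex(base))
    release_shadow(base, size)
```

Whatever address comes back would play the role of the shadow base that the instrumented code loads at run time, instead of a constant baked into every check.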
check what's going on on ARM (I'll certainly need help with that)
FYI I'm trying to get numbers on my ARM Linux board, but I'll get some results only till the mid of next week (SPEC2006 is very time consuming on my weak ARM board).
In clang there is -mllvm -asan-force-dynamic-shadow=1, which is the default on Windows. I don't think this has been implemented in GCC.
Yes, this is not implemented in GCC, but I don't think it's hard to do (I have a patch that passes GCC ASan bootstrap, but it needs some polishing).
FYI I'm trying to get numbers on my ARM Linux board, but I'll get some results only till the mid of next week (SPEC2006 is very time consuming on my weak ARM board).
So, I've got some numbers on my ARM Linux board. I used the SPEC2006 train size (the board almost died under ref), but even with train, the noise between test runs was quite low (~1%) for most tests (except perl and hmmer, where noise was ~3%):
Static CFLAGS= -O2 -fPIC -pie -shared-libasan
Dynamic CFLAGS=-O2 -fPIC -mllvm -asan-force-dynamic-shadow=1 -pie -shared-libasan
Processor:
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc0f
CPU revision : 4
Test | static | dynamic | dynamic slowdown (less is better) |
---|---|---|---|
400.perlbench | 401 | 413 | 2.9% |
401.bzip2 | 232 | 238 | 2.5% |
429.mcf | 99.5 | 101 | 1.5% |
445.gobmk | 918 | 921 | 0.3% |
456.hmmer | 292 | 300 | 2.7% |
458.sjeng | 1610 | 1622 | 0.7% |
471.omnetpp | 850 | 852 | 0.2% |
473.astar | 415 | 425 | 2.4% |
483.xalancbmk | 774 | 777 | 0.4% |
433.milc | 78.4 | 79.9 | 1.9% |
444.namd | 52.7 | 54.4 | 3.2% |
447.dealII | 175 | 192 | 9.7% |
450.soplex | 38.2 | 38.7 | 1.3% |
453.povray | 78.2 | 81.6 | 4.3% |
470.lbm | 197 | 197 | 0.0% |
Btw, ASan on 32-bit Android maps shadow at 0000 0000 .. 2000 0000, because all executables are PIE, and it is slightly faster that way (and requires less code). This is now broken.
If we care about ELF + dynamic shadow base, we should duplicate the shadow base global into every DSO. We could add a hidden visibility comdat global with the shadow base to every object file and let the linker merge them. A high priority initializer would set it. This is similar to what we do on Windows.
This will not always work. If library A depends on library B, then a constructor of B may call A before A's constructors have run.
The kernel commit was ultimately reverted. Do we want to keep this issue open?
I don't think it was reverted.
Oh, I think it was reverted in Ubuntu kernel, but not in upstream.
I am writing this for everyone who is trying to find a solution to the problem of running the sanitizers on Linux and arrives at this thread from googling. As you might infer from the problem described in this thread, you have to disable ASLR on Linux via the "nokaslr" option to be able to run the sanitizers, but that puts you at a potential security risk, so what I would recommend is the following:
If we need to fix this, I think the best solution would be to use the dynamic shadow offset feature (-mllvm -asan-force-dynamic-shadow=1) already used on other OSs. My understanding is that the majority of non-Linux platforms (Windows, Android, Mac, iOS) use a dynamic shadow memory base address.
@kcc had concerns in 2017 about the performance of this change. He ran some benchmarks in this comment, and the results were in the noise. If someone can produce new results on a less noisy machine, I don't think there are any other objections. We can make the change and fix this issue for good.
Maybe this interacts with the new ASan codegen that @kda added, I'm not sure.
I don't know if they will ever come around to fixing this issue, since it has been around for 5 years; I submitted a workaround to use until then.
Oh, I think it was reverted in Ubuntu kernel, but not in upstream.
It looks like it was reverted upstream in August 2017: https://github.com/torvalds/linux/commit/c715b72c1ba406f133217b509044c38d8e714a37
There was then a minor fix in November 2017 for 5-level-paging (https://github.com/torvalds/linux/commit/be739f4b5ddece74ef25e2304b17a7fd24575e9b), but it has no impact on this issue; that's the last time ELF_ET_DYN_BASE was modified for x64.
This means there was only a very narrow window between when the breaking change was made (July 2017) and when it was reverted (August 2017); any kernel outside of that 5-week period should be compatible with ASan.
This is just a heads-up about this Linux kernel commit recently committed and pending on a number of stable queues: torvalds/linux@eab09532d40090698b05a07c1c87f39fdbc5fab5
It seems to move the default load address for -fPIE executables into the location ASan uses for its shadow memory map (on x86_64). This then causes ASan to abort on startup. Example error:
With ASLR enabled, you can sometimes get lucky with the load address and the program runs, but most of the time ASan aborts with this error.
Is it possible for ASan to be a bit more flexible about where it places the shadow map on startup to fix this?