Closed by chandlerc 1 year ago.
> for whom asan will stop working (due to incompatibility with use-after-return checking)

Here I'm not following. This is about turning use-after-return checking on by default, so you mean "(due to some code out there being incompatible with use-after-return checking)"?
Yes. Suppose an existing user of asan updates to the new version of LLVM that now has use-after-return enabled. The code that used to run fine under asan will now fail because of latent use-after-return bugs.
It's still totally worth the risk.
> You mean the time it takes to build stage-2? With which build mode? Release?
> Oh, no, I mean our (human) time to do the testing.

Ah!

> I would make the change, build LLVM/Release with asan, run the tests, submit, look carefully at the bots.

That part is obvious.
We will also get an inflow of upset users for whom asan will stop working (due to incompatibility with use-after-return checking) or become slow, so we need to budget for that as well.
Not sure if we can invest time into this right now. Would you like to try? We need to make sure that the asan build of LLVM doesn't regress too much.
If the intent is to enable it, it might be better to do it sooner rather than later, to leave more room for reverts and not let it slip into yet another release.
Sure. I need to investigate some performance issues; I plan to do so in the next couple of months. It can be enabled with or without that investigation, but if we find easy fixes for the issues, fewer users will need to disable the check.
Vitaly, WDYT? I guess we may try turning this on by default.
Any changes/roadmap/plan here? https://github.com/google/sanitizers/issues/217 appears to be fixed, and nothing else is referenced.
Just wasted an hour debugging quite simple code before realizing that stack-use-after-return detection is still not on by default.
This is at -O1: pure asan vs asan+UAR, but with UAR disabled at run time, i.e. we are measuring the overhead of the instrumentation without actually using the fake stack.
benchmark, asan, asan+UAR, ratio
400.perlbench, 1364.00, 1383.00, 1.01
401.bzip2, 876.00, 877.00, 1.00
403.gcc, 676.00, 675.00, 1.00
429.mcf, 673.00, 674.00, 1.00
445.gobmk, 909.00, 946.00, 1.04 <<
456.hmmer, 970.00, 978.00, 1.01
458.sjeng, 1138.00, 1139.00, 1.00
462.libquantum, 676.00, 648.00, 0.96
464.h264ref, 1404.00, 1378.00, 0.98
471.omnetpp, 717.00, 726.00, 1.01
473.astar, 785.00, 799.00, 1.02
483.xalancbmk, 1145.00, 1209.00, 1.06 <<
433.milc, 742.00, 736.00, 0.99
444.namd, 613.00, 615.00, 1.00
447.dealII, 1493.00, 1519.00, 1.02
450.soplex, 519.00, 525.00, 1.01
453.povray, 524.00, 573.00, 1.09 <<
470.lbm, 492.00, 501.00, 1.02
482.sphinx3, 1035.00, 1052.00, 1.02
Quite tolerable.
With -O1, the UAR slowdown is a bit worse. This is quite expected: more stack objects survive optimizations.
benchmark, asan, asan+UAR, ratio
400.perlbench, 1393.00, 1690.00, 1.21 <<<
401.bzip2, 891.00, 944.00, 1.06
403.gcc, 659.00, 694.00, 1.05
429.mcf, 674.00, 692.00, 1.03
445.gobmk, 922.00, 1134.00, 1.23 <<<
456.hmmer, 975.00, 980.00, 1.01
458.sjeng, 1133.00, 1441.00, 1.27 <<<
462.libquantum, 576.00, 539.00, 0.94
464.h264ref, 1424.00, 1411.00, 0.99
471.omnetpp, 752.00, 700.00, 0.93
473.astar, 845.00, 884.00, 1.05
483.xalancbmk, 1198.00, 1975.00, 1.65 <<<
433.milc, 729.00, 716.00, 0.98
444.namd, 615.00, 614.00, 1.00
447.dealII, 1503.00, 2028.00, 1.35 <<<
450.soplex, 522.00, 513.00, 0.98
453.povray, 518.00, 1191.00, 2.30 <<<
470.lbm, 390.00, 401.00, 1.03
482.sphinx3, 1031.00, 982.00, 0.95
r191186 enables the compile-time instrumentation for UAR detection by default when asan is enabled, while keeping the run-time flag off by default. Now one needs to pass ASAN_OPTIONS=detect_stack_use_after_return=1 at run time to enable UAR detection.
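For reference, a minimal usage sketch (the source file and binary names here are placeholders, not from this bug):

```shell
# Build with asan; the UAR instrumentation is emitted by default.
clang -fsanitize=address -g test.c -o test

# The check itself is still off at run time; turn it on via ASAN_OPTIONS.
ASAN_OPTIONS=detect_stack_use_after_return=1 ./test
```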
I've completely re-implemented the use-after-return allocator; it is now supposed to be faster and async-signal-safe(er). Preliminary results on SPEC look better than before:
400.perlbench, 1353.00, 1695.00, 1.25
445.gobmk, 889.00, 1123.00, 1.26
458.sjeng, 1036.00, 1283.00, 1.24
483.xalancbmk, 496.00, 585.00, 1.18
453.povray, 441.00, 826.00, 1.87
Testing it further.
Another issue, which I understood just now, is that the current implementation is not signal-safe. This will be tracked separately in https://code.google.com/p/address-sanitizer/issues/detail?id=217
483.xalancbmk, 466.00, 786.00, 1.69
This benchmark has insanely deep recursion (depth is over 30K).
This is the current performance data: asan vs asan+UAR at -O2. 483.xalancbmk will need a re-run (UAR failed due to too-deep recursion).
458.sjeng is known to have a large stack array in the hotspot. UAR mode has to poison the entire thing, which is costly.
Look at e.g. this function:

```c
int search(int alpha, int beta, int depth, int is_null) {
  move_s moves[MOVE_BUFF]; // MOVE_BUFF == 512
  int num_moves, i, j;
  int score = -INF, move_ordering[MOVE_BUFF], see_values[MOVE_BUFF];
  ...
```
This function is recursive, is about 500 lines long, and makes lots of calls. I don't think it is possible to optimize the UAR detection away without whole-program mode.
I haven't analyzed other cases yet.
benchmark, asan, asan+UAR, ratio
400.perlbench, 1328.00, 2087.00, 1.57 <<<<<<<<<
401.bzip2, 847.00, 856.00, 1.01
403.gcc, 618.00, 666.00, 1.08
429.mcf, 581.00, 579.00, 1.00
445.gobmk, 867.00, 1404.00, 1.62 <<<<<<<<<
456.hmmer, 912.00, 894.00, 0.98
458.sjeng, 1016.00, 1365.00, 1.34 <<<<<<<<<
462.libquantum, 535.00, 545.00, 1.02
464.h264ref, 1274.00, 1346.00, 1.06
471.omnetpp, 563.00, 574.00, 1.02
473.astar, 642.00, 640.00, 1.00
483.xalancbmk, 466.00, -1.00, -0.00
433.milc, 649.00, 653.00, 1.01
444.namd, 596.00, 595.00, 1.00
447.dealII, 620.00, 650.00, 1.05
450.soplex, 361.00, 366.00, 1.01
453.povray, 422.00, 1986.00, 4.71 <<<<<<<<<
470.lbm, 377.00, 380.00, 1.01
482.sphinx3, 956.00, 975.00, 1.02
I'd love this to happen, but I am not sure if this will ever be possible due to large and hard-to-predict extra overhead, both in CPU and RAM. Here is a rough description of how the UAR detection works: https://code.google.com/p/address-sanitizer/wiki/UseAfterReturn
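The fake-stack scheme that page describes can be illustrated with a toy model (this is only a sketch for intuition; all names and sizes are invented, and the real implementation works via shadow-memory poisoning in instrumented prologues/epilogues):

```c
/* Toy model (NOT asan's actual implementation) of the fake-stack idea:
   frames live in a heap-like pool, and on return a frame is poisoned
   instead of being reused right away, so a later access through a
   dangling pointer into it can be flagged. */
#include <stdint.h>
#include <string.h>

enum { FRAME_SIZE = 64, NUM_FRAMES = 16, POISON = 0xAB };

static uint8_t fake_stack[NUM_FRAMES][FRAME_SIZE];
static uint8_t frame_poisoned[NUM_FRAMES];
static int top;

/* Called on function entry instead of carving a real stack frame. */
static void *fake_frame_push(void) {
    frame_poisoned[top] = 0;
    return fake_stack[top++];
}

/* Called on function return: poison the frame (assumes LIFO order). */
static void fake_frame_pop(void) {
    --top;
    frame_poisoned[top] = 1;
    memset(fake_stack[top], POISON, FRAME_SIZE);
}

/* What an instrumented load/store would check: does the address point
   into a frame that has already returned? */
static int is_use_after_return(const void *p) {
    uintptr_t a = (uintptr_t)p;
    for (int i = 0; i < NUM_FRAMES; i++) {
        uintptr_t lo = (uintptr_t)fake_stack[i];
        if (a >= lo && a < lo + FRAME_SIZE)
            return frame_poisoned[i];
    }
    return 0; /* not a fake-stack address */
}
```

Instrumented code would call the push/pop helpers around each frame and the checker before each access, so a pointer into a popped frame is reported instead of silently reading reused stack memory. The extra CPU and RAM overhead mentioned above comes from exactly this: frames are kept around (poisoned) rather than reused immediately.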
What's "capture analysis"?
It's a subset of escape analysis. Essentially, it is an escape that outlives the function, as opposed to an escape that merely hands the address to other, unknown code.
Nick is the expert here, but I'm also happy to explain in more detail... IRC might work better than a bug though.
The nice thing is that LLVM will aggressively track captures, working bottom-up across the call graph. So it should be possible to distinguish between a stack variable which is merely passed down to other functions and one whose address might outlive the function body.
See include/llvm/Analysis/CaptureTracking.h for the interface to leverage this analysis in LLVM.
There also might be other ways to improve performance of this, for example by using something similar to a cactus stack rather than just heap allocations. Anyways, I would want to understand exactly what the performance impact was and where it came from. That was the point of filing a bug to track it.
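To make the capture-analysis suggestion concrete, here is a hypothetical sketch of how an instrumentation pass could consult CaptureTracking.h. This is not actual asan code, the helper name is made up, and the exact signature of PointerMayBeCaptured varies across LLVM versions:

```cpp
// Hypothetical sketch: skip the UAR fake stack for allocas whose
// address provably never escapes the function.
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// True if this alloca may need fake-stack treatment under UAR checking.
static bool mayNeedFakeStack(const AllocaInst *AI) {
  // If the address is never captured -- not returned and not stored
  // anywhere that could outlive the call -- then a use-after-return
  // through it is impossible and it can stay on the real stack.
  return PointerMayBeCaptured(AI, /*ReturnCaptures=*/true,
                              /*StoreCaptures=*/true);
}
```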
If we could use LLVM analysis to prove that `arr` does not escape, this indeed would help.
Exactly. This is what 'nocapture' can help with. Maybe try out the analysis pass I mentioned? Should be easy to prototype, although it may need tuning to avoid compile-time problems.
Another problem is compatibility. The UAR mode is less compatible with arbitrary code than the default asan mode. For example, instrumenting V8 makes it report false positives on Chromium (the V8 code makes some assumptions about the stack layout).
I'm pretty happy to start breaking code that makes assumptions about the stack layout. This needs to happen anyways IMO, or at the very least such code needs to explicitly disclaim that it is assuming things about the layout of the stack.
In most cases, if the stack frame (local variable) survives after all LLVM optimizations, it means that some of the variables potentially escape.
The exception is code like this:

```c
void foo(...) {
  // arr provably does not escape foo() because its address is never taken
  int arr[100];
  ... arr[i] ...
}
```
assigned to @vitalybuka
Fixed with 4b4437c084e2b8a2643e97e7aef125c438635a4d
Extended Description
Filing this as a tracking bug to track progress on turning ASan's use-after-return checking on by default.
The reason I want this on by default is that there is very little to indicate to a user that a bug in their code is related to use-after-return, and so it seems especially valuable to eagerly diagnose it. Also, it seems likely that by applying LLVM's capture analysis to the instrumentation the overhead associated with this mode could be significantly reduced.