llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.2k stars 11.14k forks source link

[ASan] fails to detect OOB access due to builtin optimization #63886

Open N-R-K opened 1 year ago

N-R-K commented 1 year ago

Minimal reproducible code-snippet:

#include <string.h>

int main(void)
{
    char s[1] = "A";
    return strlen(s);
}

Compile command: clang -O1 -g3 -fsanitize=address,undefined Expected behavior: ASan should catch the OOB access. Reality: It gets optimized out and returns 1 always. Note: adding -fno-builtin or -O0 allows ASan to be effective.

shafik commented 1 year ago

Maybe related: https://github.com/llvm/llvm-project/issues/31415

N-R-K commented 1 year ago

Maybe related: https://github.com/llvm/llvm-project/issues/31415

I don't believe so.

In this report, strlen is being called with a non-string (i,e something not nul-terminated) but it's being incorrectly optimized out anyways (preventing ASan from having a chance at catching the OOB read).

The other report, strlen is being called validly - and the issue was with non-portable constant expression.

AdvenamTacet commented 1 year ago

It is different, but seems related. An explanation for those confused as me at the beginning, this initialization is incorrect in C++ and it's detected during compilation,l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),header:(),k:59.669376437589314,l:'4',m:46.878885653576766,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang1600,deviceViewOpen:'1',filters:(b:'1',binary:'1',binaryObject:'1',commentOnly:'1',debugCalls:'1',demangle:'0',directives:'1',execute:'0',intel:'1',libraryCode:'1',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O1+-g3+-fsanitize%3Daddress,undefined',overrides:!(),selection:(endColumn:45,endLineNumber:3,positionColumn:45,positionLineNumber:3,selectionStartColumn:45,selectionStartLineNumber:3,startColumn:45,startLineNumber:3),source:1),l:'5',n:'0',o:'+x86-64+clang+16.0.0+(Editor+%231)',t:'0')),header:(),k:40.330623562410686,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',m:77.50177430801988,n:'0',o:'',t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+16.0.0',editorid:1,fontScale:14,fontUsePx:'0',j:1,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+clang+16.0.0+(Compiler+%231)',t:'0')),l:'4',m:22.498225691980124,n:'0',o:'',s:0,t:'0')),k:100.00000000000003,l:'3',n:'0',o:'',t:'0')),version:4).

However, this initialization is correct in C and the string is not null terminated.

An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array C11 N1570

Therefore OOB read happens inside strlen. That is detected, if we play with the pointer or turn off optimizations.

I think it's UB, so value 1 from strlen is as good as any other, but I agree that it should not be optimized-out when compiling with ASan and therefore detected.

N-R-K commented 1 year ago

this initialization is incorrect in C++ and it's detected during compilation.

The underlying issue still exists on C++ as well, you can change the initialization to char s[1] = {'A'}; for example.

I agree that it should not be optimized-out when compiling with ASan and therefore detected.

Agreed. However I'd go one step further and say that the "right thing" to do would be to emit a warning as well since this is a case where the compiler can easily see and/or prove the code to be unsound.

AdvenamTacet commented 1 year ago

this initialization is incorrect in C++ and it's detected during compilation.

The underlying issue still exists on C++ as well, you can change the initialization to char s[1] = {'A'}; for example.

Thanks for pointing that out!

Agreed. However I'd go one step further and say that the "right thing" to do would be to emit a warning as well since this is a case where the compiler can easily see and/or prove the code to be unsound.

Are you thinking about detecting OOB in the strlen during compilation? Then compilation error may be right and we don't have to think about turning off the optimization. Or are you thinking to warn about char s[1] = "A"; in C?

N-R-K commented 1 year ago

Are you thinking about detecting OOB in the strlen during compilation?

Yup. Since clang is already inspecting/simulating the strlen call for optimization purposes, it shouldn't be impossible to also check to verify the existence of a nul-terminator - and raise a warning (and also abort the optimization? which would fix the ASan issue) if it wasn't found.

Or are you thinking to warn about char s[1] = "A"; in C?

Now that you mention it, there are cases where that'd be useful (mainly when using a flat char [][] to avoid pointer overhead), but that's probably a separate topic.

AdvenamTacet commented 12 months ago

I don't have any experience with that part of LLVM, so please correct me if I'm wrong. I tried to fix it, but without a success.

The issue is easily reproducible for buffers up to 32bytes. I believe bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result, EvalInfo &Info) is used to calculate length in cases above. Slow path is executed as it's not a string literal/cannot be dynamically casted to one.

  // Slow path: scan the bytes of the string looking for the terminating 0.
  for (uint64_t Strlen = 0; /**/; ++Strlen) {
    APValue Char;
    if (!handleLValueToRValueConversion(Info, E, CharTy, String, Char) ||
        !Char.isInt())
      return false;
    if (!Char.getInt()) {
      Result = Strlen;
      return true;
    }
    if (!HandleLValueArrayAdjustment(Info, E, String, CharTy, 1))
      return false;
  }

https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L16401-L16413

I think the null byte is added during pointer evaluation (!EvaluatePointer(E, String, Info)). I tried to compare buffer size with a calculated size, but it didn't work.

if(CharUnits::fromQuantity(Strlen) == Info.Ctx.getTypeSizeInChars(CharTy))
        return false;
 #10 319.7 /work/llvm-stage2/include/c++/v1/__format/formatter_integral.h:406:33: error: constexpr variable '__true' must be initialized by a constant expression
#10 319.7   406 |   static constexpr wstring_view __true{L"true"};
#10 319.7       |                                 ^~~~~~~~~~~~~~~

It deserves to be fixed. At least a warning should be printed (may be error, tbh). Although not calculating the value in compile time and leaving it for runtime would make AddressSanitizer work, I do not prefer this solution. We should report a buffer overflow during compilation, as we can do so. I am happy to test it or implement it. However, I would appreciate a hint, as I do not want to dive too deep into the AST implementation.

CC: @AaronBallman I believe it's part of the codebase you are taking care of.

AaronBallman commented 12 months ago

We have some fortification checks already, and I think strlen should be added to what we already handle. I'd recommend looking at the implementation of Sema::checkFortifiedBuiltinMemoryFunction() as that's roughly where I'd expect the logic to live.