Open N-R-K opened 1 year ago
Maybe related: https://github.com/llvm/llvm-project/issues/31415
Maybe related: https://github.com/llvm/llvm-project/issues/31415
I don't believe so.
In this report, strlen
is being called with a non-string (i,e something not nul-terminated) but it's being incorrectly optimized out anyways (preventing ASan from having a chance at catching the OOB read).
The other report, strlen is being called validly - and the issue was with non-portable constant expression.
It is different, but seems related. An explanation for those confused as me at the beginning, this initialization is incorrect in C++ and it's detected during compilation,l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),header:(),k:59.669376437589314,l:'4',m:46.878885653576766,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang1600,deviceViewOpen:'1',filters:(b:'1',binary:'1',binaryObject:'1',commentOnly:'1',debugCalls:'1',demangle:'0',directives:'1',execute:'0',intel:'1',libraryCode:'1',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O1+-g3+-fsanitize%3Daddress,undefined',overrides:!(),selection:(endColumn:45,endLineNumber:3,positionColumn:45,positionLineNumber:3,selectionStartColumn:45,selectionStartLineNumber:3,startColumn:45,startLineNumber:3),source:1),l:'5',n:'0',o:'+x86-64+clang+16.0.0+(Editor+%231)',t:'0')),header:(),k:40.330623562410686,l:'4',m:100,n:'0',o:'',s:0,t:'0')),l:'2',m:77.50177430801988,n:'0',o:'',t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+16.0.0',editorid:1,fontScale:14,fontUsePx:'0',j:1,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+clang+16.0.0+(Compiler+%231)',t:'0')),l:'4',m:22.498225691980124,n:'0',o:'',s:0,t:'0')),k:100.00000000000003,l:'3',n:'0',o:'',t:'0')),version:4).
However, this initialization is correct in C and the string is not null terminated.
An array of character type may be initialized by a character string literal or UTF−8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array C11 N1570
Therefore OOB read happens inside strlen
. That is detected, if we play with the pointer or turn off optimizations.
I think it's UB, so value 1 from strlen
is as good as any other, but I agree that it should not be optimized-out when compiling with ASan and therefore detected.
this initialization is incorrect in C++ and it's detected during compilation.
The underlying issue still exists on C++ as well, you can change the initialization to char s[1] = {'A'};
for example.
I agree that it should not be optimized-out when compiling with ASan and therefore detected.
Agreed. However I'd go one step further and say that the "right thing" to do would be to emit a warning as well since this is a case where the compiler can easily see and/or prove the code to be unsound.
this initialization is incorrect in C++ and it's detected during compilation.
The underlying issue still exists on C++ as well, you can change the initialization to
char s[1] = {'A'};
for example.
Thanks for pointing that out!
Agreed. However I'd go one step further and say that the "right thing" to do would be to emit a warning as well since this is a case where the compiler can easily see and/or prove the code to be unsound.
Are you thinking about detecting OOB in the strlen
during compilation? Then compilation error may be right and we don't have to think about turning off the optimization. Or are you thinking to warn about char s[1] = "A";
in C?
Are you thinking about detecting OOB in the
strlen
during compilation?
Yup. Since clang is already inspecting/simulating the strlen
call for optimization purposes, it shouldn't be impossible to also check to verify the existence of a nul-terminator - and raise a warning (and also abort the optimization? which would fix the ASan issue) if it wasn't found.
Or are you thinking to warn about
char s[1] = "A";
in C?
Now that you mention it, there are cases where that'd be useful (mainly when using a flat char [][]
to avoid pointer overhead), but that's probably a separate topic.
I don't have any experience with that part of LLVM, so please correct me if I'm wrong. I tried to fix it, but without a success.
The issue is easily reproducible for buffers up to 32bytes.
I believe bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result, EvalInfo &Info)
is used to calculate length in cases above. Slow path is executed as it's not a string literal/cannot be dynamically casted to one.
// Slow path: scan the bytes of the string looking for the terminating 0.
for (uint64_t Strlen = 0; /**/; ++Strlen) {
APValue Char;
if (!handleLValueToRValueConversion(Info, E, CharTy, String, Char) ||
!Char.isInt())
return false;
if (!Char.getInt()) {
Result = Strlen;
return true;
}
if (!HandleLValueArrayAdjustment(Info, E, String, CharTy, 1))
return false;
}
https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L16401-L16413
I think the null byte is added during pointer evaluation (!EvaluatePointer(E, String, Info)
). I tried to compare buffer size with a calculated size, but it didn't work.
if(CharUnits::fromQuantity(Strlen) == Info.Ctx.getTypeSizeInChars(CharTy))
return false;
#10 319.7 /work/llvm-stage2/include/c++/v1/__format/formatter_integral.h:406:33: error: constexpr variable '__true' must be initialized by a constant expression
#10 319.7 406 | static constexpr wstring_view __true{L"true"};
#10 319.7 | ^~~~~~~~~~~~~~~
It deserves to be fixed. At least a warning should be printed (may be error, tbh). Although not calculating the value in compile time and leaving it for runtime would make AddressSanitizer work, I do not prefer this solution. We should report a buffer overflow during compilation, as we can do so. I am happy to test it or implement it. However, I would appreciate a hint, as I do not want to dive too deep into the AST implementation.
CC: @AaronBallman I believe it's part of the codebase you are taking care of.
We have some fortification checks already, and I think strlen should be added to what we already handle. I'd recommend looking at the implementation of Sema::checkFortifiedBuiltinMemoryFunction()
as that's roughly where I'd expect the logic to live.
Minimal reproducible code-snippet:
Compile command:
clang -O1 -g3 -fsanitize=address,undefined
Expected behavior: ASan should catch the OOB access. Reality: It gets optimized out and returns 1 always. Note: adding-fno-builtin
or-O0
allows ASan to be effective.