llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.34k stars 11.7k forks source link

InstCombine strlen(x)==0 folding removes ASan heap-buffer-overflow failure #55525

Open Teemperor opened 2 years ago

Teemperor commented 2 years ago

I have the following C++20 code containing a strlen call on a buffer without null terminator:

#include <cstring>
#include <string_view>
#include <cstdlib>

char getFirstChar(std::string_view s) {
    if (s.empty())
      return ' ';
    return s.front();
}

int main(int argc, char **argv) {
  char *arg = argv[0];
  char *x = (char*)calloc(1, std::strlen(arg)); // This is not allocating the 0 terminator.
  memcpy(x, arg, strlen(arg));
  char res = getFirstChar(x);
  free(x);
  return res;
}

https://godbolt.org/z/GjvG537oG

This code above fails with an ASan report as expected on O0 with -fsanitize=address, but on O1 (which e.g. is what oss-fuzz is always using for fuzzing) this code passes with ASan enabled.

A minimal reproducer is:

#include <string.h>

int main() {
    char x[2] = {'a', 'b'};
    return strlen(x) == 0;
}

https://godbolt.org/z/ea8zK93v6

The problem here seems to be in LibCallSimplifier::optimizeStringLength which folds strlen(x)==0 to a load. https://github.com/llvm/llvm-project/blob/6c81079edf26da23f977a82e82f72cc6abd9cafd/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp#L640

davidbolvansky commented 2 years ago

This is a valid optimization. Anyway, running sanitizers with optimizers is not ideal at all, as some bugs could be "hidden" by optimizers (this case).

'-fno-builtin' or '-fno-builtin-strlen' could work as a workaround.

Teemperor commented 2 years ago

This is a valid optimization.

Yes, I agree that the optimization itself should stay and is valid. This isn't really either an ASan bug nor an optimizer bug as both behave as expected. The bug report (and some other bug reports I have lined up) is more intended in a "we probably should disable some optimizations if ASan is enabled"-way. There is an upcoming RFC about how I think these issues can be fixed in a way that isn't adding too much complexity to LLVM (and disabling builtins is part of the solution I think). The bug reports are mostly intended as a resource for that RFC.

Side note: Right now the used optimization level for nearly all ASan users (fuzzing, LLVM's own test bots, Chrome, ...) is O1 or higher (which is suggested by the ASan documentation, quote: "To get a reasonable performance add -O1 or higher. ").

davidbolvansky commented 2 years ago

gcc has the interesting attribute 'nonstring' https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/Common-Variable-Attributes.html

which could be added to char x[2] = {'a', 'b'}; and if strlen is called with x (which is annotated with nonstring), this could be diagnosed more easily..

maybe it would be interesting to also provide attribute 'zero_terminated' or something like that to mark C libcalls with this attribute and also expose it by C/C++ frontend could mark own functions with it to help sanitizers and optimizers.

in gcc implemented by @msebor : https://github.com/gcc-mirror/gcc/commit/6a33d0ff21e941fc3a65f23a753cc318aaae82b5

fhahn commented 2 years ago

Probably the same fundamental issue as for #55595 #55571