llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

fuzz clang-format #23426

Open kcc opened 9 years ago

kcc commented 9 years ago


Bugzilla Link	23052
Version	unspecified
OS	Linux
CC	@d0k,@KernelAddress

Extended Description

We have a fuzzer of clang-format in the source tree. Details: llvm/lib/Fuzzer/README.txt

It has found a few bugs so far: r226685, r226678, r226451, r226446, r226448, r227427, r226447, r226680, r226698, r229485, r227677, r227433, r230395, r231066 (and probably a couple more that I missed).

A few bugs remain; we will post them here, one per comment.

There is also a build bot that runs the fuzzer 24/7 and reports new bugs (regressions) as they appear, as well as old bugs when the fuzzer discovers them: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer

kcc commented 8 years ago

The clang-format-fuzzer bot has been mostly green lately, with only one periodic assert failure, bug 26032. I've changed the bot to treat clang-format-fuzzer failures as real ones, not just warnings.

llvmbot commented 9 years ago

Fixed crasher in r242738.

kcc commented 9 years ago

Daniel, many thanks for the fixes. The next biggest offender is

clang-format-fuzzer: /mnt/b/sanitizer-buildbot5/sanitizer-x86_64-linux-fuzzer/build/llvm/tools/clang/lib/Format/ContinuationIndenter.cpp:1066: unsigned int clang::format::ContinuationIndenter::breakProtrudingToken(const clang::format::FormatToken &, clang::format::LineState &, bool): Assertion `NewRemainingTokenColumns < RemainingTokenColumns' failed.

reproducer (base64-encoded): SCQhJCwxLGNvbnN0ZQx4ciBjaHIzaDJ0IDMqMiAjJCgpABkMLTo9IGdldCxRKiJzdFwwXPSKpKQ6JFxcIg==

You may get more reproducers from the bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer

kcc commented 9 years ago

The clang/clang-format fuzzer bot (lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer) has been extended to run both with and without assertions. Whenever a bug is found, the fuzzer prints the base64-encoded reproducer so that one can copy-paste it from the buildbot logs. E.g., from the bot logs:

SUMMARY: AddressSanitizer: ...
CRASHED; file written to crash-80193815206841682354717562770799349303
Base64: OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs=

Just do this:

echo OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs= | base64 -d | clang -x c++ -
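
For pulling several reproducers out of a saved buildbot log at once, the same step can be scripted. The sketch below is only an illustration: it assumes each reproducer appears in the log as `Base64: <payload>`, as in the excerpt above, and the name `extract_reproducers` is hypothetical, not part of the fuzzer.

```python
import base64
import re
from typing import List

# Assumption: reproducers appear in the log as "Base64: <payload>",
# matching the bot log excerpt quoted in this comment.
_BASE64_FIELD = re.compile(r"Base64:\s*([A-Za-z0-9+/]+={0,2})")


def extract_reproducers(log_text: str) -> List[bytes]:
    """Return the decoded bytes of every Base64 payload found in log_text."""
    return [base64.b64decode(m.group(1)) for m in _BASE64_FIELD.finditer(log_text)]
```

Each returned item can then be written to a file and passed to clang or clang-format as in the command above.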

kcc commented 9 years ago

Bug llvm/llvm-project#23294 has been marked as a duplicate of this bug.

kcc commented 9 years ago

echo PCo+Iis/J2FjIDpTDT46zvxcXAp1NzI49zxGPg== | base64 --decode | clang-format -

Assertion `EndColumn >= StartColumn' failed.

kcc commented 9 years ago

echo LypcAAov | base64 --decode | clang-format -

Assertion `TokenText.startswith("/") && TokenText.endswith("/")' failed.
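
Decoding the reproducer shows how small the input is. A quick way to inspect the raw bytes, using only the Python standard library (nothing here is specific to clang-format):

```python
import base64

# Reproducer copied from the command above.
data = base64.b64decode("LypcAAov")

print(len(data))   # 6
print(data)        # b'/*\\\x00\n/'
print(data.hex())  # 2f2a5c000a2f
```

So the input is `/*`, a backslash, a NUL byte, a newline, and `/`: six bytes that start like a block comment and end in `/` without ever containing `*/`.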

d0k commented 9 years ago

A chain of < seems to trigger superlinear runtime in the parser; in the timings below, each additional < roughly doubles the runtime.

perl -e 'print "<" x 20'|clang-format

n | seconds
20 | 0.101
21 | 0.191
22 | 0.367
23 | 0.722
24 | 1.431
25 | 2.730
26 | 5.173
27 | 10.026
28 | 19.779
29 | 39.350
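
The growth rate can be read off the measurements by comparing each timing with the previous one. A small sketch using only the numbers reported above (the variable names are just for illustration):

```python
# Timings copied from the table above: number of '<' characters -> seconds.
timings = {
    20: 0.101, 21: 0.191, 22: 0.367, 23: 0.722, 24: 1.431,
    25: 2.730, 26: 5.173, 27: 10.026, 28: 19.779, 29: 39.350,
}

sizes = sorted(timings)
# Ratio of each timing to the one for the previous input size (n - 1 -> n).
ratios = [timings[b] / timings[a] for a, b in zip(sizes, sizes[1:])]

for n, r in zip(sizes[1:], ratios):
    print(f"n={n}: {r:.2f}x the time for n={n - 1}")
```

Every ratio falls between about 1.89 and 1.99, which is consistent with runtime growing roughly as 2^n over this range.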

kcc commented 9 years ago

This one is worse: 31 seconds without instrumentation for 64 bytes, same profile.

cat << EOF | base64 --decode | clang-format
PDw8SAQEMigqLCioKDFoLGgKPDw8PDw8CjwKPDw8PEhoCjw8PBw8PDwoKiwoqCJoLGgKKAoKPDw8
Cjw8PDw8PA==
EOF
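
To compare timings like these across builds, a small harness can decode a reproducer and time a command on it. This is a sketch under assumptions: the helper name `time_on_input` is hypothetical, and the command to run (for example `["clang-format"]`, assuming it is on PATH) is supplied by the caller.

```python
import base64
import subprocess
import time
from typing import Optional, Sequence


def time_on_input(cmd: Sequence[str], b64_input: str,
                  timeout: float = 120.0) -> Optional[float]:
    """Feed the decoded input to cmd on stdin.

    Returns the elapsed wall-clock seconds, or None if cmd exceeded timeout.
    A non-zero exit status (e.g. an assertion failure) is not treated as an error.
    """
    # b64decode discards whitespace by default, so a payload pasted
    # with its line break intact still decodes.
    data = base64.b64decode(b64_input)
    start = time.perf_counter()
    try:
        subprocess.run(cmd, input=data, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start
```

For example, `time_on_input(["clang-format"], payload)` with `payload` set to the base64 text above would time the same run as the shell command.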

kcc commented 9 years ago

Clang-format(-fuzzer) is very slow on a tiny input. This may not be a big problem by itself (or maybe it is), but it hurts fuzzing very much. With all the fuzzer instrumentation it takes ~1.5 seconds to format 60 bytes. Without instrumentation it takes ~0.5 seconds.

cat << EOF | base64 --decode | clang-format
PDw8SAQEMigqLCioKjFoLGgKPDw8PDw8Cjw8PCxkKiQcPDw8KCosKKgiaCxoCigKCjw8PAo8PGQq
KKA6
EOF

Perf:
51.83% clang::format::(anonymous namespace)::AnnotatingParser::next()
13.12% clang::format::(anonymous namespace)::AnnotatingParser::parseParens(bool)
11.87% clang::format::(anonymous namespace)::AnnotatingParser::consumeToken()
8.32% clang::format::(anonymous namespace)::AnnotatingParser::parseAngle()
5.01% clang::getBinOpPrecedence(clang::tok::TokenKind, bool, bool)
4.90% clang::format::(anonymous namespace)::AnnotatingParser::updateParameterCount(clang::format::FormatToken*, clang::format::Format
2.27% clang::format::FormatToken::isSimpleTypeSpecifier() const
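
Adding up the entries above gives a rough idea of how much of the profile sits in AnnotatingParser. A quick tally, with the function names shortened for readability (the percentages are copied from the perf output; nothing else is measured here):

```python
# Percentages copied from the perf output above, names shortened.
profile = {
    "AnnotatingParser::next": 51.83,
    "AnnotatingParser::parseParens": 13.12,
    "AnnotatingParser::consumeToken": 11.87,
    "AnnotatingParser::parseAngle": 8.32,
    "getBinOpPrecedence": 5.01,
    "AnnotatingParser::updateParameterCount": 4.90,
    "FormatToken::isSimpleTypeSpecifier": 2.27,
}

# Sum the share attributed to AnnotatingParser member functions.
annotating = sum(pct for name, pct in profile.items()
                 if name.startswith("AnnotatingParser::"))
print(f"AnnotatingParser total: {annotating:.2f}%")  # AnnotatingParser total: 90.04%
```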