llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Segfault: SCEV in Loop Data Prefetch #59310

Open JonPsson opened 1 year ago

JonPsson commented 1 year ago

testcase.tar.gz

$ llc -mcpu=z15 -O3  tc_SCEV_LDP.ll -o /dev/null
...
#255 0x000002aa0058a126 llvm::ScalarEvolution::computeExitLimit(
...

The file has nearly 1000 small loops, so it's not a huge surprise that SCEV could get into trouble. Bisecting showed that before 0b74cb4 "[SCEV] Introduce field for storing SymbolicMaxNotTaken. NFCI", this program terminated normally after 15 seconds, but now it crashes instead. I am guessing the ideal behaviour would be for SCEV to abort the optimization at some kind of limit and return "uncomputable"...
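A minimal C++ sketch of the "give up at a limit" idea, assuming a toy expression type rather than real SCEV: `Expr`, `evaluate`, `makeChain` and `MaxDepth` are hypothetical names, not LLVM API. The recursion carries a depth counter and returns `std::nullopt`, standing in for "could not compute", once the budget is exceeded.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical toy expression node; NOT LLVM's SCEV class.
struct Expr {
  long Value = 0;                          // used only by leaves
  std::vector<std::shared_ptr<Expr>> Ops;  // operands (empty for a leaf)
};

// std::nullopt stands in for "could not compute".
using Result = std::optional<long>;

// Hypothetical recursion budget; a real analysis would tune this.
constexpr unsigned MaxDepth = 32;

// Sums the leaves below E, giving up once the recursion gets too deep
// instead of letting the native stack grow without bound.
Result evaluate(const Expr &E, unsigned Depth = 0) {
  if (Depth > MaxDepth)
    return std::nullopt;
  if (E.Ops.empty())
    return E.Value;
  long Sum = 0;
  for (const auto &Op : E.Ops) {
    Result R = evaluate(*Op, Depth + 1);
    if (!R)
      return std::nullopt;  // propagate "uncomputable" upwards
    Sum += *R;
  }
  return Sum;
}

// Builds a chain of N unary nodes wrapped around a leaf with value 1.
std::shared_ptr<Expr> makeChain(unsigned N) {
  auto Node = std::make_shared<Expr>();
  Node->Value = 1;
  for (unsigned I = 0; I < N; ++I) {
    auto Parent = std::make_shared<Expr>();
    Parent->Ops.push_back(Node);
    Node = Parent;
  }
  return Node;
}
```

Here `evaluate(*makeChain(10))` returns 1, whereas a 1000-deep chain exceeds `MaxDepth` and yields `std::nullopt`. The trade-off is that a depth cap turns a crash into a conservative "unknown" answer, at the cost of losing facts about very deep expressions.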

LebedevRI commented 1 year ago
$ time ./bin/llc -mtriple=s390x-linux-gnu -mcpu=z15 -O3 tc_SCEV_LDP.ll -o /dev/null

real    0m8.435s
user    0m8.419s
sys     0m0.016s

Works for me. Please provide better reproduction steps.

JonPsson commented 1 year ago

I tried again on the very latest trunk and still see the segfault. I build on SystemZ with

-DCMAKE_BUILD_TYPE="Release" -DLLVM_ENABLE_ASSERTIONS=On -DLLVM_TARGETS_TO_BUILD=SystemZ

git commit 7cf5581

$ ./bin/llc -mtriple=s390x-linux-gnu -O1 tc_SCEV_LDP.ll -o /dev/null >& /dev/null; echo $?
Segmentation fault (core dumped)
139

Are you perhaps not building on SystemZ? It would be very interesting if only I get this error.

JonPsson commented 1 year ago

What platform are you building on? Do you still not get the segfault?

LebedevRI commented 1 year ago

This is plain debian gnu/linux, amd64, aa6ea6009fc50b02dbf3788ee9fe605081b154f6

$ ./bin/llc -mtriple=s390x-linux-gnu -O1 /tmp/tc_SCEV_LDP.ll -o /dev/null >& /dev/null; echo $?
0
$ sha512sum /tmp/tc_SCEV_LDP.ll
22502cc50503d8171583e8477597c032183f1682006c138c733f6780cfdc3c7175014c38265b7f996c544ab7dce8f8312ec6b439d7fbfa472c4f567ec08e57ea  /tmp/tc_SCEV_LDP.ll

Please reopen with an actionable testcase.

LebedevRI commented 1 year ago

One thing to check: how does it crash? Is it a stack overflow, and if so, how many frames deep is it? Is recursion involved?

xortator commented 1 year ago

@JonPsson what kind of build are you using? Is it debug build or release with asserts?

JonPsson commented 1 year ago

It's a stack overflow with more than 15k frames of SCEV calls. See above for the CMake options I used.

LebedevRI commented 1 year ago

Repro:

$ ulimit -s 128    # !!!
$ ./bin/llc -mtriple=s390x-linux-gnu -mcpu=z15 -O3 /tmp/tc_SCEV_LDP.ll -o /dev/null
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ./bin/llc -mtriple=s390x-linux-gnu -mcpu=z15 -O3 /tmp/tc_SCEV_LDP.ll -o /dev/null
1.      Running pass 'Function Pass Manager' on module '/tmp/tc_SCEV_LDP.ll'.
2.      Running pass 'Loop Data Prefetch' on function '@h'
  #0 0x00007f43dfaa4ff3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /repositories/llvm-project/llvm/lib/Support/Unix/Signals.inc:567:13
  #1 0x00007f43dfaa2e80 llvm::sys::RunSignalHandlers() /repositories/llvm-project/llvm/lib/Support/Signals.cpp:105:18
  #2 0x00007f43dfaa54fa SignalHandler(int) /repositories/llvm-project/llvm/lib/Support/Unix/Signals.inc:412:1
  #3 0x00007f43df65af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
  #4 0x00007f43e07d56f1 llvm::ScalarEvolution::getRangeRef(llvm::SCEV const*, llvm::ScalarEvolution::RangeSignHint, unsigned int) /repositories/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:6512:0
  #5 0x00007f43e07c7e30 llvm::ScalarEvolution::getSignedRange(llvm::SCEV const*) /repositories/llvm-project/llvm/include/llvm/Analysis/ScalarEvolution.h:0:0

LebedevRI commented 1 year ago

I can only repeat what I have said before: SCEV should not be recursive, yet getSCEV() still is.
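A minimal C++ sketch of what a non-recursive traversal can look like, assuming a toy expression DAG stored in a flat arena (`Node` and `evaluateIterative` are hypothetical names, not LLVM API). Pending work lives on a heap-allocated `std::vector` used as an explicit stack, so the maximum expression depth is bounded by available memory rather than by `ulimit -s`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical expression DAG node stored in a flat arena;
// operands are indices into the same arena.
struct Node {
  long Value = 0;                // used only by leaves
  std::vector<std::size_t> Ops;  // operand indices (empty for a leaf)
};

// Sums the leaves reachable from Root, memoizing per-node results, using an
// explicit stack instead of native recursion.
long evaluateIterative(const std::vector<Node> &Arena, std::size_t Root) {
  std::unordered_map<std::size_t, long> Cache;
  // Each entry: (node index, whether its operands were already pushed).
  std::vector<std::pair<std::size_t, bool>> Stack{{Root, false}};
  while (!Stack.empty()) {
    auto [Idx, Expanded] = Stack.back();
    if (Cache.count(Idx)) {  // already computed via another path
      Stack.pop_back();
      continue;
    }
    const Node &N = Arena[Idx];
    if (!Expanded) {
      Stack.back().second = true;  // revisit once the operands are done
      for (std::size_t Op : N.Ops)
        if (!Cache.count(Op))
          Stack.push_back({Op, false});
      continue;
    }
    // Every operand is cached by now: combine them.
    long Sum = N.Ops.empty() ? N.Value : 0;
    for (std::size_t Op : N.Ops)
      Sum += Cache.at(Op);
    Cache[Idx] = Sum;
    Stack.pop_back();
  }
  return Cache.at(Root);
}
```

For instance, a 200000-node chain (each node's single operand is the previous node, with a leaf of value 1 at the bottom) evaluates to 1 without any deep native recursion, and the cache ensures that an operand shared by several users is computed only once.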

JonPsson commented 1 year ago

It crashed for me on a machine with ulimit -s 8192 (more than 15k frames). Should the program also compile with your value of 128, which is considerably lower?

LebedevRI commented 1 year ago

I do not understand the question. I can't reproduce this with normal settings here, but Debian uses glibc, which has a sane default stack size, so that is not unexpected. To reproduce, you need to lower the stack size before running the reproducer.

JonPsson commented 1 year ago

With a stack limit of 128 KiB I can't even link llc, so it's no big surprise that it crashes then...

JonPsson commented 1 year ago

I have looked into this a bit more, and it seems that I got the crash on a machine where -fno-semantic-interposition was not used. So I think that if you build llc with

-DCMAKE_C_FLAGS_RELEASE="-fsemantic-interposition" -DCMAKE_CXX_FLAGS_RELEASE="-fsemantic-interposition"

you will also see the segfault...

xortator commented 1 year ago

I doubt it was genuinely caused by my patch, but maybe this change triggered some old bug. Most likely the very fact of constructing giant SCEVs here is the issue. We'll look into it.
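If the root cause is the construction of giant SCEVs, one conceivable mitigation is a size budget checked whenever a node is built. The C++ sketch below assumes a hypothetical `SizedExpr` node that caches the size of the tree beneath it; `makeLeaf`, `makeAdd` and `MaxExprSize` are likewise illustrative names, not existing LLVM API. An oversized result comes back as `nullptr`, i.e. "could not compute".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node that caches the size of the tree rooted at it, so the
// budget check below costs O(number of operands) rather than a full walk.
struct SizedExpr {
  std::size_t Size = 1;  // this node plus operand trees (shared ones counted per use)
  std::vector<std::shared_ptr<const SizedExpr>> Ops;
};
using ExprPtr = std::shared_ptr<const SizedExpr>;

// Hypothetical size budget.
constexpr std::size_t MaxExprSize = 64;

ExprPtr makeLeaf() { return std::make_shared<SizedExpr>(); }

// Builds an "add" node over Ops, or returns nullptr ("could not compute")
// if any operand is already nullptr or the result would exceed the budget.
ExprPtr makeAdd(const std::vector<ExprPtr> &Ops) {
  std::size_t Size = 1;
  for (const ExprPtr &Op : Ops) {
    if (!Op)
      return nullptr;
    Size += Op->Size;
    if (Size > MaxExprSize)
      return nullptr;
  }
  auto Node = std::make_shared<SizedExpr>();
  Node->Size = Size;
  Node->Ops = Ops;
  return Node;
}
```

Starting from a leaf, repeatedly doubling with `makeAdd({E, E})` gives sizes 3, 7, 15, 31 and 63; the next doubling would reach 127, so it returns `nullptr`. Counting shared operands once per use overestimates the size of a DAG, but it is cheap to maintain incrementally.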

JonPsson commented 1 year ago

I never thought so either; I think I have seen this problem before. I was merely hoping you might take a look, as I saw you have been working on this code and may be familiar with it. Thanks.