wsmoses / Tapir-LLVM

Tapir extension to LLVM for optimizing Parallel Programs

Assertion `Returns.empty() && "Returns cloned when cloning loop."' failed. #32

Open stelleg opened 6 years ago

stelleg commented 6 years ago

This assertion fails when building the Cilk version of miniFE, a simple finite-element application: https://mantevo.org/downloads/miniFE_2.0.1.html

Have done a small amount of digging, but thought I'd submit an issue in case the fix is obvious to someone else. I'll keep digging in the meantime.

Error output:

In file included from main.cpp:47:
./driver.hpp:125:1: warning: Tapir loop not transformed: failed to use divide-and-conquer loop spawning [-Wpass-failed=loop-spawning]
driver(const Box& global_box, Box& my_box,
^
./driver.hpp:125:1: warning: Tapir loop not transformed: failed to use divide-and-conquer loop spawning [-Wpass-failed=loop-spawning]
clang-5.0: /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1226: virtual bool {anonymous}::DACLoopSpawning::processLoop(): Assertion `Returns.empty() && "Returns cloned when cloning loop."' failed.
#0 0x0000560f2563968f llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/george/tasks/tapir/parallel-ir/lib/Support/Unix/Signals.inc:398:0
#1 0x0000560f25639722 PrintStackTraceSignalHandler(void*) /home/george/tasks/tapir/parallel-ir/lib/Support/Unix/Signals.inc:462:0
#2 0x0000560f25637955 llvm::sys::RunSignalHandlers() /home/george/tasks/tapir/parallel-ir/lib/Support/Signals.cpp:49:0
#3 0x0000560f25638efb SignalHandler(int) /home/george/tasks/tapir/parallel-ir/lib/Support/Unix/Signals.inc:252:0
#4 0x00007fd26b932da0 __restore_rt (/usr/lib/libpthread.so.0+0x11da0)
#5 0x00007fd26a464860 __GI_raise (/usr/lib/libc.so.6+0x34860)
#6 0x00007fd26a465ec9 __GI_abort (/usr/lib/libc.so.6+0x35ec9)
#7 0x00007fd26a45d0bc __assert_fail_base (/usr/lib/libc.so.6+0x2d0bc)
#8 0x00007fd26a45d133 (/usr/lib/libc.so.6+0x2d133)
#9 0x0000560f26698ab3 (anonymous namespace)::DACLoopSpawning::processLoop() /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1226:0
#10 0x0000560f2669b598 (anonymous namespace)::LoopSpawningImpl::processLoop(llvm::Loop*) /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1686:0
#11 0x0000560f2669adf3 (anonymous namespace)::LoopSpawningImpl::run() /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1627:0
#12 0x0000560f2669bda3 (anonymous namespace)::LoopSpawning::runOnFunction(llvm::Function&) /home/george/tasks/tapir/parallel-ir/lib/Transforms/Tapir/LoopSpawning.cpp:1818:0
#13 0x0000560f24f1f99a llvm::FPPassManager::runOnFunction(llvm::Function&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1514:0
#14 0x0000560f24f1fb3f llvm::FPPassManager::runOnModule(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1535:0
#15 0x0000560f24f1fec7 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1591:0
#16 0x0000560f24f205f1 llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1694:0
#17 0x0000560f24f207e9 llvm::legacy::PassManager::run(llvm::Module&) /home/george/tasks/tapir/parallel-ir/lib/IR/LegacyPassManager.cpp:1726:0
#18 0x0000560f258e5b09 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/BackendUtil.cpp:842:0
#19 0x0000560f258e7d2a clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/BackendUtil.cpp:1192:0
#20 0x0000560f264369c3 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/CodeGenAction.cpp:261:0
#21 0x0000560f271abe80 clang::ParseAST(clang::Sema&, bool, bool) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Parse/ParseAST.cpp:161:0
#22 0x0000560f25f3a6fb clang::ASTFrontendAction::ExecuteAction() /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Frontend/FrontendAction.cpp:1003:0
#23 0x0000560f26434784 clang::CodeGenAction::ExecuteAction() /home/george/tasks/tapir/parallel-ir/tools/clang/lib/CodeGen/CodeGenAction.cpp:993:0
#24 0x0000560f25f3a13e clang::FrontendAction::Execute() /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Frontend/FrontendAction.cpp:906:0
#25 0x0000560f25ed6e38 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/Frontend/CompilerInstance.cpp:981:0
#26 0x0000560f260850f8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/george/tasks/tapir/parallel-ir/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:251:0
#27 0x0000560f2358dca3 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/george/tasks/tapir/parallel-ir/tools/clang/tools/driver/cc1_main.cpp:221:0
#28 0x0000560f23582e5c ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /home/george/tasks/tapir/parallel-ir/tools/clang/tools/driver/driver.cpp:306:0
#29 0x0000560f23583a38 main /home/george/tasks/tapir/parallel-ir/tools/clang/tools/driver/driver.cpp:387:0
#30 0x00007fd26a450f4a __libc_start_main (/usr/lib/libc.so.6+0x20f4a)
#31 0x0000560f2358053a _start (/home/george/tasks/tapir/build/bin/clang-5.0+0x1b6a53a)
Stack dump:
0.  Program arguments: /home/george/tasks/tapir/build/bin/clang-5.0 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -main-file-name main.cpp -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -momit-leaf-frame-pointer -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/george/tasks/mantevo/miniFE/miniFE-2.0_cilk/src/main.gcno -resource-dir /home/george/tasks/tapir/build/lib/clang/5.0.0 -I . -I ../utils -I ../fem -D MINIFE_SCALAR=double -D MINIFE_LOCAL_ORDINAL=int -D MINIFE_GLOBAL_ORDINAL=int -D MINIFE_CSR_MATRIX -D MINIFE_INFO=1 -D MINIFE_KERNELS=0 -internal-isystem /usr/lib64/gcc/x86_64-pc-linux-gnu/7.2.1/../../../../include/c++/7.2.1 -internal-isystem /usr/lib64/gcc/x86_64-pc-linux-gnu/7.2.1/../../../../include/c++/7.2.1/x86_64-pc-linux-gnu -internal-isystem /usr/lib64/gcc/x86_64-pc-linux-gnu/7.2.1/../../../../include/c++/7.2.1/backward -internal-isystem /usr/local/include -internal-isystem /home/george/tasks/tapir/build/lib/clang/5.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -fdeprecated-macro -fdebug-compilation-dir /home/george/tasks/mantevo/miniFE/miniFE-2.0_cilk/src -ferror-limit 19 -fmessage-length 138 -ftapir=cilk -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o main.o -x c++ main.cpp 
1.  <eof> parser at end of file
2.  Per-module optimization passes
3.  Running pass 'Function Pass Manager' on module 'main.cpp'.
4.  Running pass 'Loop Spawning' on function '@_ZN6miniFE25generate_matrix_structureINS_9CSRMatrixIdiiEEEEiRKNS_23simple_mesh_descriptionINT_17GlobalOrdinalTypeEEERS4_'
clang-5.0: error: unable to execute command: Aborted (core dumped)
clang-5.0: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 5.0.0 (git@github.com:wsmoses/cilk-clang f81bbab46561384a0709c399c4cb0df9e4a3080b) (git@github.com:wsmoses/Parallel-IR.git f9d48f08baf80738b1501aa492df9b8dbd1521e6)
Target: x86_64-unknown-linux-gnu
Thread model: posix

wsmoses commented 6 years ago

Apologies, I was away on vacation. Quick question: is this using your clang frontend? Also, would it be possible to post the IR at the start of LoopSpawning?

My guess is that there is a memory-related thing on the loop bounds.

stelleg commented 6 years ago

Using your cilk-clang master. It only happens with optimizations turned on (-O2). I haven't had a chance to dig deeper, sorry. Here's the relevant .ll file with -O0: https://gist.github.com/stelleg/89f40a1d726600956d72ed580194fcbc

wsmoses commented 6 years ago

Are you sure this is the latest version? [The hashes in your error don't seem to be in the X most recent commits and I tried compiling the ll file with the latest version and it seems fine up to a linker error.]

stelleg commented 6 years ago

Yep, I'm getting the error running parallel-ir d89ba180c46fbe2fda5e2a5f595820bf4b75880d and cilk-clang 076e3106215e6a17a659ec1d015fdacf86f57ff2.

Strange that it's not failing when compiling the .ll file for you. Did you compile with -O2?

In the above error message I believe I was using my local version, thus the different hash, but I'm seeing the error with your latest heads as well.

neboat commented 6 years ago

I managed to recreate this error. I'll try to dive into the issue and see what's going wrong.

stelleg commented 6 years ago

@neboat I found the problematic cilk_for loop. It's at SparseMatrix_functions.hpp:75. Replacing the cilk_for with a for fixes the compilation problem. I've tried replacing the termination condition with something slightly simpler, but it seems to be the body of the loop that's the issue.
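For readers without the miniFE sources at hand, here is a minimal sketch of the loop shape being described. It is an illustration, not the actual miniFE code: the names `MatrixInitOp` and `init_matrix` are taken from elsewhere in this thread, while the members, the body of `operator()`, and the `PARALLEL_FOR` macro are hypothetical. The macro falls back to a serial `for` on compilers without Cilk support, assuming Cilk-enabled compilers define `__cilk` as Cilk Plus does.

```cpp
#include <cstddef>
#include <vector>

// Use cilk_for when Cilk is available, otherwise a plain serial loop.
// Assumption: a Cilk-enabled compiler defines __cilk (as Cilk Plus does).
#ifdef __cilk
#include <cilk/cilk.h>
#define PARALLEL_FOR cilk_for
#else
#define PARALLEL_FOR for
#endif

// Hypothetical stand-in for miniFE's MatrixInitOp: one call fills one row.
struct MatrixInitOp {
  std::vector<int>& row_lengths;
  int n;

  // Not declared noexcept, so when exceptions are enabled the compiler
  // may lower a call to this operator as an invoke with an unwind edge.
  void operator()(int i) {
    row_lengths[static_cast<std::size_t>(i)] = i + 1;
  }
};

// Hypothetical stand-in for init_matrix: a parallel loop whose body is a
// single call to the functor.
inline void init_matrix(std::vector<int>& row_lengths) {
  MatrixInitOp mat_init{row_lengths, static_cast<int>(row_lengths.size())};
  PARALLEL_FOR (int i = 0; i < mat_init.n; ++i) {
    mat_init(i);
  }
}
```

Each iteration writes a distinct element, so the iterations are independent. The only unusual ingredient is that the loop body is a call that may throw.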

stelleg commented 6 years ago

The problematic LLVM IR:

pfor.cond:                                        ; preds = %pfor.inc, %entry
  %10 = load i32, i32* %__begin, align 4
  %11 = load i32, i32* %__end, align 4
  %cmp = icmp slt i32 %10, %11
  br i1 %cmp, label %pfor.detach, label %pfor.end

pfor.detach:                                      ; preds = %pfor.cond
  %12 = load i32, i32* %__init, align 4
  %13 = load i32, i32* %__begin, align 4
  %mul = mul nsw i32 %13, 1
  %add2 = add nsw i32 %12, %mul
  detach within %syncreg, label %pfor.body.entry, label %pfor.inc

pfor.body.entry:                                  ; preds = %pfor.detach
  %i = alloca i32, align 4
  %exn.slot = alloca i8*
  %ehselector.slot = alloca i32
  store i32 %add2, i32* %i, align 4
  br label %pfor.body

pfor.body:                                        ; preds = %pfor.body.entry
  %14 = load i32, i32* %i, align 4
  invoke void @_ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi(%struct.MatrixInitOp* %mat_init, i32 %14)
          to label %invoke.cont unwind label %lpad

invoke.cont:                                      ; preds = %pfor.body
  br label %pfor.preattach

pfor.preattach:                                   ; preds = %invoke.cont
  reattach within %syncreg, label %pfor.inc

pfor.inc:                                         ; preds = %pfor.preattach, %pfor.detach
  %15 = load i32, i32* %__begin, align 4
  %inc = add nsw i32 %15, 1
  store i32 %inc, i32* %__begin, align 4
  br label %pfor.cond, !llvm.loop !4

lpad:                                             ; preds = %pfor.body
  %16 = landingpad { i8*, i32 }
          cleanup
  %17 = extractvalue { i8*, i32 } %16, 0
  store i8* %17, i8** %exn.slot, align 8
  %18 = extractvalue { i8*, i32 } %16, 1
  store i32 %18, i32* %ehselector.slot, align 4
  br label %det.rethrow

det.rethrow:                                      ; preds = %lpad
  br label %eh.resume

eh.resume:                                        ; preds = %det.rethrow
  %exn = load i8*, i8** %exn.slot, align 8
  %sel = load i32, i32* %ehselector.slot, align 4
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
  %lpad.val3 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
  resume { i8*, i32 } %lpad.val3

pfor.end:                                         ; preds = %pfor.cond
  sync within %syncreg, label %pfor.end.continue

pfor.end.continue:                                ; preds = %pfor.end
  ret void

neboat commented 6 years ago

Thanks for the info. Is this the IR you get from -O2 compilation?

stelleg commented 6 years ago

Nope, couldn't get the Tapir IR from -O2 due to the error reported above. I believe the above is -O0. Here's -O1:

pfor.detach.preheader:                            ; preds = %entry
  br label %pfor.detach

pfor.cond.cleanup:                                ; preds = %pfor.inc, %entry
  sync within %syncreg, label %pfor.end.continue

pfor.end.continue:                                ; preds = %pfor.cond.cleanup
  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %0) #2
  ret void

pfor.detach:                                      ; preds = %pfor.detach.preheader, %pfor.inc
  %__begin.017 = phi i32 [ %inc, %pfor.inc ], [ 0, %pfor.detach.preheader ]
  detach within %syncreg, label %pfor.body, label %pfor.inc

pfor.body:                                        ; preds = %pfor.detach
  invoke void @_ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi(%struct.MatrixInitOp* nonnull %mat_init, i32 %__begin.017)
          to label %pfor.preattach unwind label %lpad

pfor.preattach:                                   ; preds = %pfor.body
  reattach within %syncreg, label %pfor.inc

pfor.inc:                                         ; preds = %pfor.preattach, %pfor.detach
  %inc = add nuw nsw i32 %__begin.017, 1
  %cmp = icmp slt i32 %inc, %1
  br i1 %cmp, label %pfor.detach, label %pfor.cond.cleanup, !llvm.loop !145

lpad:                                             ; preds = %pfor.body
  %2 = landingpad { i8*, i32 }
          cleanup
  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %0) #2
  resume { i8*, i32 } undef

stelleg commented 6 years ago

Do you think it has to do with the interaction between exception handling (i.e. invoke/resume) and Tapir?

neboat commented 6 years ago

Ah, OK. (Sorry, I realized in hindsight that my question was silly.) Thanks again for the info. Now I know where to look.

I suspect there's some issue with Tapir and exceptions, but I don't think it's the resume. (Still, it would be nice to have an excuse to implement my master plan for integrating exceptions with Tapir...)

stelleg commented 6 years ago

I'm glad to hear there's a master plan :). Getting Tapir and exceptions to play nicely seems like it could be non-trivial.

neboat commented 6 years ago

OK, I think I've identified the problem, and it is with exception handling. A quick workaround (that preserves the cilk_for you identified before) is to add __attribute__((noinline)) to init_matrix in SparseMatrix_functions.hpp. I'm thinking through a better fix, but hopefully this change will still let you enjoy some Cilk parallelism in your code.
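A minimal sketch of where the attribute goes, for anyone applying the same workaround. The signatures and bodies below are hypothetical stand-ins, not the real miniFE declarations; only the names `init_matrix` and `generate_matrix_structure` and the attribute itself come from this thread. `__attribute__((noinline))` is a GCC/Clang extension. Since the crash was reported while running Loop Spawning on `generate_matrix_structure`, the attribute presumably helps by keeping the parallel loop out of that caller.

```cpp
#include <vector>

// Hypothetical stand-in for init_matrix in SparseMatrix_functions.hpp.
// The attribute asks the compiler not to inline this function into callers.
template <typename Matrix>
__attribute__((noinline)) void init_matrix(Matrix& m, int n) {
  m.assign(n, 0);
  // The original code uses cilk_for here; a serial loop keeps this sketch
  // buildable without a Cilk-enabled compiler.
  for (int i = 0; i < n; ++i) {
    m[i] = i;
  }
}

// Hypothetical stand-in for the caller named in the crash report.
template <typename Matrix>
int generate_matrix_structure(Matrix& m, int n) {
  init_matrix(m, n);  // stays an out-of-line call because of noinline
  return 0;
}
```

The attribute changes only inlining decisions, not the loop itself, so the cilk_for inside `init_matrix` is still compiled as a parallel loop.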

stelleg commented 6 years ago

Thanks for the workaround, did the job for me.

neboat commented 6 years ago

Wanted to give you a quick update. I'm currently testing a fix to Tapir's integration with exception-handling code. On my machine, I can now successfully build this test case without any workaround. I would like to try running this test case and to try running the race detector on it. Can you please advise me on how to run this program?

stelleg commented 6 years ago

Nice! Assuming you've successfully built it, it should be miniFE.x in the src subdirectory. You can run it with no arguments to get a trivially small test case, or if you want something longer running, you can increase the size of the problem, e.g. ./miniFE.x --nx 100 --ny 100 --nz 100. Let me know if you have any issues.