Status: open. stelleg opened this issue 6 years ago.
Apologies, I was away on vacation. Quick question: is this using your Clang frontend? Also, would it be possible to post the IR at the start of the loopspawning pass?
My guess is that something memory-related is going on with the loop bounds.
I'm using your cilk-clang master. It only happens with optimizations turned on (-O2). I haven't had a chance to dig deeper, sorry. Here's the relevant .ll file with -O0: https://gist.github.com/stelleg/89f40a1d726600956d72ed580194fcbc
Are you sure this is the latest version? The hashes in your error don't seem to be among the X most recent commits, and when I compile the .ll file with the latest version it seems fine up to a linker error.
Yep, I'm getting the error running parallel-ir d89ba180c46fbe2fda5e2a5f595820bf4b75880d and cilk-clang 076e3106215e6a17a659ec1d015fdacf86f57ff2.
Strange that it's not failing when compiling the .ll file for you. Did you compile with -O2?
In the error message above I believe I was using my local version, hence the different hashes, but I'm seeing the error with your latest heads as well.
I managed to recreate this error. I'll try to dive into the issue and see what's going wrong.
@neboat I found the problematic cilk_for loop. It's at SparseMatrix_functions.hpp:75. Replacing the cilk_for with a for fixes the compilation problem. I've tried replacing the termination condition with something slightly simpler, but it seems to be the body of the loop that's the issue.
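For reference, the loop body in the IR below is a single call: an invoke of _ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi, which demangles to MatrixInitOp<miniFE::CSRMatrix<double, int, int>>::operator()(int). The following is a minimal C++ sketch of that loop shape, not the miniFE source. MatrixInitOpSketch, init_rows, and the functor's member and body are hypothetical, and a plain for stands in for cilk_for so the sketch builds with an ordinary compiler. Because operator() is not noexcept, an exception can escape the loop body, which is presumably why the Tapir IR wraps the call in an invoke with a landing pad (lpad, det.rethrow).

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for MatrixInitOp<miniFE::CSRMatrix<double, int, int>>.
// Only the operator()(int) signature comes from the mangled name in the IR;
// the member and the body are illustrative assumptions.
struct MatrixInitOpSketch {
  std::vector<int>* rows;

  // Not declared noexcept, so callers must assume it can throw.
  void operator()(int row) {
    if (row < 0 || row >= static_cast<int>(rows->size()))
      throw std::out_of_range("row index out of range");
    (*rows)[row] = row;  // placeholder for the real per-row initialization
  }
};

// Loop shape matching pfor.cond / pfor.detach / pfor.inc in the IR:
// iterate over [0, n) and call the functor once per index. The original
// uses cilk_for; a plain for is used so this compiles with any C++ compiler.
void init_rows(MatrixInitOpSketch& mat_init, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {  // cilk_for in the Cilk source
    mat_init(i);
  }
}
```

Calling op(n) with an index past the end throws std::out_of_range, which is the kind of exception the landing pad in the Tapir IR has to route out of the detached task.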
The problematic LLVM IR:
pfor.cond: ; preds = %pfor.inc, %entry
  %10 = load i32, i32* %__begin, align 4
  %11 = load i32, i32* %__end, align 4
  %cmp = icmp slt i32 %10, %11
  br i1 %cmp, label %pfor.detach, label %pfor.end

pfor.detach: ; preds = %pfor.cond
  %12 = load i32, i32* %__init, align 4
  %13 = load i32, i32* %__begin, align 4
  %mul = mul nsw i32 %13, 1
  %add2 = add nsw i32 %12, %mul
  detach within %syncreg, label %pfor.body.entry, label %pfor.inc

pfor.body.entry: ; preds = %pfor.detach
  %i = alloca i32, align 4
  %exn.slot = alloca i8*
  %ehselector.slot = alloca i32
  store i32 %add2, i32* %i, align 4
  br label %pfor.body

pfor.body: ; preds = %pfor.body.entry
  %14 = load i32, i32* %i, align 4
  invoke void @_ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi(%struct.MatrixInitOp* %mat_init, i32 %14)
          to label %invoke.cont unwind label %lpad

invoke.cont: ; preds = %pfor.body
  br label %pfor.preattach

pfor.preattach: ; preds = %invoke.cont
  reattach within %syncreg, label %pfor.inc

pfor.inc: ; preds = %pfor.preattach, %pfor.detach
  %15 = load i32, i32* %__begin, align 4
  %inc = add nsw i32 %15, 1
  store i32 %inc, i32* %__begin, align 4
  br label %pfor.cond, !llvm.loop !4

lpad: ; preds = %pfor.body
  %16 = landingpad { i8*, i32 }
          cleanup
  %17 = extractvalue { i8*, i32 } %16, 0
  store i8* %17, i8** %exn.slot, align 8
  %18 = extractvalue { i8*, i32 } %16, 1
  store i32 %18, i32* %ehselector.slot, align 4
  br label %det.rethrow

det.rethrow: ; preds = %lpad
  br label %eh.resume

eh.resume: ; preds = %det.rethrow
  %exn = load i8*, i8** %exn.slot, align 8
  %sel = load i32, i32* %ehselector.slot, align 4
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
  %lpad.val3 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
  resume { i8*, i32 } %lpad.val3

pfor.end: ; preds = %pfor.cond
  sync within %syncreg, label %pfor.end.continue

pfor.end.continue: ; preds = %pfor.end
  ret void
Thanks for the info. Is this the IR you get from -O2 compilation?
Nope, I couldn't get the Tapir IR from -O2 because of the error reported above. I believe the above is from -O0. Here's the -O1 IR:
pfor.detach.preheader: ; preds = %entry
  br label %pfor.detach

pfor.cond.cleanup: ; preds = %pfor.inc, %entry
  sync within %syncreg, label %pfor.end.continue

pfor.end.continue: ; preds = %pfor.cond.cleanup
  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %0) #2
  ret void

pfor.detach: ; preds = %pfor.detach.preheader, %pfor.inc
  %__begin.017 = phi i32 [ %inc, %pfor.inc ], [ 0, %pfor.detach.preheader ]
  detach within %syncreg, label %pfor.body, label %pfor.inc

pfor.body: ; preds = %pfor.detach
  invoke void @_ZN12MatrixInitOpIN6miniFE9CSRMatrixIdiiEEEclEi(%struct.MatrixInitOp* nonnull %mat_init, i32 %__begin.017)
          to label %pfor.preattach unwind label %lpad

pfor.preattach: ; preds = %pfor.body
  reattach within %syncreg, label %pfor.inc

pfor.inc: ; preds = %pfor.preattach, %pfor.detach
  %inc = add nuw nsw i32 %__begin.017, 1
  %cmp = icmp slt i32 %inc, %1
  br i1 %cmp, label %pfor.detach, label %pfor.cond.cleanup, !llvm.loop !145

lpad: ; preds = %pfor.body
  %2 = landingpad { i8*, i32 }
          cleanup
  call void @llvm.lifetime.end.p0i8(i64 88, i8* nonnull %0) #2
  resume { i8*, i32 } undef
Do you think it has to do with the interaction between exception handling (i.e., invoke/resume) and Tapir?
Ah, OK. (Sorry, I realized in hindsight that my question was silly.) Thanks again for the info. Now I know where to look.
I suspect there's some issue with Tapir and exceptions, but I don't think it's the resume. (Still, it would be nice to have an excuse to implement my master plan for integrating exceptions with Tapir...)
I'm glad to hear there's a master plan :). It seems like getting Tapir and exceptions to play nicely could be non-trivial.
OK, I think I've identified the problem, and it is with exception handling. A quick workaround (that preserves the cilk_for you identified before) is to add __attribute__((noinline)) to init_matrix in SparseMatrix_functions.hpp. I'm thinking through a better fix, but hopefully this change will still let you enjoy some Cilk parallelism in your code.
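A minimal sketch of what that workaround looks like in C++, assuming a simplified, hypothetical signature: the real init_matrix takes miniFE's own matrix types, and its loop is a cilk_for rather than the plain for used here so the sketch builds with any compiler. make_rows is likewise a hypothetical caller. The attribute goes in front of the function declaration. It only stops the optimizer from inlining init_matrix into its caller and does not change the loop itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical signature: the real init_matrix in SparseMatrix_functions.hpp
// takes miniFE's matrix types; any container with size() and operator[]
// stands in here. __attribute__((noinline)) is a GCC/Clang extension telling
// the optimizer not to inline this function into its callers, even at -O2.
template <typename RowContainer>
__attribute__((noinline)) void init_matrix(RowContainer& rows) {
  const std::size_t n = rows.size();
  for (std::size_t i = 0; i < n; ++i) {  // cilk_for in the Cilk build
    // Placeholder work standing in for the real per-row initialization.
    rows[i] = static_cast<typename RowContainer::value_type>(2 * i);
  }
}

// Hypothetical caller: because of noinline, init_matrix is emitted as a
// separate function and called here rather than inlined into this body.
std::vector<double> make_rows(std::size_t n) {
  std::vector<double> rows(n, 0.0);
  init_matrix(rows);
  assert(rows.size() == n);  // init_matrix must not resize the container
  return rows;
}
```

For example, make_rows(3) returns {0.0, 2.0, 4.0}.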
Thanks for the workaround, did the job for me.
Wanted to give you a quick update. I'm currently testing a fix to Tapir's integration with exception-handling code. On my machine, I can now successfully build this test case without any workaround. I would like to run this test case and also try the race detector on it. Can you please advise me on how to run this program?
Nice! Assuming you've successfully built it, it should be miniFE.x in the src subdirectory. You can run it with no arguments to get a trivially small test case, or, if you want something longer running, you can increase the size of the problem, e.g. ./miniFE.x --nx 100 --ny 100 --nz 100. Let me know if you have any issues.
The error occurs when building the Cilk version of miniFE, a simple finite-element application: https://mantevo.org/downloads/miniFE_2.0.1.html
I've done a small amount of digging, but thought I'd submit an issue in case the fix is obvious to someone else. I'll keep digging in the meantime.
Error output: