Closed tobiasgrosser closed 7 years ago
Resolved in r303006.
Sure sir. Will do so.
I mean in PUBLIC
I would prefer if you could submit the patch to phabricator and then discuss on phabricator. It is best practice to have all development discussions in practice such that others can benefit from it.
Haha, right sir, I will communicate with Johannes to figure out why this fix works, in some depth. And then I will go ahead and submit the patch to Phabricator.
Thanks, Annanay
That's also a great moment to include Johannes, as you can reference his old commit in your patch summary and discuss its relation to this issue and how your solution addresses the problem you have seen
Sure. Now, we need to get this bug fixed. ;)
The standard process would be to submit a patch to reviews.llvm.org, motivate the bug fix in the commit message, and possibly add some additional evidence, e.g., by running the LLVM test-suite and showing that it does not trigger any bugs.
Hi sir,
I assumed that removing the BBMap insert statement in BlockGenerators.cpp::trySynthesizeNewValue() was the fix to this bug. But I will speak to Johannes about the Insertion point detection algorithm before coming to a consensus on this.
Thanks, Annanay
Hi Annanay,
any update here?
Hi Annanay,
thanks for the thorough investigation and also the fruitful discussion. If you would like to get more information about the code snippet you asked above, it might be useful to ask Johannes, who committed this change, but unfortunately did not add comments:
commit a4c1f49147b844584ada9b25bb7bd85d65819cb6 Author: Johannes Doerfert doerfert@cs.uni-saarland.de Date: Mon Jun 6 13:05:21 2016 +0000
[FIX] Determine insertion point during SCEV expansion
git-svn-id: https://llvm.org/svn/llvm-project/polly/trunk@271892 9117
7308-0d34-0410-b5e6-96231b3b80d8
Best, Tobias
I think I may have found something interesting regarding the same. So I checked the modification of the insert point in llvm::SCEVExpander::expand(), and found that it doesn't change the insert point erroneously. However -
File - polly / lib / Support / ScopHelper.cpp Class (Actually struct) - ScopExpander
I traced the code generation to the expandCodeFor() in this struct. The arguments that are passed - SCEV E which is the llvm.expect statement, Instruction I is actually the InsertPoint for this statement. Going through the control flow -
Polly's expandCodeFor is called using insert point: br label %polly.exiting
In function visitUnknown, Calculating IP for %tmp5 = tail call i64 @llvm.expect.i64(i64 %tmp4, i64 0) Calculated IP: br i1 true, label %polly.start, label %bb1.pre_entry_bb
// This is fine, it is okay if Polly's code generator wants to hoist the // instruction. It should hoist all its dependencies as well, which it // does next.
In function visitGenericInst,
Value of operand that must be code generated: %tmp4 = zext i1 %tmp3 to i64 Value of InsertPoint: br i1 true, label %polly.start, label %bb1.pre_entry_bb
Polly's expandCodeFor is called using insert point:
br i1 true, label %polly.start, label %bb1.pre_entry_bb
The basic block of insert point - polly.split_new_and_old: ; preds = %bb br i1 true, label %polly.start, label %bb1.pre_entry_bb
In function visitUnknown, Calculating IP for
%p_tmp3 = icmp ult i64 4, 12 Calculated IP : %p_tmp3 = icmp ult i64 4, 12
// For this instruction, IP is calculated to be the instruction // itself. Also, when it calls visitGenericInst here, it satisfies the // first if condition // if(!Inst || !R.contains(Inst)) // return E;
The rest of the code generation happens smoothly except for this statement.
Regions confuse me, I printed out the Region everytime in visitUnknown, its value is.
[1] bb1 => polly.merge_new_and_old
If someone could explain the logic behind this code in the function visitUnknown -
if (Inst && !R.contains(Inst)){
IP = Inst;
}
else if (Inst && RTCBB->getParent() == Inst->getFunction()){
IP = RTCBB->getTerminator();
}
else{
IP = RTCBB->getParent()->getEntryBlock().getTerminator();
}
I think the problem might be here.
Thanks,
This was a busy week with end semester exams, sorry for the delayed response.
Second question: The error happens because, when the llvm.expect function is code generated through trySynthesizeValue() -> expandCodeFor() -> llvm::SCEVExpander::expand() it inserts the statement in a BB that is hoisted as far out as possible. The InsertPoint that is passed is not honored for some reason. Normal codegeneration inserts it in the right place.
What also is interesting. If llvm.expect can be hoised, why can the operands it depends on not also be hoisted?
Thank you sir :)
Previously, when Polly was supposed to code-generate the llvm.expect instruction, it merely skipped it because it was part of the ignoredIntrinsic list. Then, when it encountered the trunc instruction, it was forced to code-generate llvm.expect using trySynthesizeValue(). So, for your first question, I do not think it harms to code generate llvm.expect in the first place.
Second question: The error happens because, when the llvm.expect function is code generated through trySynthesizeValue() -> expandCodeFor() -> llvm::SCEVExpander::expand() it inserts the statement in a BB that is hoisted as far out as possible. The InsertPoint that is passed is not honored for some reason. Normal codegeneration inserts it in the right place.
I am exploring the last question about the BB split, I am trying to come up with such a test case. Will post any updates here.
Thanks, Annanay
Hi Annanay,
I am impressed. Without asking a lot of further questions, you are clearly capable of digging your way through the code, understanding the issues, and proposing sensible solutions! Very nice, indeed!
Now to the technical discussion: I am surprised, that having llvm.expect in the ignored intrinsics fixes this issue. If these indeed fixes the problem, we should take it. However, beforehand we should understand why this works. I would assume llvm.expect does not have any side effects and would consequently be ignored anyhow. Hence, it likely does not need to be in the ignoredIntrinsic's list for the scop to be detected. If all tests pass and your test case still generates Polly code during code generation, this is true. Can you confirm this?
Having an intrinsic in isIgnoredIntrinics means for code generation, that we decide to not generate code for llvm.expect(). Hence, by removing llvm.expect from isIgnoredIntrinsic, we ask the code generator to not ignore it when generating the basic block. This seems to be OK, as the llvm.expect() should still hold after Polly rescheduled a piece of code. I am not 100% clear why this fixes the code generation issue though. The direct impact is that the return value of llvm.expect will appear in the BBList, which is then likely used instead of inserting a new copy higher up in the program. Can you investigate what exactly happens here.
The last question I am interested in. Assuming the bb in our test case is split in two bbs, such that the llvm.expect is in the first bb and its use is in the second. Does this also crash today? Is this test case also fixed by removing llvm.expect from the set of ignored intrinsics?
Best, Tobias
Best, Tobias
trySynthesizeNewValue() is called with the @llvm.expect instruction as parameter Old. It determines the correct insert point using Builder.GetInsertPoint() - which is the br statement at the end of polly.stmt.bb2 - this is the desired value. After this, expandCodeFor is called to get the expanded statement. In this function, although the correct insert point is passed, the insertion is hoisted to a higher loop nest. This is performed because the instruction is loop invariant. This, infact, is part of llvm's SCEV expander. Code can be found in the file - llvm / lib / Analysis / ScalarEvolutionExpander.cpp - function SCEVExpander::expand().
The weird part was, when I removed llvm.expect from the list of ignored intrinsics, in the file - polly / lib / Support / ScopHelper.cpp - function isIgnoredInstrinsic() , it passed all regression tests, and eliminated this error as well. It copied the basic block in the right place as shown (I debug printed copyBB) -
############## copyBB is complete ################
polly.stmt.bb2: ; preds = %polly.stmt.bb1 %p_tmp3 = icmp ult i64 4, 12 %p_tmp4 = zext i1 %p_tmp3 to i64 %p_tmp5 = tail call i64 @llvm.expect.i64(i64 %p_tmp4, i64 0) %p_tmp6 = trunc i64 %p_tmp5 to i32 br label %polly.exiting
############## copyBB is complete ################
I am not sure if changing the SCEVExpander is the right thing to do. Please let me know your thoughts.
Thanks :)
@tobias sir. In the second test case, AFAIK, what is happening is that since %tmp5 is not being used anywhere, Polly discards CodeGeneration for the statement at line 20 (The statement containing the llvm.expect call). This can be confirmed with the fact that the function "trySynthesizeNewValue()" in lib / CodeGen / BlockGenerators.cpp is not invoked in the execution for this program, whereas it is called in the original test case and also in the one that I submitted earlier.
I am trying to understand the trySynthesizeNewValue() function. Also, where and why Polly performs copyBB and splitBB.
Even simpler test case I attached an even simpler test case.The interesting part is here:
%tmp3 = icmp ult i64 4, 12
%tmp4 = zext i1 %tmp3 to i64
%tmp5 = tail call i64 @llvm.expect.i64(i64 %tmp4, i64 0)
%tmp6 = trunc i64 %tmp5 to i32 ; Variant A
; %tmp6 = trunc i64 %tmp4 to i32 ; Variant B
store i32 %tmp6, i32* %A
Variant A breaks, variant B does not. The issue seems to be during expression generation. In Variant A, @llvm.expect and the zext expression are generated at the top of the region, instead of the basic block where they have been before. That looks wrong and likely needs to be corrected. Variant B get's it right.
Test case demonstrating problem with icmp statement The icmp statement from the test file "expect-causes-issues.ll" given in this bug, in line 18, has been replaced with a dummy "add" statement. opt -polly-codegen icmp-causes-issues.ll -polly-process-unprofitable runs without any errors.
The problem seems not to lie with code generation for llvm.expect but with the "icmp" statement that the llvm.expect has a dependency on.
This is actually confirmed by a review of the code as, SCEV cannot model the "icmp" statement. In file lib/Support/SCEVAffinator.cpp, we see SCEV modelling for all IR statements including zext, add, div, etc.
There is a design decision to be taken here. The choice is:
Extended Description
opt -polly-codegen /tmp/expect-causes-issues.ll -polly-process-unprofitable WARNING: You're attempting to print out a bitcode file. This is inadvisable as it may cause display problems. If you REALLY want to taste LLVM bitcode first-hand, you can force output with the `-f' option.
Instruction does not dominate all uses! %p_tmp3 = icmp ult i64 4, 12 %0 = zext i1 %p_tmp3 to i64