llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.06k stars 11.98k forks source link

SCoP detection fails on Julia version of PolyBench's `trmm` benchmark #28500

Open llvmbot opened 8 years ago

llvmbot commented 8 years ago
Bugzilla Link 28126
Version unspecified
OS All
Attachments Julia's LLVM IR for trmm, Reduced test case, Passing test case
Reporter LLVM Bugzilla Contributor
CC @Meinersbur,@tobiasgrosser

Extended Description

I ported PolyBench's trmm benchmark to Julia to test Polly on the LLVM IR that Julia produces when compiling it. The according Julia function that I used for this purpose looks as follows:

@​polly function kernel_trmm(alpha, A, B) m,n = size(B) for i = 1:m, j = 1:n for k = (i+1):m B[i,j] += A[k,i] B[k,j] end B[i,j] = alpha B[i,j] end end

The LLVM code that Julia emits for this function (when disabling bounds-checks via --check-bounds=no) is attached as kernel_trmm.ll.

I invoked Polly the following way on this LLVM code:

$ opt -debug-only=polly-detect -polly-codegen -S llvm/kernel_trmm.ll > /dev/null

This produces the following output:

Checking region: top => Top level region is invalid Checking region: top.split => L10 Non affine loop bound '({-2,+,-1}<%if> + %37)' in loop: if12 Checking region: if => L.L10_crit_edge Non affine loop bound '({-2,+,-1}<%if> + %37)' in loop: if12 Checking region: if => L.loopexit Non affine loop bound '({-2,+,-1}<%if> + %37)' in loop: if12 Checking region: if11 => L4.L.loopexit_crit_edge Non affine loop bound '({-2,+,-1}<%if> + %37)' in loop: if12 Checking region: if11 => L7 Non affine loop bound '({-2,+,-1}<%if> + %37)' in loop: if12 Checking region: if12 => L5.L7_crit_edge OK Expanding if12 => L5.L7_crit_edge Trying if12 => L7 to if12 => L7 Region can not profitably be optimized!

For computing loop bounds Julia emits select instructions and it seems that they are the source of the problem. This issue has also been subject to discussion on the polly-dev mailing list: https://groups.google.com/forum/#!topic/polly-dev/P2RZUlKRo0I The original C version of trmm however, when compiled via clang, can be optimized by Polly without problems.

I am currently working on a solution and hope to come up with new findings as soon as possible.

llvmbot commented 8 years ago

Thank you Michael, this was really helpful. I changed the SCEV analysis accordingly and it indeed solved this problem for trmm and the provided test case. I am currently also experimenting with other Julia programs and there are other situations as well where the lowering of for loops to LLVM produces code that cannot be handled by SCEV. I will wait with a patch until I know if further changes to SCEV are required.

Best, Matthias

Meinersbur commented 8 years ago

From kernel_trmm.ll :

%30 = add i64 %"#temp#1.030", 1 %36 = icmp sgt i64 %30, %12 %37 = select i1 %36, i64 %"#temp#1.030", i64 %12

ie: %37 = ((%"#temp#1.030" + 1) > %12) ? "#temp#1.030" : %12

This is not detected as an smax expression in ScalarEvolution::createNodeForSelectOrPHI because what is compared in icmp is not the same as in the select (%30 vs. %"#temp#1.030").

This equivalent code is detected as expected:

%36 = icmp sge i64 %"#temp#1.030", %12 %37 = select i1 %36, i64 %"#temp#1.030", i64 %12

ie: %37 = (%"#temp#1.030" >= %12) ? "#temp#1.030" : %12

I see two possible solutions:

1.) Make the Julia compiler generate the second variant.

2.) Teach ScalarEvolution::createNodeForSelectOrPHI to detect this pattern. Something like this:

case ICmpInst::ICMP_SGT: if (LDiff == RDiff || getAddRec(LDiff, 1) == RDiff) return getAddExpr(getSMaxExpr(LS, RS), LDiff);

(Take care for the case when %"#temp#1.030" has its maximum value)

This would be a change to LLVM itself and therefore should be reviewed by LLVM people as well.

llvmbot commented 8 years ago

I've attached another test case now that shows that Polly is able to handle select instructions in simpler situations, namely, when select is used to model an smax or umax expression. However, Polly relies on the SCEV Analysis pass to detect this particular usage of select. The first test case, that I attached earlier, fails only because the SCEV Analysis fails to detect that select is used to compute the maximum of two values.

That means, at the moment, Polly does not handle selects explicitly. The question that now arises is whether it is possible to model such instructions effectively. I assume in general they could be modeled like regular IR branches, but maybe that might not be very efficient? Another possibility would be to generalize the SCEV Analysis to detect more smax/umax patterns, however I am not very familiar with the internals of SCEV, so I cannot really estimate if that would be reasonable. And another possibility would be to change Julia's code generator to directly use smax instructions here instead of select which would have to be discussed on the Julia mailing list.

Is there a preferred way how this could be handled? I'm looking forward to hear your opinion on this.

Best, Matthias