Miscompile lencod (after Sanjoy's SCEV changes?)

tobiasgrosser commented 8 years ago


Bugzilla Link	27944
Resolution	FIXED
Resolved on	Jul 11, 2016 07:21
Version	unspecified
OS	Linux
CC	@hfinkel,@jdoerfert,@jdoerfert,@sanjoy

Extended Description

It seems that after Sanjoys SCEV changes that happend between 115336 and 115351 a miscompile was introduced in FAIL: lencod.execution_time

http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable/builds/3034

To the the build-breakage that happened before it is not easily possible to track down the precise version.

tobiasgrosser commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#27980

tobiasgrosser commented 8 years ago

I committed the fix proposed below in r275053. This addresses the most obvious issues and should make the buildbots green again.

We may still need to review the load hoisting code more carefully. The fact that the run-time check we construct does not take into account the invalid-assumptions we derive for memory accesses looks rather fragile and would clearly benefit from some further review and testing. Unfortunately, Johannes is currently not available to help with these issues.

tobiasgrosser commented 8 years ago

Reduced test case Hi guys,

I just managed to reduce this test case further. Interestingly, the new test case failed already before Sanjoy's patch.

I see the following with latest Polly (274430):

$opt test.ll -polly-scops -polly-process-unprofitable -polly-invariant-load-hoisting=false -analyze

Invariant Accesses: {
}
Context:
{  :  }
Assumed Context:
{  :  }
Invalid Context:
{  : 1 = 0 }
Alias Groups (0):
    n/a
Statements {
    Stmt_loop

... Stmt_loop_next Domain := { Stmt_loop_next[0] }; Schedule := { Stmt_loop_next[i0] -> [0, 1] }; MustWriteAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_loop_next[i0] -> MemRef_val__phi[] }; ReadAccess := [Reduction Type: NONE] [Scalar: 0] { Stmt_loop_next[i0] -> MemRef_a[1] }; }

$opt test.ll -polly-scops -polly-process-unprofitable -analyze Invariant Accesses: { ReadAccess := [Reduction Type: NONE] [Scalar: 0] { Stmt_loop_next[i0] -> MemRef_a[1] }; Execution Context: { : 1 = 0 } } Context: { : } Assumed Context: { : } Invalid Context: { : 1 = 0 } Statements { Stmt_loop ... Stmt_loop_next Domain := { Stmt_loop_next[0] }; Schedule := { Stmt_loop_next[i0] -> [0, 1] }; MustWriteAccess := [Reduction Type: NONE] [Scalar: 1] { Stmt_loop_next[i0] -> MemRef_val__phi[] }; }

Even though Stmt_loop_next is always executed, it has an empty execution context when hoisted. This seems incorrect.

It seems the reason is that the subscript expression of the hoisted memory access is "{0,+,4}<%loop>" and does not have nsw/nuw annotations. As a result, the invalid context computed in MemoryAccess::getPwAff is non-empty '{ [i0] : i0 <= -2305843009213693953 or i0 >= 2305843009213693952 }' and will -- when projected on the parameters -- indicate that any execution of this memory access is invalid and consequently won't happen.

It seems we have a mismatch here. The invalid-context of the entire scop still allows the scop to be executed while we assume locally that some parts won't be executed.

The following patch fixes this bug as well as the LNT bug we see in MultiSource/Applications/JM/lencod:

--- a/lib/Analysis/ScopInfo.cpp +++ b/lib/Analysis/ScopInfo.cpp @@ -869,7 +869,9 @@ void MemoryAccess::dump() const { print(errs()); } __isl_give isl_pw_aff MemoryAccess::getPwAff(const SCEV E) { auto *Stmt = getStatement(); PWACtx PWAC = Stmt->getParent()->getPwAff(E, Stmt->getEntryBlock());

InvalidDomain = isl_set_union(InvalidDomain, PWAC.second);
isl_set *StmtDom = isl_set_reset_tuple_id(getStatement()->getDomain());
isl_set *NewInvalidDom = isl_set_intersect(StmtDom, PWAC.second);
InvalidDomain = isl_set_union(InvalidDomain, NewInvalidDom); return PWAC.first; }

It restricts the invalid accesses to the memory accesses that are actually executed, which in this very case is enough to fix this issue as the only value of i0 that is every executed is i0 = 0, and the problematic values of i0 are larger.

It would probably also make sense to collect the invalid context from the MemoryAccesses to ensure we won't get into a situation again where these invalid contexts are used, but not enforced by our run-time checks.

Johannes, any thoughts on this?

Best, Tobias

tobiasgrosser commented 8 years ago

@Johannes, this bug seems to be in code which you know best. Could you possibly have a look?

OK. This is really a nasty bug, but I can reproduce it. I first tried to create a big .bc file and then just wanted to run bugpoint to reduce it. Unfortunately, this did not work for two reasons: 1) opt and llc/lli took minutes to just run this test case once. So everything took ages. 2) When specifying the individual passes one-by-one instead of using -O3, this bug disappears. :( So I had to revert to manually bisecting this bug. After a couple of hours debugging this is what I got:

1) https://llvm.org/svn/llvm-project/llvm/trunk@271151 is indeed the change that triggers this bug.

2) There is only a single mis-compile

File: block.c Function: dct_chroma Region: %for.body660---%if.end1256

3) The bug disappears if I add -polly-invariant-load-hoisting=false

Before Sanjoys patch we get:

Invariant Accesses: {
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_if_else764[i0] -> MemRef_197[0] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 0] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 2] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 1] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 3] };
        Execution Context: [qp_per_dc.0] -> {  :  }
}

After Sanjoys patch we get:

Invariant Accesses: {
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_if_else764[i0] -> MemRef_197[0] };
        Execution Context: [qp_per_dc.0] -> {  :  }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 0] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 2] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 1] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
        ReadAccess :=   [Reduction Type: NONE] [Scalar: 0]
            [qp_per_dc.0] -> { Stmt_for_inc826_for_body660_crit_edge[i0] -> MemRef_dct_chroma_m4[1, 3] };
        Execution Context: [qp_per_dc.0] -> {  : 1 = 0 }
}

Without Load Hoisting we get:

Invariant Accesses: {                                                        
}                                                                            
Context:                                                                     
[qp_per_dc.0] -> {  : -2147483648 <= qp_per_dc.0 <= 2147483647 }             
Assumed Context:                                                             
[qp_per_dc.0] -> {  :  }                                                     
Invalid Context:                                                             
[qp_per_dc.0] -> {  : 1 = 0 }                                                
p0: %qp_per_dc.0

How to reproduce the bug:

Go to /home/grosser/Projects/polly/test-suite/MultiSource/Applications/JM/lencod and run:

for i in ls *.c; do echo $i; /home/grosser/tmp/llvm-project/llvm-project/build/bin/clang -O3 -mllvm -polly -mllvm -polly-process-unprofitable -mllvm -polly-position=before-vectorizer $i -c -o $i.o; done && clang *.o -o executable && ./executable -d data/encoder_small.cfg -p InputFile=data/foreman_part_qcif_444.yuv -p LeakyBucketRateFile=data/leakybucketrate.cfg -p QmatrixFile=data/q_matrix.cfg > executable.out && diff lencod.reference_output.small executable.out

tobiasgrosser commented 8 years ago

Bug llvm/llvm-bugzilla-archive#27980 has been marked as a duplicate of this bug.

tobiasgrosser commented 8 years ago

Hi Sanjoy,

you are right. This is really just the one commit. I tried to revert it locally, but I unfortunately already get conflicts. The command to try this locally should be:

lnt runtest nt --sandbox /tmp --cc llvm_build/bin/clang-3.9 --test-suite test-suite/ --only-test MultiSource/Applications/JM/lencod --mllvm=-polly --mllvm=-polly-position=before-vectorizer

I am currently traveling so I won't be able to debug this until next week. As the SCEV nsw stuff has many corner cases, it might very well be that this just triggers a Polly bug. So don't feel obliged to debug this trough. However, if you have some insights I would be glad to hear them.

Best, Tobias

sanjoy commented 8 years ago

Hi Tobias,

It look like it broke earlier, in http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable/builds/3026/ and the only SCEV change in that build is 271151. I didn't notice this earlier because I thought this too was a failed build due to removing the apply method.

Do you know why the test is failing? Is it a performance regression, or is the benchmark infinite looping?

Finally, can you point me to some instructions on how I can reproduce this?

llvm / llvm-project

Miscompile lencod (after Sanjoy's SCEV changes?) #28318

Extended Description