Open llvmbot opened 7 years ago
Has anything changed here?
This optimization was on for quite a while but recently was disabled again because it exposes a problem in indvarsimplify. See for details: llvm/llvm-bugzilla-archive#31181
Artur: Is this fixed now? Can this bug be closed?
Right, here is the initial optimization patch: https://reviews.llvm.org/D23059 (went in as r278107)
Later we discovered that this patch interacts with indvar simplify in a strange manner and cause regression on some of our internal benchmarks. So, I disabled the optimization.
Here is the change which addresses the problem in indvar simplify: https://reviews.llvm.org/D24280
Once it's in we can turn the subject change on.
Hi Artur,
do I understand it correctly, that the change actually just disables an not-quite-ready optimization that you have been working on? If that's the case, then we should probably close this bug and just consider it as a sign that this optimization indeed matters.
Bisect pointed out to https://reviews.llvm.org/rL279082
r279082 | apilipenko | 2016-08-18 09:08:35 -0700 (Thu, 18 Aug 2016) | 3 lines
CVP. Turn marking adds as no wrap (introduced by r278107) off by default
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.
Extended Description
Recently, Chrome observed a significant (~15%) regression after rolling the new version of Clang (r280106, trunk at the time), see https://crbug.com/643724.
A minimal repro has been extracted (attached). To reproduce the issue:
clang++ -o loop loop.cc -fuse-ld=gold -O2 -flto
Note that using LTO seems to be the trigger for the bug (but not necessarily the reason).
Before regression, r278861:
00000000004006e0 <_Z31absoluteColumnToEffectiveColumnj>: 4006e0: 48 8b 15 61 19 00 00 mov 0x1961(%rip),%rdx # 402048 <m_effectiveColumns+0x8> 4006e7: 4c 8b 05 52 19 00 00 mov 0x1952(%rip),%r8 # 402040
4006ee: 4c 29 c2 sub %r8,%rdx
4006f1: 48 c1 ea 02 shr $0x2,%rdx
4006f5: 31 c0 xor %eax,%eax
4006f7: 85 d2 test %edx,%edx
4006f9: 74 2e je 400729 <_Z31absoluteColumnToEffectiveColumnj+0x49>
4006fb: 89 d2 mov %edx,%edx
4006fd: 31 c0 xor %eax,%eax
4006ff: 31 f6 xor %esi,%esi
400701: 66 66 66 66 66 66 2e data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
400708: 0f 1f 84 00 00 00 00
40070f: 00
400710: 41 8b 3c 80 mov (%r8,%rax,4),%edi # <=================== Everything is good here
400714: 8d 4c 37 ff lea -0x1(%rdi,%rsi,1),%ecx
400718: 83 f9 0a cmp $0xa,%ecx
40071b: 73 0c jae 400729 <_Z31absoluteColumnToEffectiveColumnj+0x49>
40071d: 01 f7 add %esi,%edi
40071f: 48 ff c0 inc %rax
400722: 48 39 d0 cmp %rdx,%rax
400725: 89 fe mov %edi,%esi
400727: 72 e7 jb 400710 <_Z31absoluteColumnToEffectiveColumnj+0x30>
400729: c3 retq
40072a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
After regression, r280106:
00000000004006e0 <_Z31absoluteColumnToEffectiveColumnj>: 4006e0: 48 8b 0d 61 19 00 00 mov 0x1961(%rip),%rcx # 402048 <m_effectiveColumns+0x8> 4006e7: 4c 8b 05 52 19 00 00 mov 0x1952(%rip),%r8 # 402040
4006ee: 4c 29 c1 sub %r8,%rcx
4006f1: 48 c1 e9 02 shr $0x2,%rcx
4006f5: 31 c0 xor %eax,%eax
4006f7: 85 c9 test %ecx,%ecx
4006f9: 74 1e je 400719 <_Z31absoluteColumnToEffectiveColumnj+0x39>
4006fb: 31 f6 xor %esi,%esi
4006fd: 31 c0 xor %eax,%eax
4006ff: 90 nop
400700: 89 c7 mov %eax,%edi # <============================== unnecessary mov
400702: 41 8b 3c b8 mov (%r8,%rdi,4),%edi
400706: 8d 54 37 ff lea -0x1(%rdi,%rsi,1),%edx
40070a: 83 fa 0a cmp $0xa,%edx
40070d: 73 0a jae 400719 <_Z31absoluteColumnToEffectiveColumnj+0x39>
40070f: 01 f7 add %esi,%edi
400711: ff c0 inc %eax
400713: 39 c8 cmp %ecx,%eax
400715: 89 fe mov %edi,%esi
400717: 72 e7 jb 400700 <_Z31absoluteColumnToEffectiveColumnj+0x20>
400719: c3 retq
40071a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
As you can see, there's an unnecessary mov in the newer code.
I have also verified that the issue persists in the current trunk (r280929):
00000000004006c0 <_Z31absoluteColumnToEffectiveColumnj>: 4006c0: 48 8b 0d 81 19 00 00 mov 0x1981(%rip),%rcx # 402048 <m_effectiveColumns+0x8> 4006c7: 4c 8b 05 72 19 00 00 mov 0x1972(%rip),%r8 # 402040
4006ce: 4c 29 c1 sub %r8,%rcx
4006d1: 48 c1 e9 02 shr $0x2,%rcx
4006d5: 31 c0 xor %eax,%eax
4006d7: 85 c9 test %ecx,%ecx
4006d9: 74 1e je 4006f9 <_Z31absoluteColumnToEffectiveColumnj+0x39>
4006db: 31 f6 xor %esi,%esi
4006dd: 31 c0 xor %eax,%eax
4006df: 90 nop
4006e0: 89 c7 mov %eax,%edi # <================================ unnecessary mov
4006e2: 41 8b 3c b8 mov (%r8,%rdi,4),%edi
4006e6: 8d 54 37 ff lea -0x1(%rdi,%rsi,1),%edx
4006ea: 83 fa 0a cmp $0xa,%edx
4006ed: 73 0a jae 4006f9 <_Z31absoluteColumnToEffectiveColumnj+0x39>
4006ef: 01 f7 add %esi,%edi
4006f1: ff c0 inc %eax
4006f3: 39 c8 cmp %ecx,%eax
4006f5: 89 fe mov %edi,%esi
4006f7: 72 e7 jb 4006e0 <_Z31absoluteColumnToEffectiveColumnj+0x20>
4006f9: c3 retq
4006fa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)