Closed regehr closed 2 years ago
Interesting problem:
172 %ECX<def> = MOV32rr %reg1039<kill>
180 INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>, 36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0
188 %EAX<def> = MOV32rr %EAX<kill>
196 %ECX<def> = MOV32rr %ECX<kill>
204 %ECX<def> = MOV32rr %ECX<kill>
212 %EAX<def> = MOV32rr %EAX<kill>
220 %EAX<def> = MOV32rr %EAX
228 %reg1039<def> = MOV32rr %ECX<kill>
The early clobber operand ties ECX input to the ECX def.
The live interval of ECX is represented as this:
%reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47)
It does not specify there was a definition at 181.
SimpleRegisterCoalescing::AdjustCopiesBackFrom() thinks it's perfectly ok to eliminate the copy at 228 since it's defined by a copy from reg1039 at 172.
This is a coalescer bug.
I don't think this is a codegen issue. This is a bug in the processor. It should have understood what codegen is really trying to do. :-)
I'll deal with this.
testcase for the above Repro the above with llc t.bc -o -
Here's my last testcase:
#include <stdio.h>
int umoddi3 (unsigned long long u, unsigned long long v) __attribute__((noinline));
int umoddi3 (unsigned long long u, unsigned long long v) {
unsigned int d0, d1, n0, n1, n2;
d0 = v;
d1 = v >> 32;
n0 = u;
n1 = u >> 32;
printf("n0=%d\n", n0);
if (n0 >= d0)
{
__asm__ ("subl %5,%1\n\tsbbl %3,%0"
: "=r" (n1), "=&r" (n0)
: "0" (n1), "g" (d1), "1" (n0), "g" (d0));
}
return n0+n1;
}
volatile unsigned long long A = -1, B = -975165334;
int main (void) {
printf ("result = %d\n", umoddi3(A,B));
return 0;
}
At -O1, it is producing this output:
## InlineAsm Start
subl -12(%ebp),%ecx
sbbl -8(%ebp),%eax
## InlineAsm End
LBB1_2: ## bb2
addl %esi, %eax
addl $20, %esp
popl %esi
popl %ebp
ret
Note that the result of the asm is in ecx/eax, but that the epilog block adds esi/eax. The code coming out of isel looks fine:
INLINEASM <es:subl $5,$1
sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX, 36, <fi#0>, 1, %reg0, 0, 9, %ECX, 36, <fi#1>, 1, %reg0, 0
%reg1035<def> = MOV32rr %EAX
%reg1036<def> = MOV32rr %ECX
%reg1030<def> = MOV32rr %reg1036
%reg1029<def> = MOV32rr %reg1035
Successors according to CFG: 0x1818008 (#3)
bb2: 0x1818008, LLVM BB @​0x15028a0, ID#3:
Predecessors according to CFG: 0x1817f28 (#1) 0x1817f98 (#2)
%reg1031<def> = PHI %reg1028, mbb<entry.bb2_crit_edge,0x1817f28>, %reg1029, mbb<bb,0x1817f98>
%reg1032<def> = PHI %reg1027, mbb<entry.bb2_crit_edge,0x1817f28>, %reg1030, mbb<bb,0x1817f98>
%reg1037<def> = ADD32rr %reg1032, %reg1031, %EFLAGS<imp-def>
so this is a phi elim or coalescer issue. I'm going to let someone else take this from here.
Here's my currently reduced testcase (works at -O0 fails at -O1+):
unsigned long long umoddi3 (unsigned long long u, unsigned long long v) attribute((noinline));
unsigned long long umoddi3 (unsigned long long u, unsigned long long v) { unsigned int d0, d1, n0, n1, n2;
d0 = v; d1 = v >> 32; n0 = u; n1 = u >> 32; printf("d0=%d d1=%d n0=%d n1=%d\n", d0, d1, n0, n1);
if (n0 >= d0) { asm ("subl %5,%1\n\tsbbl %3,%0" : "=r" (n1), "=&r" (n0) : "0" (n1), "g" (d1), "1" (n0), "g" (d0)); }
return n0 | ((long long)n1 << 32); }
volatile unsigned long long A = -1, B = -975165334;
int main (void) { printf ("%d\n", (int)(umoddi3(A,B))); return 0; }
Did I say that it was a bug in mem2reg? I was merely pointing out that what I saw. Draw your own conclusions as to where the bug is.
btw, the mem2reg'd code works with llc -fast but fails with llc. I'm positive it is not mem2reg's fault.
for the zillionth time, this is almost certainly not a bug in mem2reg or sroa. This is a bug in llc that is exposed by them. I'm actively reducing a testcase.
I pulled hte code for _umoddi3 into the .c file to test (so the test itself passes with llvm-gcc -O0 and fails with -O3) and bugpoint reduced it. It blames scalarrepl, but I am assuming that this is a llc bug. Continuing investigation.
Then it's probably the SROA problem that I mentioned above. I forced SROA to stop after performing a call to promote "mem2reg" and it crashed.
I pulled hte code for _umoddi3 into the .c file to test (so the test itself passes with llvm-gcc -O0 and fails with -O3) and bugpoint reduced it. It blames scalarrepl, but I am assuming that this is a llc bug. Continuing investigation.
Reduced driver:
volatile unsigned long long A = -1, B = -975165334; int main (void) { printf ("%d\n", (int)(A%B)); return 0; }
I can repro this, it is definitely _umoddi3
It seems to be an asm problem. This testcase gives the right result if compiled with llc, but when optimized with -inline and then compiled with llc the result changes. To get this testcase I extracted each asm into it's own function then optimized what was left with -std-compile-opts. Using -inline puts the asm statements back where they were originally - and triggers the bug.
If you add "alwaysinline" to the @subl function and "noinline" to the other inline asm functions, then this is triggered. All other combinations pass.
I mean that if @subl isn't inlined, then the test passes no matter what function attribute is added to the other two functions.
It seems to be an asm problem. This testcase gives the right result if compiled with llc, but when optimized with -inline and then compiled with llc the result changes. To get this testcase I extracted each asm into it's own function then optimized what was left with -std-compile-opts. Using -inline puts the asm statements back where they were originally - and triggers the bug.
If you add "alwaysinline" to the @subl function and "noinline" to the other inline asm functions, then this is triggered. All other combinations pass.
This might be due to the recent bootstrap breakage that has been plaguing us this week. I tracked it down to libgcc being mis-compiled. I'll try compiling libgcc with -O1 (and with load PRE turned on) and seeing if that works.
And by "might be due to" I mean "might be the cause of", of course.
This might be due to the recent bootstrap breakage that has been plaguing us this week. I tracked it down to libgcc being mis-compiled. I'll try compiling libgcc with -O1 (and with load PRE turned on) and seeing if that works.
Adding Chris to the CC list because it seems to be an asm problem.
Much reduced testcase It seems to be an asm problem. This testcase gives the right result if compiled with llc, but when optimized with -inline and then compiled with llc the result changes. To get this testcase I extracted each asm into it's own function then optimized what was left with -std-compile-opts. Using -inline puts the asm statements back where they were originally - and triggers the bug.
It might be a codegen bug that requires variables to be in registers to fire. This would explain why both mem2reg and sroa seem to trigger it.
Possible. SROA is calling a mem2reg pass on the code. I narrowed it down to that call causing the failure. Something in __umoddi3 is getting miscompiled.
It might be a codegen bug that requires variables to be in registers to fire. This would explain why both mem2reg and sroa seem to trigger it.
Okay. It's the SROA pass. If I remove that from the list, then it passes. If you run it alone on the file, it fails. Though it still looks as if the mem2reg causes problems (if I remove SROA, but keep mem2reg, then it fails).
I was able to reproduce it with this:
$ opt -mem2reg < testcase.bc | llc -o t.s -f -mtriple=i386-apple-darwin9.5 $ gcc -o t t.s $ ./t -1
Mem2Reg is causing the problem?!
Bill, do you have sometime to look at this? Thanks.
Testcase .ll Here is a testcase .ll that combines the original code and the code for muldi3. It shows the bug: if compiled as is with llc, the output is -10347 (correct). If compiled with opt -std-compile-opts followed by llc then the output is -1.
Ouch... talk about motivation for end-to-end testing.
It looks like llvm-gcc has a not working __umoddi3 in its libgcc.
If I compile to assembler using llvm-gcc, then assemble and link using gcc, then the problem goes away. So llvm-gcc is producing correct assembler. Investigating...
I can reproduce this on x86-32 linux but not on x86-64 linux.
Can you attach .s file generated directly from llvm-gcc and that produced by llc?
Unfortunately I still get this:
regehr@john-home:~/volatile/tmp71$ llvm-gcc small.c -o small ; ./small -1
regehr@john-home:~/volatile/tmp71$ llvm-gcc -S --emit-llvm small.c -o - | llvm-as | llc -relocation-model=pic -disable-fp-elim -f -o small.s ; gcc small.s -o small ; ./small -10347
Can you try the following llc options? -relocation-model=pic -disable-fp-elim. Thanks.
Also I verified that r60392 still has the bug.
lli gives me the correct answer, as does an executable obtained by running llc on an emitted bytecode file.
Is there something else I can try?
I can't reproduce on x86 / Mac OS X. Can you (or someone on Linux) bugpoint it on a Linux system?
Extended Description
This one is interesting because llvm-gcc's output is wrong at all optimization levels (-O0 through -O3).
This is a regression since 2.3.
Seen on r60325 on Ubuntu Hardy on x86.
regehr@john-home:~/volatile/tmp71$ llvm-gcc -O small.c -o small ; ./small -1 regehr@john-home:~/volatile/tmp71$ llvm-gcc-2.4 -O small.c -o small ; ./small -1 regehr@john-home:~/volatile/tmp71$ llvm-gcc-2.3 -O small.c -o small ; ./small -10347 regehr@john-home:~/volatile/tmp71$ llvm-gcc-2.2 -O small.c -o small ; ./small -10347 regehr@john-home:~/volatile/tmp71$ icc -O small.c -o small ; ./small -10347 regehr@john-home:~/volatile/tmp71$ gcc -O small.c -o small ; ./small -10347 regehr@john-home:~/volatile/tmp71$ cat small.c
include
include
static inline int16_t safe_mul_int16_t_s_s (int16_t si1, int16_t si2) {
if (si1 > 0){ / si1 is positive /
if (si2 > 0) { / si1 and si2 are positive / if (si1 > (INT16_MAX / si2)) {
return si1;
}
} / end if si1 and si2 are positive /
else { / si1 positive, si2 non-positive / if (si2 < (INT16_MIN / si1)) {
return si1;
}
} / si1 positive, si2 non-positive /
} / end if si1 is positive /
else { / si1 is non-positive /
if (si2 > 0) { / si1 is non-positive, si2 is positive /
if (si1 < (INT16_MIN / si2)) {
return si1;
}
} / end if si1 is non-positive, si2 is positive / else { / si1 and si2 are non-positive /
if ( (si1 != 0) && (si2 < (INT16_MAX / si1))) { return si1;
}
} / end if si1 and si2 are non-positive / } / end if si1 is non-positive /
return si1 * si2;
}
static inline uint64_t
safe_mod_uint64_t_u_u (uint64_t ui1, uint64_t ui2)
{
if (ui2 == 0) return ui1;
return ui1 % ui2;
}
int32_t g_355 = 0xC5E0286AL; int32_t g_379 = -1L; uint32_t g_450;
int32_t func_3 (uint32_t p_4); int32_t func_3 (uint32_t p_4) { g_450 = safe_mul_int16_t_s_s (p_4, 1); return 1; }
int32_t func_1 (void); int32_t func_1 (void) { func_3 (safe_mod_uint64_t_u_u (g_379, g_355)); return 1; }
int main (void) { func_1 (); printf ("%d\n", g_450); return 0; }