Open DimitryAndric opened 10 years ago
As noted on the freebsd-hackers@ mailing list[1], the & modifier may actually be necessary for both 'retry' and 'tmp'.
Quoting from recent gcc documentation[2] (which is really the only reference... :), particularly:
&' constraint with the register constraint ensures that modifying a will not affect what address is referenced by b. Omitting the
&' constraint means that the location of b will be undefined if a is modified before using b.[1] https://lists.freebsd.org/pipermail/freebsd-hackers/2014-October/046282.html [2] https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
If I look at the debug dump during register allocation, nothing in the machine IR tells the register allocator to assign different registers for %1 and %2.
This is the debug dump I see when I compile the IR with "llc -O0 -debug-only=regalloc".
Allocating BB#0: derived from LLVM BB %entry
Live Ins: %RDI
%vreg0
%vreg1
= COPY %vreg0 ; GR64:%vreg1,%vreg0 Regs: RDI=%vreg0* Killing last use: %vreg0 Assigning %vreg1 to %RDI INLINEASM <es:1: rdrand $2 jb 2f dec $0 jne 1b jmp 3f 2: mov $2,$1 3:> [sideeffect] [maystore] [attdialect], $0:[regdef:GR32], %vreg5<def,tied13>, $1:[mem], %vreg1, 1, %noreg, 0, %noreg, $2:[regdef:GR64], %vreg6
, $3:[reguse tiedto:$0], %vreg5 , $4:[clobber], %EFLAGS<earlyclobber,imp-def>, < >; GR32:%vreg5 GR64:%vreg1,%verge ... Assigning %vreg6 to %RDI << INLINEASM <es:1: ...
%vreg1 and %vreg6 are the virtual registers that we should pay attention to in this example.
$1:[mem], %vreg1, 1, %noreg, 0, %noreg
$2:[regdef:GR64], %vreg6
Using the early-clobber modifier "=&q"(tmp) seems to fix the problem, although I'm not sure it's a requirement to use early-clobber in this case.
Extended Description
Depending on the optimization level, clang trunk r219624 seems to sometimes handle output operands incorrectly. This was reported to me by FreeBSD kernel developers, who attempted to compile the following:
int ivy_rng_store(long *buf) { long tmp; int retry;
retry = 10; asm volatile( "1:\n\t" "rdrand %2\n\t" / read randomness into tmp / "jb 2f\n\t" / CF is set on success, exit retry loop / "dec %0\n\t" / otherwise, retry-- / "jne 1b\n\t" / and loop if retries are not exhausted / "jmp 3f\n" / failure, retry is 0, used as return value / "2:\n\t" "mov %2,%1\n\t" / buf = tmp / "3:" : "+q" (retry), "=m" (buf), "=q" (tmp) : : "cc");
return (retry); }
E.g., the intent is that 'tmp' is just used for output, but the actual value is not used outside the inline asm. It is stored to *buf instead.
However, clang -O0 seems to have trouble keeping the two apart, as the resulting assembly is:
ivy_rng_store: # @ivy_rng_store .cfi_startproc
BB#0: # %entry
.Ltmp0: .cfi_def_cfa_offset 16 .Ltmp1: .cfi_offset %rbp, -16 movq %rsp, %rbp .Ltmp2: .cfi_def_cfa_register %rbp movq %rdi, -8(%rbp) movl $10, -20(%rbp) movl -20(%rbp), %eax movq -8(%rbp), %rdi
APP
.Ltmp3: rdrandq %rdi jb .Ltmp4 decl %eax jne .Ltmp3 jmp .Ltmp5 .Ltmp4: movq %rdi, (%rdi) .Ltmp5:
NO_APP
Clearly, the movq %rdi, (%rdi) is incorrect. This seems to be magically solved by enabling optimization, e.g. at -O1 or higher:
ivy_rng_store: # @ivy_rng_store .cfi_startproc
BB#0: # %entry
.Ltmp0: .cfi_def_cfa_offset 16 .Ltmp1: .cfi_offset %rbp, -16 movq %rsp, %rbp .Ltmp2: .cfi_def_cfa_register %rbp movl $10, %eax
APP
.Ltmp3: rdrandq %rcx jb .Ltmp4 decl %eax jne .Ltmp3 jmp .Ltmp5 .Ltmp4: movq %rcx, (%rdi) .Ltmp5:
NO_APP
Something similar happens when targeting i386 at -O0:
ivy_rng_store: # @ivy_rng_store
BB#0: # %entry
.Ltmp0: rdrandl %ecx jb .Ltmp1 decl %eax jne .Ltmp0 jmp .Ltmp2 .Ltmp1: movl %ecx, (%ecx) .Ltmp2:
NO_APP
However, on i386 optimization does not fix it, e.g. at -O1 or higher:
ivy_rng_store: # @ivy_rng_store
BB#0: # %entry
.Ltmp0: rdrandl %ecx jb .Ltmp1 decl %eax jne .Ltmp0 jmp .Ltmp2 .Ltmp1: movl %ecx, (%ecx) .Ltmp2:
NO_APP
Changing the output constraint on 'tmp' to "+q" seems to help on amd64, but on i386 it still produces incorrect output at -O1 or higher optimization.
I tested the above code with different versions of gcc (4.7 through 5.0), but the resulting assembly was always as expected, at any optimization level.