Closed kraj closed 2 years ago
changed the description
mentioned in issue llvm/llvm-bugzilla-archive#34675
mentioned in issue llvm/llvm-project#34043
Bug llvm/llvm-bugzilla-archive#34675 has been marked as a duplicate of this bug.
Removing the release directly created a merge bug; copied hans.
Adding as a blocker to 5.0.1 meta.
Thanks, I would like to request a backport of this to release_50.
Should be resolved by r313916
Yeah, inserting a CopyFromReg is a good idea. Patch up for review here: https://reviews.llvm.org/D38146
I think you get the behavior you want if you just replace the call to getRegister() with a call to getCopyFromReg()?
It would also be easy to clean up after isel by pattern-matching the store instruction, if you want to go that route.
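To illustrate Eli's suggestion, here is a toy model (purely illustrative; these are not LLVM's actual classes, only stand-ins for SelectionDAG's getRegister() and getCopyFromReg()). The point is that a bare register node leaks a physical-register reference into the DAG, while a CopyFromReg node produces an ordinary value that later combines can safely substitute anywhere:

```python
# Toy sketch (not LLVM code): why a raw register operand is unsafe
# as a stored value, while a copy-from-register value is fine.
class Node:
    def __init__(self, kind, name):
        self.kind, self.name = kind, name

def get_register(name):
    # Stand-in for SelectionDAG::getRegister: a bare physical-register
    # reference, only meaningful as the operand of a copy-like node.
    return Node("Register", name)

def get_copy_from_reg(name):
    # Stand-in for SelectionDAG::getCopyFromReg: reads the register's
    # current value into an ordinary value node.
    return Node("CopyFromReg", name)

def safe_store_operand(node):
    # Downstream combines (e.g. store->load forwarding) may propagate
    # the stored value into arbitrary users, so it must be an ordinary
    # value node, never a bare Register node.
    return node.kind != "Register"

assert not safe_store_operand(get_register("XZR"))
assert safe_store_operand(get_copy_from_reg("XZR"))
```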
As Eli alluded to, this bug is triggered if store->load forwarding happens after the store vector 0 split optimization and the loaded value feeds an add:
target triple = "aarch64--linux-gnu"

define i64 @foo(<2 x i64>* %p) {
entry:
  store <2 x i64> zeroinitializer, <2 x i64>* %p
  %p2 = bitcast <2 x i64>* %p to i64*
  %ld = load i64, i64* %p2
  %add = add i64 %ld, 1
  ret i64 %add
}
The store vector 0 split optimization replaces the stored vreg with XZR, then the store->load forwarding replaces the load with XZR, which, when the load feeds an add, ends up generating the invalid instruction:
add x0, xzr, #1
The store vector 0 split optimization inserts XZR operands to prevent DAGCombiner::MergeConsecutiveStores from undoing the splitting of vector stores, but in light of this bug that doesn't seem like a good idea. I'm looking into whether this can be achieved through other means or, if not, whether the vector store splitting should just be moved to after ISel.
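To make the interaction above concrete, here is a toy model of the two steps (purely illustrative; the dictionaries and function names are stand-ins of my own, not LLVM's data structures). The splitting rewrites the stored zero into a raw XZR operand, and store->load forwarding then propagates that operand into the add, yielding an operand mix the ADD-immediate form cannot encode:

```python
# Toy model of the two-step interaction (illustrative, not LLVM code).
def split_zero_vector_store(store):
    # Step 1: the store vector 0 split rewrites the stored zero vector
    # into the raw XZR register operand.
    if store["value"] == "zeroinitializer":
        store["value"] = "XZR"

def forward_store_to_load(store, load):
    # Step 2: store->load forwarding replaces the loaded value with the
    # store's value operand -- including a raw XZR, if that's what's there.
    load["value"] = store["value"]

store = {"value": "zeroinitializer", "addr": "%p"}
load = {"value": None, "addr": "%p"}
split_zero_vector_store(store)
forward_store_to_load(store, load)

# The add now pairs a register operand with an immediate, but the
# immediate form of ADD cannot take XZR (encoding 31 there means SP).
add_operands = (load["value"], "#1")
assert add_operands == ("XZR", "#1")
```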
Taking a look now...
replaceZeroVectorStore is doing something weird with XZR; instead of using a CopyFromReg to read the value of the register, it's using the register itself as an operand to an ISD::STORE. This somehow works in some cases, but it's not really correct, and it eventually bites us here.
Not sure if there's any good way to detect this sort of situation... maybe registers should have a different MVT?
Isolated function demonstrating issue

There we go: a .ll file that demonstrates the issue. I probably don't have much more time to dig in right now, though.
I'm pretty sure xzr is intended. The sequence
add x1, xzr, #1 // =1
add x2, xzr, #1 // =1
bl _ZN2kj1_17HeapArrayDisposer12allocateImplEmmmPFvPvES4_
is marshalling "elementCount" and "capacity" into x1 and x2. There's no reason for those to be based on sp.
It's difficult to get more detailed info on such a large file though. I'll see if I can at least llvm-extract the offending function.
Hi Raj, this is an encoding problem.
Both the ZR and SP registers are encoded as 31, and depending on the instruction, encoding 31 means either the zero register or the stack pointer.
ADDri can operate on the stack pointer, which means XZR is an invalid operand.
ADDrr can operate on the zero register, which means #1 is an invalid operand.
Clang "works" when outputting the object file directly probably because the encoding 31 is also valid for SP, so it emits that. But from your large file, I can't tell which one would be the correct encoding. It's also possible that neither is (sp+#1 or zr+x?) and this got there from a whole different pattern.
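Renato's encoding point can be illustrated with a toy decoder (a sketch for illustration only; the function and form labels are my own, not LLVM code). In the A64 encoding, register number 31 names SP in ADD (immediate) and XZR in ADD (shifted register), so the textual operand `xzr` is unencodable in the immediate form:

```python
# Toy illustration (not LLVM code): register number 31 is
# context-dependent in the A64 encoding -- ADD (immediate) reads it as
# the stack pointer SP, ADD (shifted register) reads it as XZR.
def reg31_name(regnum, form):
    if regnum != 31:
        return "x%d" % regnum
    return "sp" if form == "immediate" else "xzr"

# "add x0, xzr, #1" asks the immediate form for XZR, which that form
# cannot encode -- writing 31 into the Rn field there would mean SP.
assert reg31_name(31, "immediate") == "sp"
assert reg31_name(31, "shifted-register") == "xzr"
assert reg31_name(0, "immediate") == "x0"
```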
It'd be good to have a reduced case, and possibly a bisection.
cheers, --renato
Extended Description
I have a testcase, https://uclibc.org/~kraj/a.cpp, which fails to compile/assemble when using -no-integrated-as on aarch64, but works fine when using -integrated-as.
clang -target aarch64-linux-gnu a.cpp -O2 -c -std=gnu++11 works, but the same command with -no-integrated-as fails.

I then created .s file output with and without -integrated-as, and in both cases I do see the above instructions in the assembly file.
I thought it might be a problem only with GNU as, so I created a small test case like this,
and now when I compile it, I get errors with both GNU as and the internal assembler.
So it does seem that the instruction is rejected by both assemblers, but when using the internal assembler without generating .s files, these instructions might be getting optimized out?
I think some AArch64 expertise is needed here.