Closed kraj closed 2 years ago
changed the description
mentioned in issue llvm/llvm-bugzilla-archive#34675
mentioned in issue llvm/llvm-project#34043
Bug llvm/llvm-bugzilla-archive#34675 has been marked as a duplicate of this bug.
Removing the release directly created a merge bug; copied hans.
Adding as a blocker to 5.0.1 meta.
Thanks, I would like to request a backport of this to release_50.
Should be resolved by r313916
Yeah, inserting a CopyFromReg is a good idea. Patch up for review here: https://reviews.llvm.org/D38146
I think you get the behavior you want if you just replace the call to getRegister() with a call to getCopyFromReg()?
It would also be easy to clean up after isel by pattern-matching the store instruction, if you want to go that route.
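To illustrate Eli's suggestion, here is a toy model (purely illustrative; these are not LLVM's actual classes, only stand-ins for SelectionDAG's getRegister() and getCopyFromReg()). The point is that a bare register node leaks a physical-register reference into the DAG, while a CopyFromReg node produces an ordinary value that later combines can safely substitute anywhere:

```python
# Toy sketch (not LLVM code): why a raw register operand is unsafe
# as a stored value, while a copy-from-register value is fine.
class Node:
    def __init__(self, kind, name):
        self.kind, self.name = kind, name

def get_register(name):
    # Stand-in for SelectionDAG::getRegister: a bare physical-register
    # reference, only meaningful as the operand of a copy-like node.
    return Node("Register", name)

def get_copy_from_reg(name):
    # Stand-in for SelectionDAG::getCopyFromReg: reads the register's
    # current value into an ordinary value node.
    return Node("CopyFromReg", name)

def safe_store_operand(node):
    # Downstream combines (e.g. store->load forwarding) may propagate
    # the stored value into arbitrary users, so it must be an ordinary
    # value node, never a bare Register node.
    return node.kind != "Register"

assert not safe_store_operand(get_register("XZR"))
assert safe_store_operand(get_copy_from_reg("XZR"))
```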
As Eli alluded to, this bug is triggered if store->load forwarding happens after the store vector 0 split optimization and the loaded value feeds an add:
target triple = "aarch64--linux-gnu"

define i64 @foo(<2 x i64>* %p) {
entry:
  store <2 x i64> zeroinitializer, <2 x i64>* %p
  %p2 = bitcast <2 x i64>* %p to i64*
  %ld = load i64, i64* %p2
  %add = add i64 %ld, 1
  ret i64 %add
}
The store vector 0 split optimization replaces the stored vreg with XZR, then the store->load forwarding replaces the load with XZR, which, when the load feeds an add, ends up generating the invalid instruction:
add x0, xzr, #1
The store vector 0 split optimization inserts XZR operands to prevent DAGCombiner::MergeConsecutiveStores from undoing the splitting of vector stores, but in light of this bug that doesn't seem like a good idea. I'm looking into whether this can be achieved through other means or, if not, whether the vector store splitting should just be moved to after ISel.
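To make the interaction above concrete, here is a toy model of the two steps (purely illustrative; the dictionaries and function names are stand-ins of my own, not LLVM's data structures). The splitting rewrites the stored zero into a raw XZR operand, and store->load forwarding then propagates that operand into the add, yielding an operand mix the ADD-immediate form cannot encode:

```python
# Toy model of the two-step interaction (illustrative, not LLVM code).
def split_zero_vector_store(store):
    # Step 1: the store vector 0 split rewrites the stored zero vector
    # into the raw XZR register operand.
    if store["value"] == "zeroinitializer":
        store["value"] = "XZR"

def forward_store_to_load(store, load):
    # Step 2: store->load forwarding replaces the loaded value with the
    # store's value operand -- including a raw XZR, if that's what's there.
    load["value"] = store["value"]

store = {"value": "zeroinitializer", "addr": "%p"}
load = {"value": None, "addr": "%p"}
split_zero_vector_store(store)
forward_store_to_load(store, load)

# The add now pairs a register operand with an immediate, but the
# immediate form of ADD cannot take XZR (encoding 31 there means SP).
add_operands = (load["value"], "#1")
assert add_operands == ("XZR", "#1")
```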
Taking a look now...
replaceZeroVectorStore is doing something weird with XZR; instead of using a CopyFromReg to read the value of the register, it's using the register itself as an operand to an ISD::STORE. This somehow works in some cases, but it's not really correct, and it eventually bites us here.
Not sure if there's any good way to detect this sort of situation... maybe registers should have a different MVT?
Isolated function demonstrating issue

There we go: a .ll file that demonstrates the issue. I probably don't have much more time to dig in right now, though.
I'm pretty sure xzr is intended. The sequence
add x1, xzr, #1 // =1
add x2, xzr, #1 // =1
bl _ZN2kj1_17HeapArrayDisposer12allocateImplEmmmPFvPvES4_
is marshalling "elementCount" and "capacity" into x1 and x2. There's no reason for those to be based on sp.
It's difficult to get more detailed info on such a large file though. I'll see if I can at least llvm-extract the offending function.
Hi Raj, this is an encoding problem.
Both the ZR and SP registers are encoded as 31, and depending on the instruction, encoding 31 means either the zero register or the stack pointer.
ADDri can operate on the stack pointer, which means XZR is an invalid operand.
ADDrr can operate on the zero register, which means #1 is an invalid operand.
Clang "works" when outputting the object file directly probably because the encoding 31 is also valid for SP, so it emits that. But from your large file, I can't tell which one would be the correct encoding. It's also possible that neither is (sp+#1 or zr+x?) and this got there from a whole different pattern.
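Renato's encoding point can be illustrated with a toy decoder (a sketch for illustration only; the function and form labels are my own, not LLVM code). In the A64 encoding, register number 31 names SP in ADD (immediate) and XZR in ADD (shifted register), so the textual operand `xzr` is unencodable in the immediate form:

```python
# Toy illustration (not LLVM code): register number 31 is
# context-dependent in the A64 encoding -- ADD (immediate) reads it as
# the stack pointer SP, ADD (shifted register) reads it as XZR.
def reg31_name(regnum, form):
    if regnum != 31:
        return "x%d" % regnum
    return "sp" if form == "immediate" else "xzr"

# "add x0, xzr, #1" asks the immediate form for XZR, which that form
# cannot encode -- writing 31 into the Rn field there would mean SP.
assert reg31_name(31, "immediate") == "sp"
assert reg31_name(31, "shifted-register") == "xzr"
assert reg31_name(0, "immediate") == "x0"
```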
It'd be good to have a reduced case, and possibly a bisection.
cheers, --renato
Extended Description
I have a testcase, https://uclibc.org/~kraj/a.cpp, which fails to compile/assemble when using -no-integrated-as on aarch64, but works fine when using -integrated-as.
clang -target aarch64-linux-gnu a.cpp -O2 -c -std=gnu++11 works, but the same command with -no-integrated-as fails.

I then created .s file output with and without -integrated-as, and in both cases I do see the above instructions in the assembly file.
I thought it might be a problem only with GNU as, so I created a small test case like this,
and now when I compile it, I get errors with both GNU as and the internal assembler.
So it does seem that the instruction is rejected by both assemblers, but when using the internal assembler without generating .s files, these instructions might be getting optimized out?
I think some AArch64 expertise is needed here.