llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.81k stars 11.45k forks source link

[GlobalIsel][AArch64] crash in ir translator #90202

Open tschuett opened 4 months ago

tschuett commented 4 months ago

It tries to unmerge into vectors and translates p0 into s64.

LLVM IR:

define <4 x ptr> @ptr_add_4_vector_ptr(ptr %vec, <4 x i32> %offset) {
  %ptr_add = getelementptr i8, ptr %vec, <4 x i32> %offset
  ret <4 x ptr> %ptr_add
}

call:

llc -mtriple=aarch64 -global-isel -global-isel-abort=2 -verify-machineinstrs llvm/test/CodeGen/AArch64/gep.ll

Error:

# After IRTranslator
# Machine code for function ptr_add_4_vector_ptr: IsSSA, TracksLiveness
Function Live Ins: $x0, $q0

bb.1 (%ir-block.0):
  liveins: $q0, $x0
  %0:_(p0) = COPY $x0
  %1:_(<4 x s32>) = COPY $q0
  %2:_(<4 x p0>) = G_BUILD_VECTOR %0:_(p0), %0:_(p0), %0:_(p0), %0:_(p0)
  %3:_(<4 x s64>) = G_SEXT %1:_(<4 x s32>)
  %4:_(<4 x p0>) = G_PTR_ADD %2:_, %3:_(<4 x s64>)
  %5:_(<4 x p0>) = COPY %4:_(<4 x p0>)
  %6:_(<2 x s64>), %7:_(<2 x s64>) = G_UNMERGE_VALUES %5:_(<4 x p0>)
  $q0 = COPY %6:_(<2 x s64>)
  $q1 = COPY %7:_(<2 x s64>)
  RET_ReallyLR implicit $q0, implicit $q1

# End machine code for function ptr_add_4_vector_ptr.

*** Bad machine code: G_UNMERGE_VALUES source operand does not match vector destination operands ***
- function:    ptr_add_4_vector_ptr
- basic block: %bb.1  (0x7ff19a0b91c8)
- instruction: %6:_(<2 x s64>), %7:_(<2 x s64>) = G_UNMERGE_VALUES %5:_(<4 x p0>)
llvmbot commented 4 months ago

@llvm/issue-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

It tries to unmerge into vectors and translates p0 into s64. LLVM IR: ``` define <4 x ptr> @ptr_add_4_vector_ptr(ptr %vec, <4 x i32> %offset) { %ptr_add = getelementptr i8, ptr %vec, <4 x i32> %offset ret <4 x ptr> %ptr_add } ``` call: ``` llc -mtriple=aarch64 -global-isel -global-isel-abort=2 -verify-machineinstrs llvm/test/CodeGen/AArch64/gep.ll ``` Error: ``` # After IRTranslator # Machine code for function ptr_add_4_vector_ptr: IsSSA, TracksLiveness Function Live Ins: $x0, $q0 bb.1 (%ir-block.0): liveins: $q0, $x0 %0:_(p0) = COPY $x0 %1:_(<4 x s32>) = COPY $q0 %2:_(<4 x p0>) = G_BUILD_VECTOR %0:_(p0), %0:_(p0), %0:_(p0), %0:_(p0) %3:_(<4 x s64>) = G_SEXT %1:_(<4 x s32>) %4:_(<4 x p0>) = G_PTR_ADD %2:_, %3:_(<4 x s64>) %5:_(<4 x p0>) = COPY %4:_(<4 x p0>) %6:_(<2 x s64>), %7:_(<2 x s64>) = G_UNMERGE_VALUES %5:_(<4 x p0>) $q0 = COPY %6:_(<2 x s64>) $q1 = COPY %7:_(<2 x s64>) RET_ReallyLR implicit $q0, implicit $q1 # End machine code for function ptr_add_4_vector_ptr. *** Bad machine code: G_UNMERGE_VALUES source operand does not match vector destination operands *** - function: ptr_add_4_vector_ptr - basic block: %bb.1 (0x7ff19a0b91c8) - instruction: %6:_(<2 x s64>), %7:_(<2 x s64>) = G_UNMERGE_VALUES %5:_(<4 x p0>) ```
tschuett commented 4 months ago

It tries to unmerge a vector into smaller vectors:

%6:_(<2 x s64>), %7:_(<2 x s64>) = G_UNMERGE_VALUES %5:_(<4 x p0>)

It should have used G_EXTRACT_VECTOR_ELT to build two G_BUILD_VECTOR s.

tschuett commented 4 months ago

In AArch64Lowering there seems to be no buildUnmerge invocation. The generic CallLowering has many buildUnmerge calls.

Maybe, this a target independent issue?!?.

tschuett commented 4 months ago

The real issue seems to s64 vs p0.

v01dXYZ commented 4 months ago

Just asking what is s64 and p0. Do you have a pointer to the ref defining those types ? Are there types equivalent for size_t and void* ?

tschuett commented 4 months ago

See https://github.com/llvm/llvm-project/blob/7b5b5214a6f905be67e3c2ecb9b4887eaa3406c3/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp#L46.

They are instances of LLT. s64 is a scalar with 64 bit. P0 is a pointer in address space 0.

tschuett commented 4 months ago

They are both instances of LLT with 64 bit, but they are not the same. The G_UNMERGE_VALUES fails because the input are P0s and the output are s64s.

tschuett commented 3 months ago

I put asserts into buildUnmerge. Assuming I got it right, this might be a stack trace of the issue:

#7 0x000000010514500c llvm::MachineIRBuilder::buildUnmerge(llvm::ArrayRef<llvm::Register>, llvm::SrcOp const&)
#8 0x0000000105051260 llvm::CallLowering::handleAssignments(llvm::CallLowering::ValueHandler&, llvm::SmallVectorImpl<llvm::CallLowering::ArgInfo>&, llvm::CCState&, llvm::SmallVectorImpl<llvm::CCValAssign>&, llvm::MachineIRBuilder&, llvm::ArrayRef<llvm::Register>) const
#9 0x000000010504ffb6 llvm::CallLowering::determineAndHandleAssignments(llvm::CallLowering::ValueHandler&, llvm::CallLowering::ValueAssigner&, llvm::SmallVectorImpl<llvm::CallLowering::ArgInfo>&, llvm::MachineIRBuilder&, unsigned int, bool, llvm::ArrayRef<llvm::Register>) const
#10 0x0000000102871b02 llvm::AArch64CallLowering::lowerReturn(llvm::MachineIRBuilder&, llvm::Value const*, llvm::ArrayRef<llvm::Register>, llvm::FunctionLoweringInfo&, llvm::Register) const
#11 0x00000001050a4cb6 llvm::IRTranslator::translateRet(llvm::User const&, llvm::MachineIRBuilder&)
tschuett commented 3 months ago

handleAssignments calls buildCopyToRegs, which is static. The latter calls getGCDType, which does not distinguish between scalars and pointers.

and there are B.buildUnmerge calls in buildCopyToRegs

tschuett commented 3 months ago

I may be wrong, but I believe that this is the buggy code.

https://github.com/llvm/llvm-project/blob/3676b0945c800e3105f648d178b331953246716a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp#L568

LLT GCDTy = getGCDType(SrcTy, PartTy); if (GCDTy == PartTy) { // If this already evenly divisible, we can create a simple unmerge. B.buildUnmerge(DstRegs, SrcReg); return; }

getGCDType should be fallible when the scalar types don't match: p0 vs. s64.

Do you want to ptrtoint or assign pointer vectors to registers?

@arsenm @aemerson @davemgreen

tschuett commented 3 months ago

ptrtoint might be not enough. What happens with 256bit funny float vectors that you want to return?

arsenm commented 3 months ago

getGCDType should be fallible when the scalar types don't match: p0 vs. s64.

Do you mean do match? Part of the point of this was to handle unmerging pointers to smaller scalars

tschuett commented 3 months ago

Or an alternative path where the scalar types don't match.

if (SrcTy.isVector() && PartTy.isVector() && GCD && SrcTy.getScalarType() != PartTy.getScalarType()) {
  // bicast + unmerge
}
tschuett commented 3 months ago

Part of the point of this was to handle unmerging pointers to smaller scalars

This is exactly the error that we get. We cannot unmerge pointers into scalars.

tschuett commented 3 months ago

Once LLT supports floats , there will be more cases:

enum class UnMergeKind {
  False,
  True,
  WithBitcast
};

UnMergeKind isUnmergeable(unsigned NrOfDstRegs, LLT PartTy, LLT SrcTy);
arsenm commented 3 months ago

Part of the point of this was to handle unmerging pointers to smaller scalars

This is exactly the error that we get. We cannot unmerge pointers into scalars.

Yes you can. AMDGPU heavily relies on unmerging pointers into pairs of s32

tschuett commented 3 months ago

Look at the example at the top. The MachineVerifier complained.

arsenm commented 3 months ago

Look at the example at the top. The MachineVerifier complained.

It's an oddity, but we currently support pointer-> scalar, but not vector of pointers -> vector of scalars