Open cdevadas opened 3 weeks ago
llvm/test/CodeGen/AMDGPU/GlobalISel/fp-atomics-gfx940.ll (in function local_atomic_fadd_v2bf16_noret)
I don't see that function in that file. Can you provide a link?
Looks like I have a slightly older version of the upstream compiler, and this test has since been changed. However, the same test compiled for SelectionDAG still exists and reproduces the problem: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll#L364
@llvm/issue-subscribers-backend-amdgpu
Author: Christudasan Devadasan (cdevadas)
I think this is just from unhandled insertion of bitcasts to get the types to match for vectorization:
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 -print-after=load-store-vectorizer < %s

define amdgpu_kernel void @no_vectorize_0(ptr addrspace(3) %ptr, <2 x half> %data) {
  %i1 = atomicrmw fadd ptr addrspace(3) %ptr, <2 x half> %data syncscope("agent") seq_cst, align 4
  ret void
}

define amdgpu_kernel void @no_vectorize_1(i32 %ptr.as.int, <2 x half> %data) {
  %ptr = inttoptr i32 %ptr.as.int to ptr addrspace(3)
  %i1 = atomicrmw fadd ptr addrspace(3) %ptr, <2 x half> %data syncscope("agent") seq_cst, align 4
  ret void
}

define amdgpu_kernel void @does_vectorize(i32 %ptr.as.int, i32 %data.as.int) {
  %ptr = inttoptr i32 %ptr.as.int to ptr addrspace(3)
  %data = bitcast i32 %data.as.int to <2 x half>
  %i1 = atomicrmw fadd ptr addrspace(3) %ptr, <2 x half> %data syncscope("agent") seq_cst, align 4
  ret void
}
The LSV (load-store-vectorizer) pass doesn't combine the loads even though they share the same base address. There are many such instances in the AMDGPU codegen lit test folder. For example, llvm/test/CodeGen/AMDGPU/GlobalISel/fp-atomics-gfx940.ll (in function local_atomic_fadd_v2bf16_noret) should have its two loads combined earlier. Instead, they are merged only after ISel, by the target-specific Load Store Optimizer pass (si-load-store-opt).
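To make the expected transformation concrete, here is a rough hand-written sketch of two adjacent 4-byte loads of different types and the single combined load they could become if a bitcast were inserted. This is not output from any pass. The function names, the explicit base pointer, and the address spaces are assumptions chosen for illustration, not taken from the failing test.

```llvm
; Hypothetical illustration only: hand-written IR, not produced by the
; load-store-vectorizer. Names and address spaces are assumptions.

; Two adjacent 4-byte loads from the same base with mismatched types
; (i32 and <2 x half>).
define void @separate_loads(ptr addrspace(4) %base, ptr addrspace(1) %out.i, ptr addrspace(1) %out.v) {
  %gep = getelementptr inbounds i8, ptr addrspace(4) %base, i64 4
  %a = load i32, ptr addrspace(4) %base, align 8
  %b = load <2 x half>, ptr addrspace(4) %gep, align 4
  store i32 %a, ptr addrspace(1) %out.i, align 4
  store <2 x half> %b, ptr addrspace(1) %out.v, align 4
  ret void
}

; The same values obtained from one 8-byte load; the second element is
; bitcast back to <2 x half> so the original types are preserved.
define void @combined_load(ptr addrspace(4) %base, ptr addrspace(1) %out.i, ptr addrspace(1) %out.v) {
  %vec = load <2 x i32>, ptr addrspace(4) %base, align 8
  %a = extractelement <2 x i32> %vec, i32 0
  %b.int = extractelement <2 x i32> %vec, i32 1
  %b = bitcast i32 %b.int to <2 x half>
  store i32 %a, ptr addrspace(1) %out.i, align 4
  store <2 x half> %b, ptr addrspace(1) %out.v, align 4
  ret void
}
```

The second function only shows the shape of the result one would hope to see in the -print-after=load-store-vectorizer dump. Whether the pass should emit exactly this form is a design question for the fix.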