Open tpopp opened 3 weeks ago
@llvm/issue-subscribers-backend-amdgpu
Author: Tres (tpopp)
@petar-avramovic has been working on this.
It's more intentionally failing since it's known wrong at this point
Is G_PHI in general known wrong, or just with s1 values? And is it the lowering or the representation that has a problem?
The error here is in regbankselect, which is not assigning the correct reg banks according to machine uniformity analysis. This will be fixed soon with the new reg bank select passes. The G_PHI in question is uniform and should be assigned to the sgpr reg bank and lowered to S32. Banks should be assigned like this:
body: |
  bb.1..preheader:
    successors: %bb.2(0x40000000), %bb.3(0x40000000)
    liveins: $sgpr6_sgpr7

    %4:sgpr(p4) = COPY $sgpr6_sgpr7
    %14:sgpr(s64) = G_CONSTANT i64 8
    %15:sgpr(p4) = nuw nusw G_PTR_ADD %4, %14(s64)
    %16:sgpr(s32) = G_LOAD %15(p4) :: (dereferenceable invariant load (s32) from %ir.min.iters.check.kernarg.offset.align.down, align 8, addrspace 4)
    %28:sgpr(s32) = G_CONSTANT i32 1
    %29:sgpr(s32) = G_XOR %16, %28
    %32:sgpr(s32) = G_CONSTANT i32 0
    %34:sgpr(s32) = G_AND %29, %28
    G_BRCOND %34(s32), %bb.3
    G_BR %bb.2

  bb.2.vector.ph:
    successors: %bb.3(0x80000000)

    %18:sgpr(s64) = G_LOAD %4(p4) :: (dereferenceable invariant load (s64) from %ir..kernarg.offset1, align 16, addrspace 4)
    %27:sgpr(s64) = G_CONSTANT i64 0
    %37:vgpr(s64) = COPY %18(s64)
    %38:vgpr(s64) = COPY %27(s64)
    %35:vcc(s1) = G_ICMP intpred(sgt), %37(s64), %38
    %36:sgpr(s32) = G_COPY_SCC_VCC %35(s1)

  bb.3.Flow97:
    successors: %bb.4(0x40000000), %bb.5(0x40000000)

    %39:sgpr(s32) = G_PHI %32(s32), %bb.1, %36(s32), %bb.2
    %42:sgpr(s32) = G_CONSTANT i32 1
    %43:sgpr(s32) = G_XOR %39, %42
    %47:sgpr(s32) = G_AND %43, %42
    G_BRCOND %47(s32), %bb.5
    G_BR %bb.4

  bb.4.scalar.ph.preheader:
    S_ENDPGM 0

  bb.5.Flow98:
    S_ENDPGM 0
G_PHI is known wrong. S1 G_PHIs could even be considered correct. Divergent G_PHIs are lowered/selected to PHI in AMDGPUGlobalISelDivergenceLowering, so S1 G_PHIs are known to be uniform and should be extended to S32 in RegBankSelect. The instruction-select failure is correct, even though it was introduced before AMDGPUGlobalISelDivergenceLowering. S1 G_PHIs are too complicated to be selected in instruction-select.
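To make the bank-assignment rule discussed above concrete, here is a toy Python model of it. This is only a sketch and not LLVM code: `Value` and `assign_bank` are hypothetical names, the uniformity flag is assumed to come from a uniformity analysis, and the mapping of divergent s1 values to `vcc` is a simplifying assumption rather than something stated in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for a virtual register (not an LLVM type)."""
    name: str
    bits: int      # scalar width, e.g. 1 for s1, 32 for s32
    uniform: bool  # assumed to be supplied by a uniformity analysis


def assign_bank(v: Value) -> tuple[str, int]:
    """Return (bank, width) for a value under the toy rule."""
    if v.uniform:
        # Uniform values go to sgpr; a uniform s1 is widened to s32.
        return ("sgpr", 32 if v.bits == 1 else v.bits)
    if v.bits == 1:
        # Assumption: a divergent s1 is kept as a lane mask in vcc.
        return ("vcc", 1)
    # Any other divergent value lives in vgpr at its original width.
    return ("vgpr", v.bits)


# The uniform s1 phi from the discussion becomes sgpr(s32), as in %39.
phi = Value("%39", bits=1, uniform=True)
print(phi.name, assign_bank(phi))
```

Under this model the uniform s1 phi ends up as `sgpr` with width 32, which matches the `%39:sgpr(s32) = G_PHI` line in the expected output above.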
It's unclear to me at this time if instruction selection is actually the problem or the previous steps are miscompiling.
Command:
llc -global-isel reduced.ll
Input: