llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[AMDGPU] No registers from class available to allocate for R600 / Cannot select for AMDGCN #58210

Open HazyFish opened 1 year ago

HazyFish commented 1 year ago

Description

The following code crashes llc when targeting R600. A release build fails with "LLVM ERROR: no registers from class available to allocate" during the "Greedy Register Allocator" pass, and a debug build fails with "unhandled address space" during the "GPU Load and Store Vectorizer" pass.

The code also crashes llc with "LLVM ERROR: Cannot select: t13: ch = store<(store (s8) into i1* poison), trunc to i8> t0, t15, undef:i64, undef:i64" when targeting AMDGCN, but compiles successfully for other architectures.

Reproduction

https://godbolt.org/z/5rv3f6r8r

Code

define void @f() {
BB:
  br label %BB1

BB1:                                              ; preds = %BB1, %BB
  %A1 = alloca <32 x i64>
  %S = shufflevector <32 x i64> zeroinitializer, <32 x i64> <i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1>, <32 x i32> undef
  store <32 x i64> %S, <32 x i64>* %A1
  br i1 false, label %BB1, label %BB2

BB2:                                              ; preds = %BB5, %BB2, %BB1
  %A = alloca i1
  %L = load i1, i1* %A 
  %B1 = add i1 %L, %L
  %B = ashr i1 false, true
  %B3 = or i1 %B, %B1
  %G1 = getelementptr i1, i1* %A, i1 %B
  store i1 %B3, i1* %G1
  br i1 %B, label %BB2, label %BB4

BB4:                                              ; preds = %BB4, %BB2
  %G = getelementptr i1, i1* %A, i1 %B
  %L1 = load i1, i1* %G
  %L2 = load i1, i1* %G
  %B4 = add i1 %L2, true
  %C1 = icmp sgt i1 %B4, %L2
  store i1 %C1, i1* %G
  %B2 = ashr i1 %B4, %L2
  %B5 = xor i1 %B2, %L1
  %C2 = icmp sgt i1 %C1, %B5
  %C4 = icmp eq i1 %C1, %L2
  %C3 = icmp ult i1 %C2, %C4
  br i1 %C3, label %BB4, label %BB5

BB5:                                              ; preds = %BB4
  store i1 %B, i1* %G
  %C = icmp sgt i64 0, -1
  br i1 %C, label %BB2, label %BB3

BB3:                                              ; preds = %BB5
  ret void
}

Stack Trace

Release Build targeting R600

LLVM ERROR: no registers from class available to allocate
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./llvm-project/build-release/bin/llc crash-classification/dagisel-r600/greedy-register-allocator/tracedepth_16__hash_0x-460489b9cc04c4f6/id:004857,sig:11,src:032769+020349,time:371052658,execs:60260845,op:libAFLCustomIRMutator.so,pos:0.ll -mtriple=r600
1.  Running pass 'Function Pass Manager' on module 'crash-classification/dagisel-r600/greedy-register-allocator/tracedepth_16__hash_0x-460489b9cc04c4f6/id:004857,sig:11,src:032769+020349,time:371052658,execs:60260845,op:libAFLCustomIRMutator.so,pos:0.ll'.
2.  Running pass 'Greedy Register Allocator' on function '@f'
 #0 0x0000000001e217f3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./llvm-project/build-release/bin/llc+0x1e217f3)
 #1 0x0000000001e1f70e llvm::sys::RunSignalHandlers() (./llvm-project/build-release/bin/llc+0x1e1f70e)
 #2 0x0000000001e21b7f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f3de317e980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #4 0x00007f3de206ee87 raise /build/glibc-uZu3wS/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007f3de20707f1 abort /build/glibc-uZu3wS/glibc-2.27/stdlib/abort.c:81:0
 #6 0x0000000001d9b230 llvm::report_fatal_error(llvm::Twine const&, bool) (./llvm-project/build-release/bin/llc+0x1d9b230)
 #7 0x0000000001d9b046 (./llvm-project/build-release/bin/llc+0x1d9b046)
 #8 0x00000000016a1ca5 llvm::RegAllocBase::allocatePhysRegs() (./llvm-project/build-release/bin/llc+0x16a1ca5)
 #9 0x0000000001541e4a llvm::RAGreedy::runOnMachineFunction(llvm::MachineFunction&) (./llvm-project/build-release/bin/llc+0x1541e4a)
#10 0x00000000013f5149 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (./llvm-project/build-release/bin/llc+0x13f5149)
#11 0x00000000017b349d llvm::FPPassManager::runOnFunction(llvm::Function&) (./llvm-project/build-release/bin/llc+0x17b349d)
#12 0x00000000017ba973 llvm::FPPassManager::runOnModule(llvm::Module&) (./llvm-project/build-release/bin/llc+0x17ba973)
#13 0x00000000017b4070 llvm::legacy::PassManagerImpl::run(llvm::Module&) (./llvm-project/build-release/bin/llc+0x17b4070)
#14 0x00000000006b3524 main (./llvm-project/build-release/bin/llc+0x6b3524)
#15 0x00007f3de2051c87 __libc_start_main /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:344:0
#16 0x00000000006ae0aa _start (./llvm-project/build-release/bin/llc+0x6ae0aa)

Debug Build targeting R600

unhandled address space
UNREACHABLE executed at /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp:61!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./llvm-project/build-debug/bin/llc crash-classification/dagisel-r600/greedy-register-allocator/tracedepth_16__hash_0x-460489b9cc04c4f6/id:004857,sig:11,src:032769+020349,time:371052658,execs:60260845,op:libAFLCustomIRMutator.so,pos:0.ll -mtriple=r600
1.  Running pass 'Function Pass Manager' on module 'crash-classification/dagisel-r600/greedy-register-allocator/tracedepth_16__hash_0x-460489b9cc04c4f6/id:004857,sig:11,src:032769+020349,time:371052658,execs:60260845,op:libAFLCustomIRMutator.so,pos:0.ll'.
2.  Running pass 'GPU Load and Store Vectorizer' on function '@f'
 #0 0x0000000003adad2a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 0x0000000003adaedb PrintStackTraceSignalHandler(void*) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 0x0000000003ad9526 llvm::sys::RunSignalHandlers() /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Signals.cpp:103:5
 #3 0x0000000003adb605 SignalHandler(int) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007fcd080dd980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007fcd06fcde87 raise /build/glibc-uZu3wS/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007fcd06fcf7f1 abort /build/glibc-uZu3wS/glibc-2.27/stdlib/abort.c:81:0
 #7 0x0000000003a017a0 llvm::install_out_of_memory_new_handler() /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/ErrorHandling.cpp:193:0
 #8 0x000000000193ba41 llvm::R600TTIImpl::getLoadStoreVecRegBitWidth(unsigned int) const /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Target/AMDGPU/R600TargetTransformInfo.cpp:62:1
 #9 0x000000000192d67f llvm::TargetTransformInfo::Model<llvm::R600TTIImpl>::getLoadStoreVecRegBitWidth(unsigned int) const /home/henry/aflplusplus-isel/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2481:5
#10 0x000000000239bd67 llvm::TargetTransformInfo::getLoadStoreVecRegBitWidth(unsigned int) const /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:1088:3
#11 0x0000000003d0a8a6 (anonymous namespace)::Vectorizer::collectInstructions(llvm::BasicBlock*) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:886:16
#12 0x0000000003d0a142 (anonymous namespace)::Vectorizer::run() /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:286:5
#13 0x0000000003d0a437 (anonymous namespace)::LoadStoreVectorizerLegacyPass::runOnFunction(llvm::Function&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:251:3
#14 0x0000000002f797d6 llvm::FPPassManager::runOnFunction(llvm::Function&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1430:23
#15 0x0000000002f7e602 llvm::FPPassManager::runOnModule(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1476:16
#16 0x0000000002f7a0a9 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1545:23
#17 0x0000000002f79c1d llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:535:16
#18 0x0000000002f7e8e1 llvm::legacy::PassManager::run(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1672:3
#19 0x0000000000d2cdbc compileModule(char**, llvm::LLVMContext&) /home/henry/aflplusplus-isel/llvm-project/llvm/tools/llc/llc.cpp:737:41
#20 0x0000000000d2b162 main /home/henry/aflplusplus-isel/llvm-project/llvm/tools/llc/llc.cpp:418:13
#21 0x00007fcd06fb0c87 __libc_start_main /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:344:0
#22 0x0000000000d2a96a _start (./llvm-project/build-debug/bin/llc+0xd2a96a)

Debug Build targeting AMDGCN

LLVM ERROR: Cannot select: t13: ch = store<(store (s8) into `i1* poison`), trunc to i8> t0, t15, undef:i64, undef:i64
  t15: i32 = zero_extend t2
    t2: i1,ch = CopyFromReg t0, Register:i1 %2
      t1: i1 = Register %2
  t3: i64 = undef
  t3: i64 = undef
In function: f
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ./llvm-project/build-debug/bin/llc crash-classification/dagisel-r600/greedy-register-allocator/tracedepth_16__hash_0x-460489b9cc04c4f6/id:004857,sig:11,src:032769+020349,time:371052658,execs:60260845,op:libAFLCustomIRMutator.so,pos:0.ll -mtriple=amdgcn
1.  Running pass 'CallGraph Pass Manager' on module 'crash-classification/dagisel-r600/greedy-register-allocator/tracedepth_16__hash_0x-460489b9cc04c4f6/id:004857,sig:11,src:032769+020349,time:371052658,execs:60260845,op:libAFLCustomIRMutator.so,pos:0.ll'.
2.  Running pass 'AMDGPU DAG->DAG Pattern Instruction Selection' on function '@f'
 #0 0x0000000003adad2a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 0x0000000003adaedb PrintStackTraceSignalHandler(void*) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 0x0000000003ad9526 llvm::sys::RunSignalHandlers() /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Signals.cpp:103:5
 #3 0x0000000003adb605 SignalHandler(int) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f37688a7980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007f3767797e87 raise /build/glibc-uZu3wS/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f37677997f1 abort /build/glibc-uZu3wS/glibc-2.27/stdlib/abort.c:81:0
 #7 0x0000000003a014b4 llvm::report_fatal_error(llvm::Twine const&, bool) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Support/ErrorHandling.cpp:125:5
 #8 0x000000000384fddb /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3777:3
 #9 0x000000000384d372 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3679:9
#10 0x0000000001b73ea9 AMDGPUDAGToDAGISel::SelectCode(llvm::SDNode*) /home/henry/aflplusplus-isel/llvm-project/build-debug/lib/Target/AMDGPU/AMDGPUGenDAGISel.inc:224554:1
#11 0x0000000001b62d83 AMDGPUDAGToDAGISel::Select(llvm::SDNode*) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:524:5
#12 0x0000000003840ec9 llvm::SelectionDAGISel::DoInstructionSelection() /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1165:5
#13 0x000000000383ff1a llvm::SelectionDAGISel::CodeGenAndEmitDAG() /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:936:3
#14 0x000000000383e8ed llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, bool&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:688:1
#15 0x000000000383e38b llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1603:11
#16 0x000000000383b936 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:467:3
#17 0x0000000001b604ca AMDGPUDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:136:3
#18 0x0000000002894c85 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:91:8
#19 0x0000000002f797d6 llvm::FPPassManager::runOnFunction(llvm::Function&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1430:23
#20 0x000000000211c1cd (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass*, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:179:20
#21 0x000000000211bb5e (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:476:10
#22 0x000000000211b4df (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:542:18
#23 0x0000000002f7a0a9 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1545:23
#24 0x0000000002f79c1d llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:535:16
#25 0x0000000002f7e8e1 llvm::legacy::PassManager::run(llvm::Module&) /home/henry/aflplusplus-isel/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1672:3
#26 0x0000000000d2cdbc compileModule(char**, llvm::LLVMContext&) /home/henry/aflplusplus-isel/llvm-project/llvm/tools/llc/llc.cpp:737:41
#27 0x0000000000d2b162 main /home/henry/aflplusplus-isel/llvm-project/llvm/tools/llc/llc.cpp:418:13
#28 0x00007f376777ac87 __libc_start_main /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:344:0
#29 0x0000000000d2a96a _start (./llvm-project/build-debug/bin/llc+0xd2a96a)

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-amdgpu

gandhi56 commented 1 year ago

Reproduced on r600 (the "unhandled address space" error in a debug build) and on amdgcn (the "Cannot select" error for the store instruction in SelectionDAG, also in a debug build).

Compiled successfully on X86.

gandhi56 commented 1 year ago

The following code compiles successfully with both SelectionDAG and GlobalISel:

; RUN: llc -mtriple=amdgcn %s -o - | FileCheck %s

; CHECK: f
define void @f() {
BB:
  br label %BB1

BB1:                                              ; preds = %BB1, %BB
  %A1 = alloca <32 x i64>
  %S = shufflevector <32 x i64> zeroinitializer, <32 x i64> <i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1>, <32 x i32> undef
  store <32 x i64> %S, ptr %A1
  br i1 false, label %BB1, label %BB2

BB2:                                              ; preds = %BB5, %BB2, %BB1
  %A = alloca i1
  %L = load i1, ptr %A 
  %B1 = add i1 %L, %L
  %B = ashr i1 false, true
  %B3 = or i1 %B, %B1
  %G1 = getelementptr i1, ptr %A, i1 %B
  store i1 %B3, ptr %G1
  br i1 %B, label %BB2, label %BB4

BB4:                                              ; preds = %BB4, %BB2
  %G = getelementptr i1, ptr %A, i1 %B
  %L1 = load i1, ptr %G
  %L2 = load i1, ptr %G
  %B4 = add i1 %L2, true
  %C1 = icmp sgt i1 %B4, %L2
  ; store i1 %C1, ptr %G                     <-- commenting out this store instruction
  %B2 = ashr i1 %B4, %L2
  %B5 = xor i1 %B2, %L1
  %C2 = icmp sgt i1 %C1, %B5
  %C4 = icmp eq i1 %C1, %L2
  %C3 = icmp ult i1 %C2, %C4
  br i1 %C3, label %BB4, label %BB5

BB5:                                              ; preds = %BB4
  store i1 %B, ptr %G
  %C = icmp sgt i64 0, -1
  br i1 %C, label %BB2, label %BB3

BB3:                                              ; preds = %BB5
  ret void
}

arsenm commented 1 year ago

The issue is for r600 not amdgcn

gandhi56 commented 1 year ago

https://godbolt.org/z/cczEErfM6

gandhi56 commented 1 year ago

For the R600 case, I assume it's okay to return 128 for the vectorizer register bit width. With that change, the generated code contains an infinite loop. Is there a reason why AMDGPUUnifyDivergentExitNodes is not scheduled for R600?
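
To make the proposal above concrete, here is a minimal standalone C++ sketch of an address-space-to-width lookup that falls back to 128 bits instead of aborting on an unhandled address space. It is a toy model, not the code in R600TargetTransformInfo.cpp: the enum values, the per-address-space widths, and both function names are assumptions made for illustration. Only the 128-bit fallback comes from the comment above.

```cpp
#include <cassert>

// Assumed address-space numbers, for illustration only. An alloca without an
// explicit addrspace (as in the reproducer) lives in address space 0.
enum AddrSpace : unsigned { Flat = 0, Global = 1, Local = 3, Constant = 4, Private = 5 };

// Hypothetical table: returns a width in bits for address spaces the toy
// table knows about, or 0 when the address space is unhandled.
unsigned knownVecRegBitWidth(unsigned AS) {
  switch (AS) {
  case Global:
  case Constant:
    return 128; // illustrative value
  case Local:
    return 64;  // illustrative value
  case Private:
    return 32;  // illustrative value
  default:
    return 0;   // unhandled: this is where the debug build hits "unreachable"
  }
}

// Hypothetical wrapper: use the table entry if present, otherwise the fallback
// width suggested in the thread (128) rather than aborting.
unsigned vecRegBitWidthOrDefault(unsigned AS, unsigned Fallback = 128) {
  assert(Fallback != 0 && "fallback width must be non-zero");
  unsigned Known = knownVecRegBitWidth(AS);
  return Known != 0 ? Known : Fallback;
}
```

With this shape, vecRegBitWidthOrDefault(Flat) yields 128 while vecRegBitWidthOrDefault(Private) keeps its table entry. Whether a silent fallback is preferable to rejecting the address space outright is a separate design question.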

arsenm commented 1 year ago

For the R600 case, I assume it's okay to return 128 for the vectorizer register bit width. With that change, the generated code contains an infinite loop. Is there a reason why AMDGPUUnifyDivergentExitNodes is not scheduled for R600?

Because it's barely (not really) maintained. I don't really follow the point about the infinite loop. The vector register bit width shouldn't matter for producing functioning code.

gandhi56 commented 1 year ago

https://godbolt.org/z/3hjjfsThn

The instruction %B = ashr i1 false, true produces a poison value. That poison value is still present when the instruction selector runs, and SelectionDAG fails to select it. I don't think there is anything to be done for this case, as the input code is weird to begin with. Thoughts?
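
For reference, the LLVM language reference specifies that ashr returns poison when the shift amount is equal to or larger than the bit width of the first operand, which is the case when an i1 is shifted by 1. The following C++ sketch models that rule and the way poison flows through a later or. The Value struct and the functions lowMask, ashr, and orOp are hypothetical names for this illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an LLVM integer value that may be poison.
struct Value {
  unsigned Bits; // type width, e.g. 1 for i1
  uint64_t Raw;  // bit pattern; only the low `Bits` bits are meaningful
  bool Poison;   // true if the value is poison
};

// Mask covering the low `Bits` bits.
static uint64_t lowMask(unsigned Bits) {
  return Bits == 64 ? ~0ULL : ((1ULL << Bits) - 1);
}

// Models `ashr`: poison if an operand is poison or the shift amount is >= the
// bit width; otherwise an arithmetic (sign-filling) right shift.
Value ashr(Value LHS, Value Amt) {
  assert(LHS.Bits == Amt.Bits && LHS.Bits >= 1 && LHS.Bits <= 64);
  if (LHS.Poison || Amt.Poison || Amt.Raw >= LHS.Bits)
    return Value{LHS.Bits, 0, true};
  uint64_t Mask = lowMask(LHS.Bits);
  uint64_t V = LHS.Raw & Mask;
  bool Negative = (V >> (LHS.Bits - 1)) & 1;
  uint64_t Shifted = V >> Amt.Raw;
  if (Negative && Amt.Raw > 0)
    Shifted |= Mask & ~(Mask >> Amt.Raw); // fill vacated high bits with the sign
  return Value{LHS.Bits, Shifted & Mask, false};
}

// Models `or`: the result is poison if either operand is poison.
Value orOp(Value A, Value B) {
  assert(A.Bits == B.Bits);
  if (A.Poison || B.Poison)
    return Value{A.Bits, 0, true};
  return Value{A.Bits, (A.Raw | B.Raw) & lowMask(A.Bits), false};
}
```

In this model, ashr(Value{1, 0, false}, Value{1, 1, false}) is poison, and so is any or that consumes it. That would be consistent with the "i1* poison" pointer operand shown in the "Cannot select" message, since the getelementptr in the reproducer is indexed by %B.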

arsenm commented 1 year ago

https://godbolt.org/z/3hjjfsThn

The instruction %B = ashr i1 false, true produces a poison value. That poison value is still present when the instruction selector runs, and SelectionDAG fails to select it. I don't think there is anything to be done for this case, as the input code is weird to begin with. Thoughts?

Poison doesn't mean "won't compile", and nothing is weird about that; that part should just work. The flat pointers aren't going to work on r600. Also, the alloca is in the wrong address space. The only thing worth fixing here would be the "no registers from class available" error, if it manifests when using address spaces that are supposed to work.

With the address spaces fixed, this is broken for other reasons:

target triple = "r600--"
define void @f() {
BB:
  br label %BB1

BB1:                                              ; preds = %BB1, %BB
  %A1 = alloca <32 x i64>, addrspace(5)
  %S = shufflevector <32 x i64> zeroinitializer, <32 x i64> <i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1, i64 -1>, <32 x i32> undef
  store <32 x i64> %S, ptr addrspace(5) %A1
  br i1 false, label %BB1, label %BB2

BB2:                                              ; preds = %BB5, %BB2, %BB1
  %A = alloca i1, addrspace(5)
  %L = load i1, ptr addrspace(5) %A
  %B1 = add i1 %L, %L
  %B = ashr i1 false, true
  %B3 = or i1 %B, %B1
  %G1 = getelementptr i1, ptr addrspace(5) %A, i1 %B
  store i1 %B3, ptr addrspace(5) %G1
  br i1 %B, label %BB2, label %BB4

BB4:                                              ; preds = %BB4, %BB2
  %G = getelementptr i1, ptr addrspace(5) %A, i1 %B
  %L1 = load i1, ptr addrspace(5) %G
  %L2 = load i1, ptr addrspace(5) %G
  %B4 = add i1 %L2, true
  %C1 = icmp sgt i1 %B4, %L2
  store i1 %C1, ptr addrspace(5) %G
  %B2 = ashr i1 %B4, %L2
  %B5 = xor i1 %B2, %L1
  %C2 = icmp sgt i1 %C1, %B5
  %C4 = icmp eq i1 %C1, %L2
  %C3 = icmp ult i1 %C2, %C4
  br i1 %C3, label %BB4, label %BB5

BB5:                                              ; preds = %BB4
  store i1 %B, ptr addrspace(5) %G
  %C = icmp sgt i64 0, -1
  br i1 %C, label %BB2, label %BB3

BB3:                                              ; preds = %BB5
  ret void
}
; Assertion failed: (!ExitingMBBs.empty() && "Infinite Loop not supported"), function mergeLoop, file R600MachineCFGStructurizer.cpp, line 1010.

Overall this is just demonstrating that r600 was never really completed and barely works.
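
The assertion above fires when a loop has no exiting block. A small C++ sketch can show why the reproducer may end up in that state: as written, BB5 has an edge to BB3, but icmp sgt i64 0, -1 is always true and br i1 false always takes its second target. The sketch compares the exiting blocks of the loop {BB2, BB4, BB5} before and after folding those two branches. It is an assumption that the backend sees the folded form; the thread does not confirm it, and the helper names exitingBlocks, cfgAsWritten, and cfgAfterFolding are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A CFG as a map from block name to its successor block names.
using CFG = std::map<std::string, std::vector<std::string>>;

// Returns the blocks inside `Loop` that have at least one successor outside it.
std::vector<std::string> exitingBlocks(const CFG &G,
                                       const std::set<std::string> &Loop) {
  std::vector<std::string> Result;
  for (const std::string &BB : Loop) {
    CFG::const_iterator It = G.find(BB);
    if (It == G.end())
      continue;
    for (const std::string &Succ : It->second) {
      if (Loop.count(Succ) == 0) {
        Result.push_back(BB);
        break;
      }
    }
  }
  return Result;
}

// Edges exactly as the branch instructions in the reproducer list them.
CFG cfgAsWritten() {
  CFG G;
  G["BB"] = {"BB1"};
  G["BB1"] = {"BB1", "BB2"};
  G["BB2"] = {"BB2", "BB4"};
  G["BB4"] = {"BB4", "BB5"};
  G["BB5"] = {"BB2", "BB3"};
  G["BB3"] = {};
  return G;
}

// Assumed form after folding the two constant conditions:
// `br i1 false` in BB1 goes to BB2, and `icmp sgt i64 0, -1` is true, so BB5
// always branches back to BB2 and BB3 is never reached.
CFG cfgAfterFolding() {
  CFG G = cfgAsWritten();
  G["BB1"] = {"BB2"};
  G["BB5"] = {"BB2"};
  return G;
}
```

For the loop {BB2, BB4, BB5}, exitingBlocks reports BB5 on the CFG as written and nothing on the folded CFG, which is the kind of exit-free loop the structurizer's assertion rejects.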

lorn10 commented 11 months ago

Overall this is just demonstrating that r600 was never really completed and barely works.

Sorry for jumping in. A short question: what does that statement mean for end users, especially with regard to OpenCL? Could this be interpreted as one of the reasons why clover had many more difficulties on TeraScale than on GCN hardware?

I think it can be said that, at least in the past, clover and (older) LLVM worked to some degree on TeraScale hardware too, even if clpeak has had problems practically forever. And it looks like, despite those deficits in LLVM, the r600 backend could be used for even more interesting things, such as this one: clover: Add LLVM bitcode as IL type.

Perhaps it would make sense to mention the "r600 LLVM limitations and overall incompleteness" in a more official place, such as the LLVM AMDGPU backend user guide. :thinking:

And finally, it looks like r600 support in LLVM could be dropped at some point, simply because there is no more practical use for it. For OpenCL in Mesa, rusticl will also exist for TeraScale hardware as an alternative to "LLVM clover", even if it is somewhat slower than clover. And before r600 is purged, please land your "AMDGPU/R600: Special case addrspacecast lowering for null" commit. I had hoped it would land in LLVM 17. :wink:

/cc Maybe something from here is of interest for @Triang3l

gandhi56 commented 11 months ago

I tried inserting the AMDGPUUnifyDivergentExitNodes pass to ensure there is an exit block in CFGs that don't already have one due to infinite loops. That does not suffice; more work is required in R600MachineCFGStructurizer. Is it worth the effort to fix this issue for an architecture that may hardly be used these days?

lorn10 commented 11 months ago

@gandhi56 Yes, you are most likely right. As far as I know, the r600 LLVM backend is rarely used nowadays. On Linux, I am only aware of Mesa clover and probably (partially) the r600 Radeon Mesa driver using it. But perhaps some other AMD-related software still depends on it? I have no idea. Someone from AMD should bring more clarity here.

And finally, because of the statement above, use of the r600 LLVM backend in the potential TeraScale Vulkan project "Terakan" is highly unlikely. So, unlike RADV, Terakan will almost surely have only a NIR>SFN path and no NIR>LLVM path. :thinking: