SystemZ Backend: Add support for operations such as FP16_TO_FP and FP_TO_FP16

kun-lu20 commented 3 years ago


Bugzilla Link	51030
Version	unspecified
OS	Linux
CC	@River707,@ftynse

Extended Description

Hi,

We recently ran the TensorFlow v2.5.0 test suite on s390x (Ubuntu 18.04).

The test case //tensorflow/compiler/tests:sort_ops_test_cpu fails with the following error:

LLVM ERROR: Cannot select: 0x3ff14167ca0: f32 = fp16_to_fp 0x3ff14167f10
  0x3ff14167f10: i32,ch = load<(dereferenceable load 2 from %ir.4, !alias.scope !6, !noalias !4), zext from i16> 0x3ff14197548, 0x3ff141678f8, undef:i64
    0x3ff141678f8: i64,ch = load<(load 8 from %ir.3)> 0x3ff14197548, 0x3ff14167890, undef:i64
      0x3ff14167890: i64 = add nuw 0x3ff141674e8, Constant:i64<8>
        0x3ff141674e8: i64,ch = CopyFromReg 0x3ff14197548, Register:i64 %2
          0x3ff14167480: i64 = Register %2
        0x3ff14167828: i64 = Constant<8>
      0x3ff14167758: i64 = undef
    0x3ff14167758: i64 = undef
In function: compare_lt_WCTTAtafbb4__.7

Other test cases, such as //tensorflow/python/keras/optimizer_v2:adam_test and //tensorflow/core/kernels/mlir_generated:abs_cpu_f16_f16_gen_test, also fail on s390x for similar reasons. A related issue (https://github.com/tensorflow/tensorflow/issues/44362) has been filed in the TensorFlow GitHub repository.

We think the root cause is that the SystemZ LLVM backend (llvm/lib/Target/SystemZ/SystemZISelLowering.cpp) lacks support for operations such as FP16_TO_FP and FP_TO_FP16, which perform promotion and truncation of half-precision (16-bit) floating-point numbers. Could support for these operations be added to the SystemZ LLVM backend? Thanks!

kun-lu20 commented 3 years ago

It looks like commit https://reviews.llvm.org/rG8cd8120a7b5d, which has been tagged as 13.0.0-rc, could solve this issue, since it adds support for arch14 and for operations related to FP16 conversion to the SystemZ backend. Could anyone from the community help confirm this? Thanks!

joker-eph commented 3 years ago

Moving out of MLIR: this is a backend issue.

kun-lu20 commented 3 years ago

Are there any updates from the community regarding this issue? Thanks!

kun-lu20 commented 2 years ago

This issue persists on TensorFlow v2.8.0, which uses LLVM 15. It looks like specific half-precision (16-bit) operations are still missing in the SystemZ LLVM backend.

Can anyone from the community take a look at this issue? Thanks very much!

kun-lu20 commented 2 years ago

We recently ran the test cases under the //tensorflow/core/kernels/mlir_generated category in TensorFlow v2.9.1 and found that this issue still exists.

It looks like FP16-related operations are still unsupported in the LLVM SystemZ backend for most Z CPU models, which causes these test cases (such as abs_cpu_f16_f16_gen_test and sqrt_cpu_f64_f64_gen_test) to fail when applyFullConversion() or applyPartialConversion() is invoked. Although this commit added FP16 support for the new arch14 (z16) model, arch14 still does not seem to fully support FP16 operations.

We also found that when TensorFlow is built with the options -c opt --copt=-O (which sets the optimization level to 1) and with JIT_Compilation enabled, these test cases pass and the output .mlir files are generated successfully.

We think this can serve as a workaround for now, but to address the root cause, FP16-related operations still need to be added to the SystemZ backend.

Any thoughts or suggestions from the community regarding this issue would be greatly appreciated. Thanks!

beetrees commented 5 months ago

The following function (compiler explorer):

define half @deref(ptr %p) {
  %x = load half, ptr %p
  ret half %x
}

currently fails to compile when compiling for s390x-unknown-linux-gnu with the following error:

LLVM ERROR: Cannot select: 0x89e9c70: f32,ch = load<(load (s16) from %ir.p), anyext from f16> 0x89a9bc8, 0x89e9c00, undef:i64
  0x89e9c00: i64,ch = CopyFromReg 0x89a9bc8, Register:i64 %0
    0x89e9b90: i64 = Register %0
  0x89e9ce0: i64 = undef
In function: deref
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /opt/compiler-explorer/clang-trunk/bin/llc -o /app/output.s -mtriple=s390x-unknown-linux-gnu <source>
1.  Running pass 'Function Pass Manager' on module '<source>'.
2.  Running pass 'SystemZ DAG->DAG Pattern Instruction Selection' on function '@deref'
 #0 0x00000000037197d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/llc+0x37197d8)
 #1 0x000000000371714c SignalHandler(int) Signals.cpp:0:0
 #2 0x00007baf41042520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #3 0x00007baf410969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #4 0x00007baf41042476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #5 0x00007baf410287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #6 0x000000000073359e llvm::UniqueStringSaver::save(llvm::StringRef) (.cold) StringSaver.cpp:0:0
 #7 0x00000000034dca44 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34dca44)
 #8 0x00000000034e3e85 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34e3e85)
 #9 0x00000000019ed9de (anonymous namespace)::SystemZDAGToDAGISel::Select(llvm::SDNode*) SystemZISelDAGToDAG.cpp:0:0
#10 0x00000000034d9f94 llvm::SelectionDAGISel::DoInstructionSelection() (/opt/compiler-explorer/clang-trunk/bin/llc+0x34d9f94)
#11 0x00000000034e92a1 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/opt/compiler-explorer/clang-trunk/bin/llc+0x34e92a1)
#12 0x00000000034ebed4 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34ebed4)
#13 0x00000000034edd44 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34edd44)
#14 0x00000000019efe0a (anonymous namespace)::SystemZDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) SystemZISelDAGToDAG.cpp:0:0
#15 0x00000000034dd861 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x34dd861)
#16 0x000000000282216b llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#17 0x0000000002d58b22 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2d58b22)
#18 0x0000000002d58ca1 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2d58ca1)
#19 0x0000000002d5a950 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-trunk/bin/llc+0x2d5a950)
#20 0x000000000084df94 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x0000000000745af6 main (/opt/compiler-explorer/clang-trunk/bin/llc+0x745af6)
#22 0x00007baf41029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#23 0x00007baf41029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#24 0x0000000000845bce _start (/opt/compiler-explorer/clang-trunk/bin/llc+0x845bce)
Program terminated with signal: SIGSEGV
Compiler returned: 139

Other operations involving half also fail with similar errors.

alexrp commented 4 months ago

FWIW, this is the only remaining blocker I'm aware of for Zig to be able to target s390x:

❯ zig cc s390x.c -target s390x-linux-musl
LLVM ERROR: Cannot select: 0x6d24170: i32 = fp_to_fp16 0x6d23a00
  0x6d23a00: f32,ch = CopyFromReg 0x5e82a10, Register:f32 %10
    0x6c03da0: f32 = Register %10
In function: __fixhfsi

JonPsson1 commented 1 month ago

Patch in progress here: https://github.com/llvm/llvm-project/pull/109164

SystemZ Backend: Add support for operations such as FP16_TO_FP and FP_TO_FP16 #50374
