bondhugula opened this issue 1 year ago
@llvm/issue-subscribers-mlir
@llvm/issue-subscribers-mlir-gpu
Looks like my attempt to fix the commit message didn't go through, so this issue was closed by 20c66a0c66340f.
Thanks, the commit resolves the first test case, but the ROCm printf.mlir test still fails for me at ed27d28f9a53d689c98a3bef26980e2858350548.
******************** TEST 'MLIR :: Integration/GPU/ROCM/printf.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1'; /home/uday/llvm-project-upstream/build/bin/mlir-opt /home/uday/llvm-project-upstream/mlir/test/Integration/GPU/ROCM/printf.mlir | /home/uday/llvm-project-upstream/build/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-rocdl{index-bitwidth=32 runtime=HIP},gpu-to-hsaco{chip=gfx1100}))' | /home/uday/llvm-project-upstream/build/bin/mlir-opt -gpu-to-llvm | /home/uday/llvm-project-upstream/build/bin/mlir-cpu-runner --shared-libs=/home/uday/llvm-project-upstream/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/uday/llvm-project-upstream/build/lib/libmlir_runner_utils.so --entry-point-result=void | /home/uday/llvm-project-upstream/build/bin/FileCheck /home/uday/llvm-project-upstream/mlir/test/Integration/GPU/ROCM/printf.mlir
--
Exit Code: 2
Command Output (stderr):
--
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/uday/llvm-project-upstream/build/bin/mlir-cpu-runner --shared-libs=/home/uday/llvm-project-upstream/build/lib/libmlir_rocm_runtime.so --shared-libs=/home/uday/llvm-project-upstream/build/lib/libmlir_runner_utils.so --entry-point-result=void
#0 0x000055b4228b02c0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/uday/llvm-project-upstream/build/bin/mlir-cpu-runner+0x9bf2c0)
#1 0x000055b4228ad6a4 SignalHandler(int) Signals.cpp:0:0
#2 0x00007fd5e4842520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
#3 0x00007fd5e091fb68 (/opt/rocm/lib/libamdhip64.so.5+0x31fb68)
#4 0x00007fd5e094435b (/opt/rocm/lib/libamdhip64.so.5+0x34435b)
#5 0x00007fd5e094cdad (/opt/rocm/lib/libamdhip64.so.5+0x34cdad)
#6 0x00007fd5e094d124 (/opt/rocm/lib/libamdhip64.so.5+0x34d124)
#7 0x00007fd5e0912444 (/opt/rocm/lib/libamdhip64.so.5+0x312444)
#8 0x00007fd5e08055d0 (/opt/rocm/lib/libamdhip64.so.5+0x2055d0)
#9 0x00007fd5e0813369 hipModuleLaunchKernel (/opt/rocm/lib/libamdhip64.so.5+0x213369)
#10 0x00007fd5e4ade874 mgpuLaunchKernel (/home/uday/llvm-project-upstream/build/lib/libmlir_rocm_runtime.so+0x39874)
#11 0x00007fd5e502e08e
#12 0x00007fd5e502e0e1
#13 0x000055b422e9bd8c compileAndExecute((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) JitRunner.cpp:0:0
#14 0x000055b422e9c437 compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, std::unique_ptr<llvm::TargetMachine, std::default_delete<llvm::TargetMachine>>) JitRunner.cpp:0:0
#15 0x000055b422e9a1b5 mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) (/home/uday/llvm-project-upstream/build/bin/mlir-cpu-runner+0xfa91b5)
#16 0x000055b4227fe273 main (/home/uday/llvm-project-upstream/build/bin/mlir-cpu-runner+0x90d273)
#17 0x00007fd5e4829d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#18 0x00007fd5e4829e40 call_init ./csu/../csu/libc-start.c:128:20
#19 0x00007fd5e4829e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#20 0x000055b42288fa25 _start (/home/uday/llvm-project-upstream/build/bin/mlir-cpu-runner+0x99ea25)
FileCheck error: '<stdin>' is empty.
FileCheck command line: /home/uday/llvm-project-upstream/build/bin/FileCheck /home/uday/llvm-project-upstream/mlir/test/Integration/GPU/ROCM/printf.mlir
--
I bumped into the printf.mlir problem a little while ago on our buildbot; it is officially fixed in ROCm 5.6.0, which should be available soon. When I started using a 5.6.0 pre-release on the buildbot, the test stopped failing.
I'll verify that it also fixes the problem on your card (the buildbot's card is older) and find out whether pre-release builds are available.
I haven't checked on the new card yet, but I do have a workaround: run
export LIT_XFAIL='Integration/GPU/ROCM/printf.mlir'
before running check-mlir. That lists the test as an expected failure. Once you install a version of ROCm that fixes the problem, lit will report it as an "unexpected pass", which flags it for your attention.
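For reference, a minimal sketch of the workaround as a shell snippet. It assumes lit's documented behavior that LIT_XFAIL takes a semicolon-separated list of test names given relative to the test suite root; the ninja invocation in the comment assumes a Ninja-based build directory and is only illustrative.

```shell
# Treat the known-bad ROCm printf test as an expected failure (XFAIL).
# LIT_XFAIL is a semicolon-separated list of test names, written
# relative to the test suite root (here, the MLIR test directory).
export LIT_XFAIL='Integration/GPU/ROCM/printf.mlir'

# Then run the suite from the build directory as usual, e.g.:
#   ninja check-mlir
# (assumes a Ninja build; adjust for your CMake generator).
# If an installed ROCm later fixes the bug, lit reports the test as
# XPASS ("unexpectedly passed"), which lit counts as a failure.
```

Remove the variable (unset LIT_XFAIL) once a fixed ROCm is installed so the test runs normally again.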
@bondhugula, I tested on a card like yours, and it is indeed the same bug. The upcoming ROCm 5.6 will fix it; until then, the LIT_XFAIL suggestion above works around it.
@bondhugula, ROCm 5.6 is out. See https://rocm.docs.amd.com/en/latest/.
@bondhugula, ROCm 5.7 is out.
The latest upstream Git revision, 18cc07aa07f6784cc59a4b4cfe33522867805586 (Jun 8), has two ROCm tests in the check-mlir suite failing on a modern AMD Radeon GPU, the RX 7900 XTX (gfx1100), with ROCm 5.4.3. The remaining four tests in Integration/GPU/ROCM/ pass. Tagging the authors based on the ChangeLog here.
CC: @krzysz00 @jerryyin