llvm / llvm-project

mlir-cpu-runner/async-group.mlir fails and freezes the test suite #58357

Open sylvestre opened 2 years ago

sylvestre commented 2 years ago

log: https://llvm-jenkins.debian.net/job/llvm-toolchain-binaries/architecture=i386,distribution=unstable,label=i386/680/console

Testing: 0  2  4  6  8  10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 
FAIL: MLIR :: mlir-cpu-runner/async-group.mlir (1614 of 1617)
******************** TEST 'MLIR :: mlir-cpu-runner/async-group.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';     /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-opt /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/mlir/test/mlir-cpu-runner/async-group.mlir -pass-pipeline="async-to-async-runtime,func.func(async-runtime-ref-counting,async-runtime-ref-counting-opt),convert-async-to-llvm,func.func(convert-arith-to-llvm),convert-func-to-llvm,reconcile-unrealized-casts"  | /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-cpu-runner                                                           -e main -entry-point-result=void -O0                                    -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_c_runner_utils.so       -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_runner_utils.so         -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_async_runtime.so    | /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/FileCheck /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/mlir/test/mlir-cpu-runner/async-group.mlir
--
Exit Code: 2

Command Output (stderr):
--
free(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-cpu-runner -e main -entry-point-result=void -O0 -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_c_runner_utils.so -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_runner_utils.so -shared-libs=/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/./lib/libmlir_async_runtime.so
 #0 0xf0e19e71 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:569:13
 #1 0xf0e1a0f0 PrintStackTraceSignalHandler(void*) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:635:3
 #2 0xf0e17d8c llvm::sys::RunSignalHandlers() build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Signals.cpp:104:20
 #3 0xf0e1a42d SignalHandler(int) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:0:3
 #4 0xf7f89570 (linux-gate.so.1+0x570)
 #5 0xf7f89559 (linux-gate.so.1+0x559)
 #6 0xefffcec7 (/lib/i386-linux-gnu/libc.so.6+0x85ec7)
 #7 0xeffadb41 raise (/lib/i386-linux-gnu/libc.so.6+0x36b41)
 #8 0xeff97262 abort (/lib/i386-linux-gnu/libc.so.6+0x20262)
 #9 0xeffefc6c (/lib/i386-linux-gnu/libc.so.6+0x78c6c)
#10 0xf000837d (/lib/i386-linux-gnu/libc.so.6+0x9137d)
#11 0xf0009e53 (/lib/i386-linux-gnu/libc.so.6+0x92e53)
#12 0xf000c802 cfree (/lib/i386-linux-gnu/libc.so.6+0x95802)
#13 0xf0359818 operator delete(void*) (/lib/i386-linux-gnu/libstdc++.so.6+0x88818)
#14 0xead7afda mlir::runtime::AsyncToken::~AsyncToken() build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/AsyncRuntime.cpp:173:8
#15 0xead7b04c mlir::runtime::(anonymous namespace)::RefCounted::destroy() build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/AsyncRuntime.cpp:149:41
#16 0xead792c3 mlirAsyncRuntimeDropRef build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/AsyncRuntime.cpp:237:1
#17 0xf7f7e0a8 
#18 0xf7f7e4a8 
#19 0x56662543 compileAndExecute((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**) build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/JitRunner.cpp:250:3
#20 0x5665ed72 compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::Operation*, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig) build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/JitRunner.cpp:267:10
#21 0x5665d9b0 mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) build-llvm/tools/clang/stage2-bins/mlir/lib/ExecutionEngine/JitRunner.cpp:402:23
#22 0x565c3aa0 main build-llvm/tools/clang/stage2-bins/mlir/tools/mlir-cpu-runner/mlir-cpu-runner.cpp:0:10
#23 0xeff983b5 (/lib/i386-linux-gnu/libc.so.6+0x213b5)
#24 0xeff9847f __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x2147f)
#25 0x565c3877 _start (/build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/mlir-cpu-runner+0x16877)
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/build-llvm/tools/clang/stage2-bins/bin/FileCheck /build/llvm-toolchain-snapshot-16~++20221013100845+25162418c604/mlir/test/mlir-cpu-runner/async-group.mlir

It freezes the execution of the test suite.

(I am not 100% sure that this is the test causing the test suite to fail.)

llvmbot commented 2 years ago

@llvm/issue-subscribers-mlir

nikic commented 2 years ago

I've seen this i686 failure as well. We currently don't ship MLIR tools in Fedora because of these failures (they don't occur when not building tools).

sylvestre commented 2 years ago

I disabled the testsuite on i386 to avoid this

nikic commented 1 year ago

I took a brief look at this, and in the generated MLIR (presumably produced by the convert-memref-to-llvm pass) I already see things like this:

  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @free(!llvm.ptr<i8>)
  llvm.func @aligned_alloc(i64, i64) -> !llvm.ptr<i8>

So it seems like at least this part of MLIR has a hardcoded assumption that it runs on a 64-bit architecture.
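To make the suspected mismatch concrete, here is a minimal sketch that compares the width of `size_t` (the type of the size parameter in C's `malloc` prototype) with the 64 bits used in the declarations above. The helper names `hostSizeTBits` and `i64SizeMatchesHostMalloc` are hypothetical and exist only for illustration; they are not part of MLIR.

```cpp
#include <climits>
#include <cstddef>

// Width in bits of size_t on the platform this code is compiled for.
// malloc's C prototype takes its size argument as a size_t.
constexpr unsigned hostSizeTBits() {
  return static_cast<unsigned>(sizeof(std::size_t) * CHAR_BIT);
}

// Hypothetical helper: true only when a declaration that passes the
// allocation size as a 64-bit integer has the same width as size_t.
constexpr bool i64SizeMatchesHostMalloc() { return hostSizeTBits() == 64; }
```

On a 32-bit target such as i386, `size_t` is 32 bits wide, so `i64SizeMatchesHostMalloc()` would be false there. Whether this mismatch is what triggers the `free(): invalid pointer` abort in the log is not established by this sketch.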

nikic commented 1 year ago

Okay, apparently MLIR has a concept of an "index type" that should handle this. The memref dialect does respect the index type, e.g. here: https://github.com/llvm/llvm-project/blob/de8e0a439777014d7d85007c379579e58bba2efe/mlir/lib/Conversion/MemRefToLLVM/AllocLikeConversion.cpp#L126

The async dialect hardcodes i64 for all sizes: https://github.com/llvm/llvm-project/blob/de8e0a439777014d7d85007c379579e58bba2efe/mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp#L379-L381

Something I don't get yet is how the index type is determined. It looks like even the malloc created by memref also uses i64 on i686. I'd have expected it to use i32.

nikic commented 1 year ago

It looks like the index type is part of LowerToLLVMOptions and is determined either from the data layout or from an index bitwidth override.

But how is a generic mlir-opt call that is intended for use with mlir-cpu-runner to know the right option for the target? I don't see any obvious way it could use the host index width -- and even manually passing it in seems like a big hassle, as one would have to pass an indexBitwidth option to a bunch of passes.
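The resolution order described above (an explicit override if given, otherwise the data layout) can be sketched as follows. This is a simplified model under stated assumptions, not the actual `LowerToLLVMOptions` API: the struct `IndexWidthOptions`, its fields, the function `resolveIndexBitwidth`, and the 64-bit default are all hypothetical, with the default chosen only to mirror the `i64` observed on i686.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the lowering options discussed above.
struct IndexWidthOptions {
  unsigned indexBitwidthOverride = 0;  // 0 means "no override given"
  unsigned dataLayoutPointerBits = 64; // assumed default when no layout is set
};

// Returns the override when one is set, otherwise the data layout's width.
inline unsigned resolveIndexBitwidth(const IndexWidthOptions &opts) {
  assert(opts.dataLayoutPointerBits != 0 && "data layout must give a width");
  return opts.indexBitwidthOverride != 0 ? opts.indexBitwidthOverride
                                         : opts.dataLayoutPointerBits;
}

// Pointer width of the machine this code is compiled for; one conceivable
// source for an override when the JIT runs on the host.
constexpr unsigned hostPointerBits() {
  return static_cast<unsigned>(sizeof(void *) * CHAR_BIT);
}
```

In this model, a tool that knows it targets the host could set `indexBitwidthOverride = hostPointerBits()` once, rather than having the width passed to each pass separately. Whether MLIR could thread such a value through a generic mlir-opt pipeline is exactly the open question here.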