llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

MLIR tests are crashing on 32-bit arm linux #46459

Closed. rovka closed this 1 year ago.

rovka commented 4 years ago
Bugzilla Link 47115
Version unspecified
OS Linux
CC @zmodem,@jpienaar,@River707,@ftynse

Extended Description

I'm seeing about 250 MLIR test failures for 11.0.0 rc1. I haven't confirmed it, but it's possible that the board simply runs out of memory (it has 1 GB allocated per core). All failures look something like this:

**** TEST 'MLIR :: Analysis/test-dominance.mlir' FAILED ****
Script:

: 'RUN: at line 1'; /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/Phase3/Release/llvmCore-11.0.0-rc1.obj/bin/mlir-opt /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/llvm-project/mlir/test/Analysis/test-dominance.mlir -test-print-dominance -split-input-file 2>&1 | /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/Phase3/Release/llvmCore-11.0.0-rc1.obj/bin/FileCheck /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/llvm-project/mlir/test/Analysis/test-dominance.mlir

Exit Code: 1

Command Output (stderr):

/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/llvm-project/mlir/test/Analysis/test-dominance.mlir:3:17: error: CHECK-LABEL: expected string not found in input // CHECK-LABEL: Testing : func_condBranch ^

:1:1: note: scanning from here PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. ^ :1:50: note: possible intended match here PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. ^ Input file: Check file: /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/llvm-project/mlir/test/Analysis/test-dominance.mlir -dump-input=help explains the following input dump. Input was: <<<<<< 1: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace. label:3'0 X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found label:3'1 ? possible intended match 2: Stack dump: label:3'0 ~~~~~~~~~~~ 3: 0. Program arguments: /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/Phase3/Release/llvmCore-11.0.0-rc1.obj/bin/mlir-opt /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-tk1_32-build/rc1/llvm-project/mlir/test/Analysis/test-dominance.mlir -test-print-dominance -split-input-file label:3'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>>
rovka commented 4 years ago

Right, sorry, I thought I removed it from blockers. I've disabled mlir for rc2 since nobody seems interested in supporting it on armv7 for now.

zmodem commented 4 years ago

> I think the lack of 32 bits support is known, River can you confirm? Would be nice to understand what it would take here.

Maybe we shouldn't block the release on this, then?

ftynse commented 4 years ago

After the assertions are resolved, I suspect some tests also hardcoded the equivalent of intptr_t as i64 in FileCheck annotations, which will need to be fixed.

> This one should be fixable by specifying alignas(8) on this class

I remember looking for alignas and failing to find it...

River707 commented 4 years ago

Most of the 64-bit specific stuff should be fixable by specifying the expected alignment of the pointer classes. This one should be fixable by specifying alignas(8) on this class https://github.com/llvm/llvm-project/blob/96855125e77044b1a5d3c7f0ae90ea3a5cb035c0/mlir/include/mlir/Support/StorageUniquer.h#L98. I'm pretty sure I had it marked as such at one point, but it likely got lost over the months.

joker-eph commented 4 years ago

I think the lack of 32 bits support is known, River can you confirm? Would be nice to understand what it would take here.

rovka commented 4 years ago

I have tried on AArch32 and there are only 15 test failures in check-mlir:

Failed Tests (15):
  MLIR :: Conversion/GPUToNVVM/gpu-to-nvvm.mlir
  MLIR :: Conversion/SPIRVToLLVM/control-flow-ops-to-llvm.mlir
  MLIR :: Conversion/StandardToLLVM/calling-convention.mlir
  MLIR :: Conversion/StandardToLLVM/convert-dynamic-memref-ops.mlir
  MLIR :: Conversion/StandardToLLVM/convert-funcs.mlir
  MLIR :: Conversion/StandardToLLVM/convert-static-memref-ops.mlir
  MLIR :: Conversion/StandardToLLVM/convert-to-llvmir.mlir
  MLIR :: Conversion/VectorToLLVM/vector-to-llvm.mlir
  MLIR :: Dialect/LLVMIR/func.mlir
  MLIR :: Dialect/LLVMIR/roundtrip.mlir
  MLIR :: Dialect/Linalg/llvm.mlir
  MLIR :: Target/llvmir-types.mlir
  MLIR :: Target/llvmir.mlir
  MLIR :: mlir-cpu-runner/linalg_integration_test.mlir
  MLIR :: mlir-cpu-runner/unranked_memref.mlir

mlir-opt: /home/diana.picus/mlir/llvm-project/llvm/include/llvm/ADT/PointerIntPair.h:179: static intptr_t llvm::PointerIntPairInfo<PointerT, IntBits, PtrTraits>::updatePointer(intptr_t, PointerT) [with PointerT = mlir::LLVM::LLVMType; unsigned int IntBits = 1; PtrTraits = llvm::PointerLikeTypeTraits; intptr_t = int]: Assertion `(PtrWord & ~PointerBitMask) == 0 && "Pointer is not sufficiently aligned"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/diana.picus/mlir/build/bin/mlir-opt /home/diana.picus/mlir/llvm-project/mlir/test/mlir-cpu-runner/linalg_integration_test.mlir -convert-linalg-to-std -convert-linalg-to-llvm
Error: entry point not found
FileCheck error: '' is empty.
FileCheck command line: /home/diana.picus/mlir/build/bin/FileCheck /home/diana.picus/mlir/llvm-project/mlir/test/mlir-cpu-runner/linalg_integration_test.mlir
rovka commented 4 years ago

Hi Jacques!

I haven't tried running sequentially yet. I have just run our usual release job, which calls:

./test-release.sh -release 11.0.0 -rc 1 -triple armv7a-linux-gnueabihf -j3 -no-openmp -use-ninja -configure-flags -DLLVM_PARALLEL_LINK_JOBS=2

This takes more than 20 hours to run, so it's not trivial to fiddle with it. I'll try to reproduce without the release script, and maybe on a faster AArch32 machine, and report back. IIRC none of our armv7 bots have MLIR enabled, so this didn't show up before.

jpienaar commented 4 years ago

Could you include the configure and test instructions? For execution, I'm assuming you've tried running these sequentially (in case this is OOM related). Would there be any way to repro without the board?

rovka commented 4 years ago

assigned to @River707

arsenm commented 1 year ago

Old build issue