iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

`iree-compile` crashed with `softmax` with RISCV soft_fp #15019

Open hcindyl opened 10 months ago

hcindyl commented 10 months ago

What happened?

The `iree-compile` `llvm-cpu` backend, targeting a `riscv32` CPU with `soft_fp`, failed to lower the `softmax` op:

iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false \
  -iree-input-type=stablehlo \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-debug-symbols=false \
  --iree-vm-bytecode-module-strip-source-map=true \
  --iree-vm-emit-polyglot-zip=false \
  --iree-llvmcpu-target-triple=riscv32-pc-linux-elf \
  --iree-llvmcpu-target-cpu=generic-rv32 \
  --iree-llvmcpu-target-cpu-features=+m \
  --iree-llvmcpu-target-abi=ilp32 \
  --iree-llvmcpu-link-embedded=false \
  iree/samples/models/mnist.mlir -o /dev/null

See https://gist.github.com/hcindyl/84220b274e5a2c2da809008ce44295c9 for the complete stack trace.

Both the MNIST and MobileNet classifier models are affected.

Steps to reproduce your issue

See the CLI and stack trace above.

What component(s) does this issue relate to?

Compiler

Version information

The issue started with the 20230922.653 release.

Additional context

No response

hcindyl commented 10 months ago

@MaheshRavishankar and @dcaballe could you please take a look?

hcindyl commented 10 months ago

There are 3 LLVM integration PRs between https://github.com/openxla/iree/commit/fc1ff496494b33302aec45c4a0e7fb68e51f9492 and https://github.com/openxla/iree/commit/735a77e4116cb91e2ad1d3a709ec5cf456b35c30 (the last candidate release, where everything worked fine).

MaheshRavishankar commented 10 months ago

It will be a while before I have time to pick this up. We're a bit short-staffed with folks on vacation. @dcaballe or @pzread, do you have cycles to look into it and narrow it down a bit?
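Narrowing a regression between a known-good and a known-bad commit is a binary search over the commits in between. Below is a minimal sketch that assumes a hypothetical `compiles(commit)` predicate. That predicate would build IREE at the given commit and rerun the failing `iree-compile` command. In practice `git bisect run` automates the same loop.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], compiles: Callable[[str], bool]) -> str:
    """Return the first commit for which the hypothetical `compiles` predicate is False.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and once a commit is bad every later one is too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if compiles(commits[mid]):
            lo = mid  # breakage is after mid
        else:
            hi = mid  # breakage is at or before mid
    return commits[hi]
```

With only three LLVM integration PRs in the range, this needs at most two test builds.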

hcindyl commented 9 months ago

I can reproduce it with a single-layer `softmax.mlir`:

func.func @predict(%arg0: tensor<1x10xf32>) -> tensor<1x10xf32> attributes {iree.module.export, iree.reflection = {abi = "sip", abiv = 1 : i32, sip = "I8!S5!k0_0R3!_0"}} {
  %0 = stablehlo.constant dense<0xFF800000> : tensor<f32>
  %1 = stablehlo.constant dense<0.000000e+00> : tensor<f32>
  %2 = stablehlo.reduce(%arg0 init: %0) across dimensions = [1] : (tensor<1x10xf32>, tensor<f32>) -> tensor<1xf32>
    reducer(%arg1: tensor<f32>, %arg2: tensor<f32>)  {
    %9 = stablehlo.maximum %arg1, %arg2 : tensor<f32>
    stablehlo.return %9 : tensor<f32>
  }
  %3 = stablehlo.broadcast_in_dim %2, dims = [0] : (tensor<1xf32>) -> tensor<1x10xf32>
  %4 = stablehlo.subtract %arg0, %3 : tensor<1x10xf32>
  %5 = stablehlo.exponential %4 : tensor<1x10xf32>
  %6 = stablehlo.reduce(%5 init: %1) across dimensions = [1] : (tensor<1x10xf32>, tensor<f32>) -> tensor<1xf32>
    reducer(%arg1: tensor<f32>, %arg2: tensor<f32>)  {
    %9 = stablehlo.add %arg1, %arg2 : tensor<f32>
    stablehlo.return %9 : tensor<f32>
  }
  %7 = stablehlo.broadcast_in_dim %6, dims = [0] : (tensor<1xf32>) -> tensor<1x10xf32>
  %8 = stablehlo.divide %5, %7 : tensor<1x10xf32>
  return %8 : tensor<1x10xf32>
}
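For reference, the MLIR above is the usual numerically stable softmax. It subtracts the row maximum, exponentiates, sums, and divides. The max-reduction starts from `0xFF800000`, which is the f32 bit pattern for `-inf`. Here is a minimal pure-Python sketch of the same computation for one row. It assumes a plain list of floats stands in for `tensor<1x10xf32>`. Python floats are double precision, so the result only approximates the f32 output.

```python
import math
import struct

# 0xFF800000 is the IEEE 754 single-precision bit pattern for -inf,
# used as the init value of the max-reduction (%0) in the MLIR above.
NEG_INF_F32 = struct.unpack("<f", struct.pack("<I", 0xFF800000))[0]


def softmax_row(row):
    """Mirror the stablehlo ops: reduce-max, subtract, exp, reduce-add, divide."""
    m = NEG_INF_F32
    for x in row:  # %2: stablehlo.reduce with stablehlo.maximum
        m = max(m, x)
    exps = [math.exp(x - m) for x in row]  # %4 subtract, %5 exponential
    total = 0.0
    for e in exps:  # %6: stablehlo.reduce with stablehlo.add, init 0.0
        total += e
    return [e / total for e in exps]  # %8: divide by the broadcast sum
```

The max-reduction (`%2`) is the part that later turns out to trigger the crash on its own.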

and this CLI:

iree-compile -iree-input-type=stablehlo --iree-hal-target-backends=llvm-cpu  \
  --iree-llvmcpu-target-triple=riscv32-pc-linux-elf \
  --iree-llvmcpu-target-cpu=generic-rv32 \
  --iree-llvmcpu-target-cpu-features=+m \
  --iree-llvmcpu-target-abi=ilp32 \
  softmax.mlir -o /dev/null

hcindyl commented 9 months ago

I can also reproduce the crash with a single `stablehlo.reduce` op:

func.func @predict(%arg0: tensor<1x10xf32>) -> tensor<1xf32> {
  %0 = stablehlo.constant dense<0xFF800000> : tensor<f32>
  %1 = stablehlo.reduce(%arg0 init: %0) across dimensions = [1] : (tensor<1x10xf32>, tensor<f32>) -> tensor<1xf32>
    reducer(%arg1: tensor<f32>, %arg2: tensor<f32>)  {
    %2 = stablehlo.maximum %arg1, %arg2 : tensor<f32>
    stablehlo.return %2 : tensor<f32>
  }
  return %1 : tensor<1xf32>
}

iree-compile -iree-input-type=stablehlo \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-target-triple=riscv32-pc-linux-elf \
  --iree-llvmcpu-target-cpu=generic-rv32 \
  --iree-llvmcpu-target-cpu-features=+m \
  --iree-llvmcpu-target-abi=ilp32 \
  test1.mlir -o /dev/null

LLVM ERROR: Do not know how to soften the result of this operator!
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
Stack dump:
0.  Program arguments: cache/iree_compiler/install/bin/iree-compile -iree-input-type=stablehlo --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-triple=riscv32-pc-linux-elf --iree-llvmcpu-target-cpu=generic-rv32 --iree-llvmcpu-target-cpu-features=+m --iree-llvmcpu-target-abi=ilp32 test1.mlir -o /dev/null
1.  Running pass 'Function Pass Manager' on module 'predict_dispatch_0'.
2.  Running pass 'RISC-V DAG->DAG Pattern Instruction Selection' on function '@predict_dispatch_0_generic_10_f32'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libIREECompiler.so 0x00007f0cdad66648 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 40
1  libIREECompiler.so 0x00007f0cdad6418c
2  libc.so.6          0x00007f0cd8e5a510
3  libc.so.6          0x00007f0cd8ea80fc
4  libc.so.6          0x00007f0cd8e5a472 gsignal + 18
5  libc.so.6          0x00007f0cd8e444b2 abort + 211
6  libIREECompiler.so 0x00007f0cdab37e0b
7  libIREECompiler.so 0x00007f0cdacfccd8
8  libIREECompiler.so 0x00007f0cdfc21511
9  libIREECompiler.so 0x00007f0cdfb974d2
10 libIREECompiler.so 0x00007f0cdfb97df0 llvm::SelectionDAG::LegalizeTypes() + 1568
11 libIREECompiler.so 0x00007f0cdfb1cad4 llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 228
12 libIREECompiler.so 0x00007f0cdfb1f5ba llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 4634
13 libIREECompiler.so 0x00007f0cdfb21b06
14 libIREECompiler.so 0x00007f0cdffcf2a0
15 libIREECompiler.so 0x00007f0ce17932f8 llvm::FPPassManager::runOnFunction(llvm::Function&) + 1000
16 libIREECompiler.so 0x00007f0ce179346c llvm::FPPassManager::runOnModule(llvm::Module&) + 44
17 libIREECompiler.so 0x00007f0ce1793d9e llvm::legacy::PassManagerImpl::run(llvm::Module&) + 846
18 libIREECompiler.so 0x00007f0cdbe5fc07
19 libIREECompiler.so 0x00007f0cdbe58aff
20 libIREECompiler.so 0x00007f0cdc0ba141
21 libIREECompiler.so 0x00007f0cdaf6e121 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 1873
22 libIREECompiler.so 0x00007f0cdaf6e761 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 289
23 libIREECompiler.so 0x00007f0cdaf6f923
24 libIREECompiler.so 0x00007f0cdc0bbc4a
25 libIREECompiler.so 0x00007f0cdaf6e121 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 1873
26 libIREECompiler.so 0x00007f0cdaf6e761 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 289
27 libIREECompiler.so 0x00007f0cdaf6d2c7 mlir::detail::OpToOpPassAdaptor::runOnOperationAsyncImpl(bool) + 4007
28 libIREECompiler.so 0x00007f0cdaf6dd33 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 867
29 libIREECompiler.so 0x00007f0cdaf6e761 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 289
30 libIREECompiler.so 0x00007f0cdaf6f189 mlir::PassManager::run(mlir::Operation*) + 1577
31 libIREECompiler.so 0x00007f0cdaccc6c1
32 libIREECompiler.so 0x00007f0cdaf1b1ea
33 libIREECompiler.so 0x00007f0cdaf1ea0b
34 libc.so.6          0x00007f0cd8e456ca
35 libc.so.6          0x00007f0cd8e45785 __libc_start_main + 133
36 iree-compile       0x000000000040108e
Aborted

dcaballe commented 9 months ago

I'll report the issue upstream, but could you try adding `--iree-llvmcpu-use-fast-min-max-ops` in the meantime? :)
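The failing dispatch is the `stablehlo.maximum` reduction. The error comes from LLVM's soft-float type legalization. With only `+m` and the `ilp32` ABI there is no hardware FP, so every f32 operation has to be "softened" into integer code or a library call. One plausible reading of the workaround follows, stated as an assumption and not as a description of IREE's code. By default the max may be lowered with NaN-propagating IEEE 754-2019 `maximum` semantics, which the RISC-V softening could not handle at the time. The "fast" path would then use `maxNum`-style semantics, which ignore a single NaN operand. The sketch below only illustrates that semantic difference, and `ieee_maximum` / `ieee_maxnum` are hypothetical names.

```python
import math


def ieee_maximum(a: float, b: float) -> float:
    """NaN-propagating maximum (IEEE 754-2019 `maximum`), ordering -0.0 below +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # Both operands are zeros: pick +0.0 unless both are -0.0.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def ieee_maxnum(a: float, b: float) -> float:
    """maxNum-style maximum: if exactly one operand is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b
```

If this reading is right, the trade-off of the flag is that a NaN in the input row may no longer propagate to the reduced maximum.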

dcaballe commented 9 months ago

https://github.com/llvm/llvm-project/issues/70061

hcindyl commented 9 months ago

Thanks, `--iree-llvmcpu-use-fast-min-max-ops` works when building a static library (`--iree-llvmcpu-link-embedded=false`), where the soft_fp symbols are resolved when the executable is linked.
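Without the F extension, each f32 value lives in an integer register, and FP operations are expanded into integer code or calls into a soft-float runtime. That is why the soft_fp symbols only get resolved when the executable is linked. The sketch below is purely illustrative and is not compiler-rt's or libgcc's implementation; `soft_fmaximum` and the other helpers are hypothetical names. It computes a NaN-propagating f32 maximum using only integer operations on the raw 32-bit patterns.

```python
import struct

EXP_MASK = 0x7F800000
MANT_MASK = 0x007FFFFF
ABS_MASK = 0x7FFFFFFF
SIGN_BIT = 0x80000000
QUIET_BIT = 0x00400000


def f32_to_bits(x: float) -> int:
    """Reinterpret a float (rounded to f32) as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as an f32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def is_nan_bits(b: int) -> bool:
    """All exponent bits set and a non-zero mantissa means NaN."""
    return (b & EXP_MASK) == EXP_MASK and (b & MANT_MASK) != 0


def order_key(b: int) -> int:
    """Map non-NaN f32 bits to an integer that sorts like the float value.

    Negative values are flipped so larger magnitudes sort lower, and the
    extra -1 places -0.0 just below +0.0.
    """
    if b & SIGN_BIT:
        return -(b & ABS_MASK) - 1
    return b


def soft_fmaximum(a: int, b: int) -> int:
    """NaN-propagating maximum of two f32 values given as raw bit patterns."""
    if is_nan_bits(a):
        return a | QUIET_BIT  # propagate the NaN, quieted
    if is_nan_bits(b):
        return b | QUIET_BIT
    return a if order_key(a) >= order_key(b) else b
```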