
iree.abi.output does not work as expected for complex<f32> #15652

Open · okkwon opened this issue 12 months ago

okkwon commented 12 months ago

What happened?

Input module.mlir:

func.func @add_20x20xcomplex64_20x20xcomplex64_20x20xcomplex64(%arg0: tensor<20x20xcomplex<f32>>, %arg1: tensor<20x20xcomplex<f32>>, %arg2: tensor<20x20xcomplex<f32>> {iree.abi.output = 0 : index}) -> tensor<20x20xcomplex<f32>> {
  %0 = stablehlo.add %arg0, %arg1 : tensor<20x20xcomplex<f32>>
  return %0 : tensor<20x20xcomplex<f32>>
}

compile command:

iree-compile --iree-hal-target-backends=llvm-cpu --iree-input-demote-i64-to-i32=false --iree-input-demote-f64-to-f32=false --iree-opt-demote-f64-to-f32=false module.mlir -o module.vmfb

dump: https://gist.github.com/okkwon/a072d670831ba80580481db9aece2184

The same op with a different element type, e.g. f16, works as expected (no stream.resource.alloca is generated).

dump for f16: https://gist.github.com/okkwon/389e2dda53cc4bd69c0a51b0e53c68a6
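For reference, the f16 repro is presumably the same function with the element type swapped, as in the sketch below (reconstructed from the description; the exact input module is in the gist):

func.func @add_20x20xf16_20x20xf16_20x20xf16(%arg0: tensor<20x20xf16>, %arg1: tensor<20x20xf16>, %arg2: tensor<20x20xf16> {iree.abi.output = 0 : index}) -> tensor<20x20xf16> {
  %0 = stablehlo.add %arg0, %arg1 : tensor<20x20xf16>
  return %0 : tensor<20x20xf16>
}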

One major difference is that for complex, multiple ops are generated, and they are not inlined into the top-level function that operates on !hal.buffer values. iree.abi.output is not honored during the private function lowering; for example, IPO removes the operand.
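For context, here is a rough sketch of what one would expect when iree.abi.output is honored: after the ABI transformation, the result is exported into the caller-provided storage rather than a freshly allocated buffer. hal.tensor.import/hal.tensor.export and the into(...) storage clause are real IREE constructs, but the wrapper below (including the @expected_wrapper name) is an assumption for illustration and is not taken from the dumps above:

// Hypothetical expected lowering (sketch, not from the actual dump):
func.func @expected_wrapper(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view) -> !hal.buffer_view {
  %lhs = hal.tensor.import %arg0 : !hal.buffer_view -> tensor<20x20xcomplex<f32>>
  %rhs = hal.tensor.import %arg1 : !hal.buffer_view -> tensor<20x20xcomplex<f32>>
  %sum = stablehlo.add %lhs, %rhs : tensor<20x20xcomplex<f32>>
  // The result should reuse the caller-provided storage in %arg2 (output index 0),
  // so no stream.resource.alloca should appear later in the pipeline.
  %view = hal.tensor.export %sum into(%arg2 : !hal.buffer_view) : tensor<20x20xcomplex<f32>> -> !hal.buffer_view
  return %view : !hal.buffer_view
}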

okkwon commented 12 months ago

@benvanik @MaheshRavishankar This is an extended use case of iree.abi.output, and it looks like we need more discussion.