iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

iree-run-module: Not generating output. exiting silently #18829

Open pdhirajkumarprasad opened 1 month ago

pdhirajkumarprasad commented 1 month ago

What happened?

For the given IR:

#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (0, d1, d2, d3)>
module {
  ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
  func.func @torch_jit(%arg0: tensor<1x3x224x224xf32>, %arg1: tensor<?x?x?x?xf32>, %arg2: tensor<?x?x?x?xf32>) -> tensor<1x3x224x224xf32> {
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    %c224 = arith.constant 224 : index
    %dim = tensor.dim %arg2, %c1 : tensor<?x?x?x?xf32>
    %dim_0 = tensor.dim %arg2, %c2 : tensor<?x?x?x?xf32>
    %dim_1 = tensor.dim %arg2, %c3 : tensor<?x?x?x?xf32>
    %0 = arith.cmpi eq, %dim, %c3 : index
    cf.assert %0, "mismatched size for broadcast"
    %1 = arith.cmpi eq, %dim_0, %c224 : index
    cf.assert %1, "mismatched size for broadcast"
    %2 = arith.cmpi eq, %dim_1, %c224 : index
    cf.assert %2, "mismatched size for broadcast"
    %3 = tensor.empty() : tensor<1x3x224x224xf32>
    %4 = linalg.generic {indexing_maps = [#map, #map1, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2, %arg0 : tensor<?x?x?x?xf32>, tensor<1x3x224x224xf32>) outs(%3 : tensor<1x3x224x224xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %5 = arith.mulf %in, %in_2 : f32
      linalg.yield %5 : f32
    } -> tensor<1x3x224x224xf32>
    return %4 : tensor<1x3x224x224xf32>
  }
}

iree-run-module exits silently without generating any output.

Steps to reproduce your issue

command:

iree-compile model.modified.mlir --iree-hal-target-backends=llvm-cpu -o compiled_model.vmfb
iree-run-module --module='compiled_model.vmfb' --device=local-task --function='torch_jit' --input='1x3x224x224xf32=@input.0.bin' --output=@'output.0.bin'  --input='1x3x224x224xf32=@input.0.bin' --input='1x3x224x224xf32=@input.0.bin' 

input.0.bin.txt
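One thing worth ruling out before debugging the runtime is a size mismatch between the input file and the declared shape. The sketch below assumes that `--input=SHAPExTYPE=@file.bin` reads the file as raw bytes, so a `1x3x224x224xf32` input would need 1 * 3 * 224 * 224 * 4 bytes. The helper names and the `ELEMENT_BYTES` table are hypothetical and only cover a few common element types.

```python
import math
from pathlib import Path

# Illustrative subset of element types and their sizes in bytes (assumption:
# the file holds densely packed raw elements with no header).
ELEMENT_BYTES = {"i8": 1, "f16": 2, "f32": 4, "i32": 4, "f64": 8, "i64": 8}


def expected_nbytes(spec: str) -> int:
    """Return the byte size implied by a spec such as '1x3x224x224xf32'."""
    *dims, dtype = spec.split("x")
    return math.prod(int(d) for d in dims) * ELEMENT_BYTES[dtype]


def input_file_matches(spec: str, path: str) -> bool:
    """Check whether the file at `path` has exactly the size implied by `spec`."""
    return Path(path).stat().st_size == expected_nbytes(spec)


print(expected_nbytes("1x3x224x224xf32"))  # 602112
```

Calling `input_file_matches("1x3x224x224xf32", "input.0.bin")` in the directory used for the repro would show whether the attached file has the expected 602112 bytes.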

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

ScottTodd commented 1 month ago

Can you try with a debugger attached? Sometimes the tools crash in ways that don't produce messages (running through layers of Python console scripts also doesn't help).
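Short of attaching a debugger, a small wrapper can at least make a silent failure visible by recording the exit status, anything written to stderr, and whether the output file was actually created. This is a minimal sketch: `run_and_report` and `IREE_CMD` are hypothetical names, and the command list simply mirrors the flags from the report above.

```python
import subprocess
import sys
from pathlib import Path

# Command from the report, split into arguments so no shell quoting is involved.
IREE_CMD = [
    "iree-run-module",
    "--module=compiled_model.vmfb",
    "--device=local-task",
    "--function=torch_jit",
    "--input=1x3x224x224xf32=@input.0.bin",
    "--input=1x3x224x224xf32=@input.0.bin",
    "--input=1x3x224x224xf32=@input.0.bin",
    "--output=@output.0.bin",
]


def run_and_report(cmd, expected_output):
    """Run `cmd` and report its exit status, stderr, and output-file presence."""
    out_path = Path(expected_output)
    if out_path.exists():
        out_path.unlink()  # avoid mistaking a stale file for fresh output
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "returncode": proc.returncode,
        "stderr": proc.stderr.strip(),
        "output_exists": out_path.exists(),
    }
```

Running `print(run_and_report(IREE_CMD, "output.0.bin"))` in the repro directory would show whether the tool returned a nonzero status even though it printed nothing.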

IanWood1 commented 1 month ago

Do you know which commit this was having problems with? I tried with fecccdc852a7e2dda30cbb924427634acb7eb820 and didn't have an issue.

Edit: it also worked on top of main (TOM, currently 2dffc9e43fba24e720a122d2ae85d1ce14ed8d36).

ScottTodd commented 1 month ago

> Do you know which commit this was having problems with? I tried with fecccdc and didn't have an issue

+1, please always include version information. It's there in the issue template for a reason ;)

pdhirajkumarprasad commented 1 month ago

@IanWood1 I am using

IREE (https://iree.dev): IREE compiler version 20241018.1051 @ df5e5aab044ed5b6c5860b0b291c95eafe1c2522 LLVM version 20.0.0git Optimized build

IanWood1 commented 1 month ago

@pdhirajkumarprasad I'm not sure what's going on here. I tried with https://github.com/iree-org/iree/commit/df5e5aab044ed5b6c5860b0b291c95eafe1c2522 but I was still able to generate output.0.bin.

pdhirajkumarprasad commented 4 weeks ago

@IanWood1 even with 'IREE compiler version 20241024.1057 @ 9c5b57a8b9e6981e300df02c41a296bd49e07c99' I don't see any output being generated. Are you running only the following commands?

iree-compile model.modified.mlir --iree-hal-target-backends=llvm-cpu -o compiled_model.vmfb
iree-run-module --module='compiled_model.vmfb' --device=local-task --function='torch_jit' --input='1x3x224x224xf32=@input.0.bin' --output=@'output.0.bin' --input='1x3x224x224xf32=@input.0.bin' --input='1x3x224x224xf32=@input.0.bin'

benvanik commented 4 weeks ago

I don't know what all the single quotes are for but they look wrong - "@'output.0.bin" is definitely not what you want. Try removing them all, or check for a file named 'output.0.bin

pdhirajkumarprasad commented 3 weeks ago

@benvanik even with the command below, output.0.bin is still not generated:

iree-run-module --module=compiled_model.vmfb --device=local-task --function=torch_jit --input=1x3x224x224xf32=@input.0.bin --input=1x3x224x224xf32=@input.0.bin --input=1x3x224x224xf32=@input.0.bin --output=@output.0.bin

IanWood1 commented 2 weeks ago

I just talked with @pdhirajkumarprasad, and we couldn't figure out why he gets no output while I do. Could someone else take a look at this and/or provide some insight into what could be going wrong here? We checked multiple versions (most recently bb542eee65fa0a498963df1f2ee2f205a3dd8bd0) but I was still unable to reproduce the issue. Thanks

IanWood1 commented 2 weeks ago

Apparently iree-run-module is exiting with code 245, but I still haven't been able to repro
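One possible reading of exit code 245, not confirmed anywhere in this thread: on POSIX, Python's `subprocess` reports a child killed by signal N as return code -N, and a wrapper that passes -11 on to `sys.exit` ends up with exit status 245, because only the low 8 bits are kept (256 - 11 = 245). Signal 11 is SIGSEGV on Linux. Whether the installed `iree-run-module` entry point forwards the status this way is an assumption. The sketch below (the helper name is hypothetical) lists the signal interpretations that are consistent with a given status.

```python
import signal


def candidate_signals(exit_status: int) -> dict:
    """Map an 8-bit exit status to the signals it could plausibly encode."""
    if not 0 <= exit_status <= 255:
        raise ValueError("exit status must be in the range 0..255")
    guesses = (
        # Shell convention: a process killed by signal N is reported as 128 + N.
        ("shell_128_plus_n", exit_status - 128),
        # A negative return code -N truncated to 8 bits becomes 256 - N.
        ("wrapped_negative", 256 - exit_status),
    )
    candidates = {}
    for label, number in guesses:
        try:
            candidates[label] = signal.Signals(number).name
        except ValueError:
            continue  # not a valid signal number on this platform
    return candidates


print(candidate_signals(245))
```

If this reading is right, the process is crashing with a segmentation fault rather than exiting normally, which would fit the suggestion to run it under a debugger.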

benvanik commented 2 weeks ago

Someone will need to attach a debugger and step through.