iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

[GPU]: stack frame size (294916) exceeds limit (131056) in function 'torch_jit$async_dispatch_1_softmax_64x4x144x144xf32_dispatch_tensor_store' #19180

Open pdhirajkumarprasad opened 1 week ago

pdhirajkumarprasad commented 1 week ago

What happened?

Compiling the following IR for the ROCm backend fails:

module {
  func.func @torch_jit(%arg1: !torch.vtensor<[64,4,144,144],f32>, %arg2: !torch.vtensor<[1,4,144,144],f32>) -> !torch.vtensor<[64,4,144,144],f32>  attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
    %1 = torch.operator "onnx.Add"(%arg1, %arg2) : (!torch.vtensor<[64,4,144,144],f32>, !torch.vtensor<[1,4,144,144],f32>) -> !torch.vtensor<[64,4,144,144],f32> 
    %2 = torch.operator "onnx.Softmax"(%1) {torch.onnx.axis = -1 : si64} : (!torch.vtensor<[64,4,144,144],f32>) -> !torch.vtensor<[64,4,144,144],f32> 
    return %2 : !torch.vtensor<[64,4,144,144],f32>
  }
}
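For readers less familiar with the torch-onnx dialect, the IR above adds a [1,4,144,144] tensor to a [64,4,144,144] tensor (broadcasting over the first dimension) and then applies softmax along the last axis. A minimal reference sketch of those semantics in plain Python follows. The function names are hypothetical, nested lists stand in for tensors, and the max-subtraction form of softmax is a common numerically stable formulation chosen here as an assumption, not necessarily what IREE generates.

```python
import math

def add_broadcast_batch(x, bias):
    """x has shape [N][C][H][W]; bias has shape [1][C][H][W].

    The single bias batch entry is reused for every batch element of x,
    mirroring the broadcast in onnx.Add.
    """
    return [
        [
            [[xv + bv for xv, bv in zip(x_row, b_row)]
             for x_row, b_row in zip(x_ch, b_ch)]
            for x_ch, b_ch in zip(x_n, bias[0])
        ]
        for x_n in x
    ]

def softmax_row(row):
    """Softmax over one row, subtracting the row max for stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_last_axis(t):
    """Apply softmax along the last axis of a 4-D nested list (axis = -1)."""
    return [[[softmax_row(row) for row in ch] for ch in n] for n in t]

def torch_jit_reference(x, bias):
    """Reference semantics of the @torch_jit function: Add, then Softmax."""
    return softmax_last_axis(add_broadcast_batch(x, bias))
```

The sketch works on any small shape, so it can be used to sanity-check outputs from a backend that does compile (for example CPU) on a reduced problem size.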

The compiler reports the following error:

error: <unknown>:0:0: stack frame size (294916) exceeds limit (131056) in function 'torch_jit$async_dispatch_1_softmax_64x4x144x144xf32_dispatch_tensor_store'

The same IR works fine on CPU.
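To put the numbers in the error message in context, the short sketch below does the arithmetic on the two byte counts and the tensor shape. Only the two sizes and the shape come from the report; the variable names are made up for illustration. The last observation (that the frame size equals two f32 values per softmax row plus 4 bytes) is a numeric match only. Nothing in this thread confirms what the stack allocation actually holds, so it may be a coincidence.

```python
# Values copied from the error message and the IR in this report.
FRAME_BYTES = 294916          # reported stack frame size
LIMIT_BYTES = 131056          # reported limit
SHAPE = (64, 4, 144, 144)     # softmax input shape, reduced along the last axis
F32_BYTES = 4

# How far over the limit the frame is.
overshoot = FRAME_BYTES - LIMIT_BYTES
ratio = FRAME_BYTES / LIMIT_BYTES

# The limit is 16 bytes short of 128 KiB.
limit_gap_to_128k = 128 * 1024 - LIMIT_BYTES

# Number of independent softmax rows: every dimension except the last.
rows = SHAPE[0] * SHAPE[1] * SHAPE[2]

# Unconfirmed observation: two f32 values per row, plus 4 bytes,
# matches the reported frame size exactly.
two_f32_per_row = 2 * rows * F32_BYTES

print(f"overshoot: {overshoot} bytes ({ratio:.2f}x the limit)")
print(f"limit is 128 KiB minus {limit_gap_to_128k} bytes")
print(f"softmax rows: {rows}")
print(f"2 f32 per row: {two_f32_per_row} bytes, frame minus that: {FRAME_BYTES - two_f32_per_row}")
```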

Steps to reproduce your issue

command:

iree-compile --iree-hal-target-backends=rocm --iree-hip-target=gfx942 -o abc.vmfb model.torch_onnx.mlir

version: IREE compiler version 3.0.0rc20241117 @ 29c451b00ecc9f9e5466e9d1079e0d69147da700

detailed log:

dump.log

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

pashu123 commented 4 days ago

@qedawkins I believe the patch in https://github.com/iree-org/iree/pull/19212 solves this issue.