pdhirajkumarprasad opened 1 month ago
I was able to solve this error by removing the input sizes and only using the input file, i.e. using `--input='@input.0.bin'` instead of `--input='1x128xi64=@input.0.bin'`. It seems like the GPU path doesn't support specifying the input sizes and an input file at the same time.
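For concreteness, the two invocations being compared look roughly like the sketch below; the module path, entry function name, and device flag are placeholders, not taken from this issue:

```shell
# Failing form from the report: explicit shape/type plus a raw .bin file.
# (--device name varies by IREE version; hip is assumed here.)
iree-benchmark-module --module=model.vmfb --function=main \
  --device=hip \
  --input='1x128xi64=@input.0.bin'

# Reported workaround: pass only the file reference.
iree-benchmark-module --module=model.vmfb --function=main \
  --device=hip \
  --input='@input.0.bin'
```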
Is the output close to the CPU output, at least in shape? I wonder if, without the input shape, it's not actually taking the input we expect (which may be dynamically shaped) and is just producing garbage. Also, input sizes are normally required for .bin files.
OOC, why are the repro steps using `iree-opt`?
```shell
iree-opt -pass-pipeline='builtin.module(func.func(convert-torch-onnx-to-torch))' model.torch_onnx.mlir -o model.torch.mlir
iree-opt -pass-pipeline='builtin.module(torch-lower-to-backend-contract,func.func(torch-scalarize-shapes),torch-shape-refinement-pipeline,torch-backend-to-linalg-on-tensors-backend-pipeline)' model.torch.mlir -o model.modified.mlir
```
That sort of manual pipeline specification is unsupported. For any user workflows, use `iree-compile` and let it handle which pipelines to run.
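For reference, a single `iree-compile` invocation covering the same lowering might look like the sketch below; the ROCm/MI300 flags are assumptions for illustration, not taken from this issue:

```shell
# One-shot compile; exact flag names can vary between IREE releases,
# and gfx942 is an assumed target architecture for MI300.
iree-compile model.torch_onnx.mlir \
  --iree-hal-target-backends=rocm \
  --iree-hip-target=gfx942 \
  -o model.vmfb
```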
> Is the output close to the CPU output, at least in shape? I wonder if, without the input shape, it's not actually taking the input we expect (which may be dynamically shaped) and is just producing garbage.
Ah I see, not sure. I was encountering this error while benchmarking Llama on GPU last night, which does have some dynamically shaped inputs. But after removing the input shapes from the `iree-benchmark-module` invocation and only using numpy files as the inputs, I was able to run/benchmark without this error.
> But after removing the input shapes from the `iree-benchmark-module` invocation and only using numpy files as the inputs, I was able to run/benchmark without this error.
Numpy files contain shape information (metadata + buffer contents). Binary files do not (just buffer contents). If using numpy, you can (should?) omit the `1x128xi64`. If using binary, you need it; otherwise the runtime doesn't know how to interpret the buffer.
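A minimal sketch of the difference, assuming a 1x128 i64 input and placeholder module/function names:

```shell
# Write the same tensor both ways (the values here are placeholders).
python -c "
import numpy as np
arr = np.zeros((1, 128), dtype=np.int64)
np.save('input.0.npy', arr)   # .npy: header records shape and dtype
arr.tofile('input.0.bin')     # .bin: raw bytes only, no metadata
"

# The .npy file carries its own shape, so no size prefix is needed:
iree-run-module --module=model.vmfb --function=main --input=@input.0.npy

# The raw .bin file needs the explicit type/shape annotation:
iree-run-module --module=model.vmfb --function=main --input='1x128xi64=@input.0.bin'
```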
A few things here:
```mlir
%294 = torch.operator "onnx.Mul"(%290, %293) : (!torch.vtensor<[?,2,64,?],f32>, !torch.vtensor<[1],f32>) -> !torch.vtensor<[?,2,64,?],f32>
%295 = torch.operator "onnx.MatMul"(%292, %294) : (!torch.vtensor<[?,2,?,64],f32>, !torch.vtensor<[?,2,64,?],f32>) -> !torch.vtensor<[?,2,?,?],f32>
%296 = torch.operator "onnx.Add"(%295, %100) : (!torch.vtensor<[?,2,?,?],f32>, !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,2,?,?],f32>
%297 = torch.operator "onnx.Softmax"(%296) {torch.onnx.axis = -1 : si64} : (!torch.vtensor<[?,2,?,?],f32>) -> !torch.vtensor<[?,2,?,?],f32>
return %296 : !torch.vtensor<[?,2,?,?],f32>
```
Attachments: input.0.bin.txt, input.1.bin.txt, input.2.bin.txt
### What happened?

For the attached IR, we are seeing an error on GPU (MI300), while the same model passes on CPU with functional correctness. Due to the weights, the file size is over 25 MB, so it is uploaded as a zip.
### Steps to reproduce your issue

1. Download the zip file and unzip it with `unzip model.torch_onnx.mlir.zip`
2. Command to reproduce the issue on MI300:

This is impacting 600+ models, so please treat this as high priority.
model.torch_onnx.mlir.zip
### What component(s) does this issue relate to?
Runtime
### Version information
No response
### Additional context
No response