nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0

Failed to legalize operation onnx.Expand #813

Closed: pdhirajkumarprasad closed this issue 1 month ago

pdhirajkumarprasad commented 2 months ago

In a SHARK e2e model, we are seeing the following error:

Failed to legalize operation onnx.Expand.

module {
  func.func @torch_jit(%arg0: !torch.vtensor<[14],f32>, %arg1: !torch.vtensor<[?],si64>) -> !torch.vtensor<[],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
    %881 = torch.operator "onnx.Expand"(%arg0, %arg1) : (!torch.vtensor<[14],f32>, !torch.vtensor<[?],si64>) -> !torch.vtensor<[],f32> 
    return %881: !torch.vtensor<[],f32>
  }
}

command:

torch-mlir-opt -split-input-file -verify-diagnostics -convert-torch-onnx-to-torch --mlir-print-ir-after-all t1.mlir

error:

model.torch_onnx.mlir:3:12: error: failed to legalize operation 'torch.operator' that was explicitly marked illegal
    %881 = torch.operator "onnx.Expand"(%arg0, %arg1) : (!torch.vtensor<[14],f32>, !torch.vtensor<[?],si64>) -> !torch.vtensor<[],f32> 
           ^
model.torch_onnx.mlir:3:12: note: see current operation: %0 = "torch.operator"(%arg0, %arg1) <{name = "onnx.Expand"}> : (!torch.vtensor<[14],f32>, !torch.vtensor<[?],si64>) -> !torch.vtensor<[],f32>
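For context, ONNX `Expand` broadcasts its input to the shape given by its second operand using NumPy-style bidirectional broadcasting, so the output rank is max(input rank, number of elements in `shape`). The Python sketch below uses a hypothetical helper, `expand_output_shape`, which is not part of torch-mlir or onnx. It shows why the dynamic `!torch.vtensor<[?],si64>` shape operand is a problem: without that operand's values, neither the rank nor the extents of the result can be determined statically. Because the input here is rank 1, any valid result has rank of at least 1, so the rank-0 `[]` result type in the reproducer likely reflects failed shape inference rather than a genuine scalar output.

```python
from typing import List, Optional, Sequence


def expand_output_shape(
    input_shape: Sequence[int], shape_values: Optional[Sequence[int]]
) -> Optional[List[int]]:
    """Hypothetical helper: onnx.Expand output shape via bidirectional broadcasting.

    Returns None when the values of the 'shape' operand are not statically
    known (e.g. a !torch.vtensor<[?],si64> argument), since then neither the
    output rank nor its extents can be determined.
    """
    if shape_values is None:
        return None  # no static result type can be derived
    rank = max(len(input_shape), len(shape_values))
    # Right-align both shapes by padding with leading 1s.
    a = [1] * (rank - len(input_shape)) + list(input_shape)
    b = [1] * (rank - len(shape_values)) + list(shape_values)
    out: List[int] = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"incompatible dimensions {x} and {y}")
    return out


# Usage: a [14] input expanded with known shape values.
print(expand_output_shape([14], [3, 14]))
print(expand_output_shape([14], None))  # shape values unknown
```

With the shape values known, the result is fully determined; with only a `[?]` shape operand, the lowering has nothing to derive a result type from.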
kumardeepakamd commented 2 months ago

@kumardeepakamd worked on supporting this in torch-mlir (https://github.com/llvm/torch-mlir/commit/29569713f3878226a6c1054a183dc227934dbe69)

zjgarvey commented 1 month ago

This is blocked because ONNX shape inference does not seem to propagate constant data through Identity ops.
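To illustrate this blocker, here is a minimal Python sketch of constant-data propagation through `Identity` nodes. It assumes a hypothetical toy graph encoding (tuples of op type, input names, output name, optional constant value) with nodes in topological order. It is not the onnx shape-inference API. If `Identity` forwards its input's constant data, the `shape` operand of a downstream `Expand` becomes known and the output shape can be computed; if it does not, the operand stays unknown, which is consistent with the failure reported above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy node: (op_type, input_names, output_name, constant_value).
Node = Tuple[str, List[str], str, Optional[List[int]]]


def propagate_constants(nodes: List[Node]) -> Dict[str, List[int]]:
    """Forward known constant values through Constant and Identity nodes.

    Assumes nodes are listed in topological order.
    """
    known: Dict[str, List[int]] = {}
    for op, inputs, output, value in nodes:
        if op == "Constant" and value is not None:
            known[output] = list(value)
        elif op == "Identity" and inputs and inputs[0] in known:
            # Identity returns its input unchanged, so its constant data is
            # the same as its input's.
            known[output] = list(known[inputs[0]])
        # Other ops (e.g. Expand) produce data this toy pass cannot evaluate.
    return known


# Usage: the Expand shape operand reaches Expand through an Identity.
graph: List[Node] = [
    ("Constant", [], "shape_c", [3, 14]),
    ("Identity", ["shape_c"], "shape_id", None),
    ("Expand", ["x", "shape_id"], "y", None),
]
print(propagate_constants(graph).get("shape_id"))
```

Dropping the `Identity` branch leaves `shape_id` unknown, which mirrors the situation where Expand only sees a `[?]` shape operand.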

vinayakdsci commented 1 month ago

@zjgarvey Through the onnxruntime EP, we have the following models passing numerics (and thus passing the whole e2e pipeline):

All the others fail due to numerical mismatches, but none of them fails at legalization of the Expand op.

zjgarvey commented 1 month ago

I have a fix for this in the test suite and am working on getting it up today. See also #825.

pdhirajkumarprasad commented 1 month ago

Closing this issue, as we no longer see this error. If we come across any similar issues, we will open a new one.