nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0
92 stars, 46 forks

[Tracker] All issues related to the e2e SHARK test suite #812

Status: Open. pdhirajkumarprasad opened this issue 1 month ago.

pdhirajkumarprasad commented 1 month ago

The full ONNX front-end (FE) tracker is at: https://github.com/nod-ai/SHARK-Turbine/issues/564

Running models

  1. For models registered in e2e, use: python ./run.py -t ModelName
  2. For models from onnx-zoo, see issue 826
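To reproduce several tracker entries in one go, a minimal sketch that builds one `run.py` invocation per registered model, using only the `python ./run.py -t ModelName` form given above. The helpers `build_run_commands` and `run_all` and the model names are hypothetical; check `run.py --help` for the options it actually supports before relying on anything beyond `-t`.

```python
import subprocess
import sys


def build_run_commands(model_names, script="./run.py"):
    """Return one argv list per model, mirroring `python ./run.py -t ModelName`."""
    return [[sys.executable, script, "-t", name] for name in model_names]


def run_all(model_names, dry_run=True):
    """Run each model in turn and collect exit codes.

    With dry_run=True (the default) the commands are only printed, so the
    returned dict stays empty.
    """
    results = {}
    for argv in build_run_commands(model_names):
        if dry_run:
            print(" ".join(argv))
            continue
        # check=False so that one failing model does not abort the whole batch
        proc = subprocess.run(argv, check=False)
        results[argv[-1]] = proc.returncode
    return results


if __name__ == "__main__":
    # Hypothetical model names; substitute names registered in the e2e suite.
    run_all(["model_a", "model_b"])
```

Running one model per process keeps a crash in one compile from hiding results for the others, at the cost of repeated startup time.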

For onnx/models/

critical issues

onnx to torch

| # | device | issue type | issue no. | # models impacted | model list | assignee | status |
|---|---|---|---|---|---|---|---|
| 1 | CPU | failed to legalize operation onnx.LSTM | 315 | 11 | modelList, onnx-model-zoo | @renxida | |
| 2 | CPU | failed to legalize operation onnx.Expand | 813 | 47 | modelList | @vivekkhandelwal1 | |
| 3 | CPU | failed to legalize operation onnx.Resize | 814 | 31 | modelList, onnx-model-zoo | @aldesilv | |
| 4 | CPU | failed to legalize operation onnx.If | 696 | 14 | modelList, onnx-model-zoo | @renxida | |
| 5 | CPU | failed to legalize operation onnx.AveragePool | 816 | 1 | onnx-model-zoo | @vivekkhandelwal1 | |
| 6 | CPU | failed to legalize operation onnx.Conv | 817 | 52 | onnx-model-zoo | @vivekkhandelwal1 | IR is working; still need to check all models |
| 7 | CPU | failed to legalize operation onnx.Loop | 332 | 1 | onnx-model-zoo | @PhaneeshB | |
| 8 | CPU | failed to legalize operation onnx.NonMaxSuppression | 819 | 2 | onnx-model-zoo | @aldesilv | |
| 9 | CPU | failed to legalize operation onnx.NonZero | 820 | 4 | onnx-model-zoo | @renxida | |
| 10 | CPU | failed to legalize operation onnx.Pad | 821 | 4 | onnx-model-zoo | @Shukla-Gaurav | |
| 11 | CPU | failed to legalize operation onnx.Transpose | 822 | | onnx-model-zoo | @zjgarvey | |
| 12 | CPU | failed to legalize operation onnx.ScatterElements (reduction: add) | 823 | 14 | onnx-model-zoo | @AmosLewis | |
| 13 | CPU | failed to legalize unresolved materialization | 826 | 5 | onnx-model-zoo | @jinchen62 | Merged |
| 14 | CPU | boolean indexing ops: AtenNonzeroOp, AtenIndexTensorOp, AtenMaskedSelectOp | 3293 | | | @renxida | |
| 15 | CPU | Add TorchToLinalg lowering for MaxUnpool operation | 718 | | | @jinchen62 | Reviewing |
| 16 | CPU | Fix Onnx.DFT Torch->Linalg lowering | 800 | | | @PhaneeshB | |
| 17 | CPU | failed to legalize operation 'torch.aten.squeeze.dim' | 846 | | | @jinchen62 | Reviewing |
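Many rows in these tables are keyed on the op named in a "failed to legalize operation" error. When triaging a batch of failing models, a small helper that pulls that op name out of each log and counts how many models hit it can help map new failures onto existing rows. This is a sketch, not part of the test suite: the regex assumes the op name appears in single quotes, as in the 'torch.aten.squeeze.dim' and 'arith.fptoui' rows; the wording for unconverted ONNX ops may differ, so treat the pattern as an assumption.

```python
import re
from collections import Counter

# Assumed message shape: "failed to legalize operation '<op.name>' ...".
# IGNORECASE also matches the capitalized "Failed to legalize ..." variant.
_LEGALIZE_RE = re.compile(r"failed to legalize operation '([^']+)'", re.IGNORECASE)


def failed_ops(log_text):
    """Return the op names from every 'failed to legalize operation' error in a log."""
    return _LEGALIZE_RE.findall(log_text)


def bucket_logs(logs):
    """Count, per failing op, how many model logs mention it (once per log).

    `logs` maps a model name to the text of its compile log.
    """
    counts = Counter()
    for text in logs.values():
        counts.update(set(failed_ops(text)))
    return counts
```

Counting each op once per log gives a number comparable to the "# models impacted" column, rather than the raw number of error lines.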

torch to linalg

| # | device | issue type | issue no. | # models impacted | model list | assignee |
|---|---|---|---|---|---|---|
| 1 | CPU | 'linalg.conv_2d_nchw_fchw' op inferred input/output operand | 824 | 62 | modelList, onnx-model-zoo | @vivekkhandelwal1 |
| 2 | CPU | 'linalg.generic' op inferred input/output operand | 825 | 10 | modelList, onnx-model-zoo | @zjgarvey |
| 3 | CPU | Create tensor.expandshape instead of tensor.reshape when possible | 3647 | | | @zjgarvey |

iree-compile

| # | device | issue type | issue no. | # models impacted | model list | assignee | status |
|---|---|---|---|---|---|---|---|
| 1 | CPU | 'stream.async.dispatch' op has invalid Read access range | 18468 | 55 | modelList, onnx-model-zoo | | |
| 2 | CPU | failed to legalize operation 'hal.interface.constant.load' | 18487 | 2 | Natural_Language_Processing/funnelbase_Opset16_transformers, Natural_Language_Processing/funnelbase_Opset16_transformers | @jinchen62 | |
| 3 | CPU | Failed to legalize operation 'arith.fptoui' that was explicitly marked illegal | 18501 | 2 | Natural_Language_Processing/funnel_Opset16_transformers/funnel_Opset16_transformers.onnx, Natural_Language_Processing/funnel_Opset17_transformers/funnel_Opset17_transformers.onnx | | |
| 4 | GPU | error: 'vector.transfer_read' op Anchoring on transfer_read with unsupported number of elements | 18601 | 100+ | | | |
| 5 | GPU | error: Vector shape: [1, 8, 24] does not match the layout | 18602 | 100+ | | | |
| 6 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 100+ | | | |
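The shared-memory row gives two concrete numbers, so its severity can be read directly: 401920 bytes is 392.5 KiB against a 64 KiB limit, about 6.13 times over. A minimal sketch for extracting those numbers from such error messages across many logs, assuming the message keeps the exact wording quoted in row 6 (the function name is hypothetical):

```python
import re

# Assumed wording, copied from the tracker row:
# "... op uses <N> bytes of shared memory; exceeded the limit of <M> bytes"
_SMEM_RE = re.compile(
    r"uses (\d+) bytes of shared memory; exceeded the limit of (\d+) bytes"
)


def shared_memory_overrun(message):
    """Return (used_bytes, limit_bytes, used/limit), or None if the message does not match."""
    m = _SMEM_RE.search(message)
    if m is None:
        return None
    used, limit = int(m.group(1)), int(m.group(2))
    return used, limit, used / limit
```

For the row above this yields `(401920, 65536, 6.1328125)`, which helps separate kernels that are slightly over the limit from ones that need a different tiling altogether.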

numerics

| # | device | issue type | issue no. | # models impacted | model list | assignee |
|---|---|---|---|---|---|---|
| 1 | CPU | numeric | need_to_analyze | 101 | modelList | |
| 2 | | [numerics]: element at index 0 (0.332534) does not match the expected (0.308342); for LSTM ops | 18441 | 2 | | |
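Numerics failures like the LSTM row are reported as the first element outside tolerance. A minimal sketch of that kind of check, using the `numpy.isclose`-style test |a - e| <= atol + rtol * |e|. The tolerances and function names are assumptions for illustration, not the values or code the test suite uses.

```python
import math


def first_mismatch(actual, expected, rtol=1e-3, atol=1e-3):
    """Return (index, actual, expected) for the first element outside tolerance, else None.

    An element passes when |a - e| <= atol + rtol * |e|; NaN on either side
    is always treated as a mismatch.
    """
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if math.isnan(a) or math.isnan(e) or abs(a - e) > atol + rtol * abs(e):
            return i, a, e
    return None


def describe(mismatch):
    """Format a mismatch in the same shape as the tracker's error message."""
    i, a, e = mismatch
    return f"element at index {i} ({a}) does not match the expected ({e})"
```

With the values from the LSTM row, the difference is about 0.024, well above the roughly 0.0013 allowed by these tolerances, so index 0 is reported; a difference that large usually points to a lowering bug rather than accumulated rounding.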

IREE EP only issues

iree-compile fails with "ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'" on int8 models at the QuantizeLinear op.

low priority

  1. issue 828: Turbine Camp
  2. issue 797: Ops not in model

zjgarvey commented 1 month ago

Can you update the model list links?

jinchen62 commented 1 month ago

Could you also attach links to the issues you referred to, so we know whether we cover all model paths? Also, it seems this doesn't include https://github.com/nod-ai/SHARK-Turbine/issues/801, right?

pdhirajkumarprasad commented 1 month ago

@zjgarvey The model list already contains only the updated links.

@jinchen62 Yes, so far the report is based only on the ONNX models in the e2e SHARK test suite.

jinchen62 commented 1 month ago

@pdhirajkumarprasad I think it would be helpful to attach more details of the error messages.

I suspect the onnx.Transpose entry in the onnx-to-torch table is the shape inference issue I was dealing with. I fixed it by setting the opset version to 21 with a locally built torch-mlir in the SHARK test suite (https://github.com/llvm/torch-mlir/issues/3593). @zjgarvey I realized this doesn't seem to work for the CI job, right? Any ideas?