Closed: renxida closed this issue 3 weeks ago.
I'm not sure if we want to support this, but if we do, I think we need to insert a type conversion from vtensor<[1],si64> to vtensor<[?],si64>. Not sure how to materialize that.
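For reference, the torch dialect has an op intended for this kind of static-info generalization, torch.tensor_static_info_cast. A minimal sketch (the %static value name is a placeholder, not from the model) of casting a statically-shaped branch result before yielding, so both branches of a torch.prim.If produce the same type:

```mlir
// Sketch: generalize the statically-shaped branch result so both branches
// of the torch.prim.If yield !torch.vtensor<[?],si64> (%static is hypothetical).
%cast = torch.tensor_static_info_cast %static : !torch.vtensor<[1],si64> to !torch.vtensor<[?],si64>
torch.prim.If.yield %cast : !torch.vtensor<[?],si64>
```

Note this only helps when the branch results differ in static dims but agree in rank; it cannot reconcile branches returning different ranks.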
KeypointRCNN_vaiq_int8 has If statements whose branches return not just different shapes but different ranks. We can't support that, and something will need to be done with the model.
Stella mentions a way to deal with similar problems
https://discord.com/channels/973663919757492264/1173330951791706113/1246504997269798945
and @zjgarvey's comment in this morning's meeting got me to put 2 and 2 together. Will try to just remove the small branch of these onnx.If ops.
I vaguely remember the consensus is that this is a model issue and should be fixed by editing the model. Perhaps ask the sharktank folks for help?
Reproducer extracted from the coat_mini model. The onnx.If op is one of the last two ops in the model.
module {
func.func @torch_jit(%6768: !torch.vtensor<[1],si64>, %6765: !torch.vtensor<[1,1,216],f32>) -> !torch.vtensor<[],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
%6769 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__2134> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6770 = torch.operator "onnx.Equal"(%6768, %6769) : (!torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1],i1>
%6771 = torch.operator "onnx.If"(%6770) : (!torch.vtensor<[1],i1>) -> !torch.vtensor<[],f32> {
%6773 = torch.operator "onnx.Identity"(%6765) : (!torch.vtensor<[1,1,216],f32>) -> !torch.vtensor<[1,1,216],f32>
torch.operator_terminator %6773 : !torch.vtensor<[1,1,216],f32>
}, {
%6773 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__2135> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6774 = torch.operator "onnx.Squeeze"(%6765, %6773) : (!torch.vtensor<[1,1,216],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,216],f32>
torch.operator_terminator %6774 : !torch.vtensor<[1,216],f32>
}
return %6771: !torch.vtensor<[],f32>
}
}
{-#
dialect_resources: {
builtin: {
__2134: "0x080000000100000000000000"
}
}
#-}
`torch-mlir-opt --convert-torch-onnx-to-torch if.onnx.mlir --debug`
if.onnx.mlir:5:13: error: 'torch.prim.If' op along control flow edge from Region #0 to parent results: source type #0 '!torch.vtensor<[1,216],f32>' should match input type #0 '!torch.vtensor<[],f32>'
%6771 = torch.operator "onnx.If"(%6770) : (!torch.vtensor<[1],i1>) -> !torch.vtensor<[],f32> {
^
if.onnx.mlir:5:13: note: see current operation:
%4 = "torch.prim.If"(%3) ({
%7 = "torch.vtensor.literal"() <{value = dense_resource<__onnx_constant_not_found_possibly_due_to_being_elided__> : tensor<1xsi64>}> : () -> !torch.vtensor<[1],si64>
%8 = "torch.constant.int"() <{value = 1 : i64}> : () -> !torch.int
%9 = "torch.prim.ListConstruct"(%8) : (!torch.int) -> !torch.list<int>
%10 = "torch.prims.squeeze"(%arg1, %9) : (!torch.vtensor<[1,1,216],f32>, !torch.list<int>) -> !torch.vtensor<[1,216],f32>
"torch.prim.If.yield"(%10) : (!torch.vtensor<[1,216],f32>) -> ()
}, {
%5 = "torch.constant.none"() : () -> !torch.none
%6 = "torch.aten.clone"(%arg1, %5) : (!torch.vtensor<[1,1,216],f32>, !torch.none) -> !torch.vtensor<[1,1,216],f32>
"torch.prim.If.yield"(%6) : (!torch.vtensor<[1,1,216],f32>) -> ()
}) : (!torch.bool) -> !torch.vtensor<[],f32>
I vaguely remember the consensus is that this is a model issue and should be fixed by editing the model. Perhaps ask the sharktank folks for help?
We have 6 models (https://github.com/pdhirajkumarprasad/SHARK-TestSuite/blob/feature/qa/issue/onnx-to-torch/onnx.if) that failed. Not sure if it is feasible to edit all the models; will ask tomorrow.
Yeah, we could try adding the models to the basic opt list in the test suite azure_models.py
After adding them to the basic_opt list, 2 models, "coat_mini" and "coat_tiny", pass compilation and then get FAILED (Numerics).
The others hit this kind of bug:
python ./run.py --mode=cl-onnx-iree -v -t model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla --torchtolinalg
Traceback (most recent call last):
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/./run.py", line 481, in <module>
main(parser.parse_args())
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/./run.py", line 114, in main
test_list = get_tests(args.groups, args.test_filter, args.testsfile)
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/./run.py", line 70, in get_tests
from onnx_tests.models import model
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/onnx_tests/models/model.py", line 12, in <module>
from .nlp import *
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/onnx_tests/models/nlp.py", line 107, in <module>
register_test(dim_param_constructor(default_nlp_params), model_name)
File "/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/e2e_testing/registry.py", line 17, in register_test
raise ValueError(
ValueError: Duplicate test name: 'model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla'. Please make sure that the function wrapped by `register_test` has a unique name.
Ah, I see. For nlp models, we need to register those differently.
I tried doing basic optimizations for those nlp models locally, but the problematic if statement remains, so the optimizer is likely unable to fold it when dynamic dims are present.
Add a PR to do those two static dim models as basic_opt tests for now.
4 related to model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla, similar failure pattern:
!torch.vtensor<[?,?],f32> / !torch.vtensor<[?,?,?],f32> -> !torch.vtensor<[],f32>
%1900 = torch.operator "onnx.If"(%1899) : (!torch.vtensor<[1],i1>) -> !torch.vtensor<[],f32> {
%1946 = torch.operator "onnx.Identity"(%1896) : (!torch.vtensor<[?,?,?],f32>) -> !torch.vtensor<[?,?,?],f32>
torch.operator_terminator %1946 : !torch.vtensor<[?,?,?],f32>
}, {
%1946 = torch.operator "onnx.Constant"() {torch.onnx.value = dense<1> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%1947 = torch.operator "onnx.Squeeze"(%1896, %1946) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[?,?],f32>
torch.operator_terminator %1947 : !torch.vtensor<[?,?],f32>
}
->
/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run/model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla/model.torch_onnx.mlir:1904:13: error: 'torch.prim.If' op along control flow edge from Region #0 to parent results: source type #0 '!torch.vtensor<[?,?],f32>' should match input type #0 '!torch.vtensor<[],f32>'
%1900 = torch.operator "onnx.If"(%1899) : (!torch.vtensor<[1],i1>) -> !torch.vtensor<[],f32> {
^
/proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run/model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla/model.torch_onnx.mlir:1904:13: note: see current operation:
%7620 = "torch.prim.If"(%7619) ({
%7737 = "torch.vtensor.literal"() <{value = dense<1> : tensor<1xsi64>}> : () -> !torch.vtensor<[1],si64>
%7738 = "torch.constant.int"() <{value = 0 : i64}> : () -> !torch.int
%7739 = "torch.constant.int"() <{value = 0 : i64}> : () -> !torch.int
%7740 = "torch.aten.select.int"(%7737, %7738, %7739) : (!torch.vtensor<[1],si64>, !torch.int, !torch.int) -> !torch.vtensor<[1],si64>
%7741 = "torch.aten.item"(%7740) : (!torch.vtensor<[1],si64>) -> !torch.int
%7742 = "torch.prim.ListConstruct"(%7741) : (!torch.int) -> !torch.list<int>
%7743 = "torch.prims.squeeze"(%7604, %7742) : (!torch.vtensor<[?,?,?],f32>, !torch.list<int>) -> !torch.vtensor<[?,?],f32>
"torch.prim.If.yield"(%7743) : (!torch.vtensor<[?,?],f32>) -> ()
}, {
%7735 = "torch.constant.none"() : () -> !torch.none
%7736 = "torch.aten.clone"(%7604, %7735) : (!torch.vtensor<[?,?,?],f32>, !torch.none) -> !torch.vtensor<[?,?,?],f32>
"torch.prim.If.yield"(%7736) : (!torch.vtensor<[?,?,?],f32>) -> ()
}) : (!torch.bool) -> !torch.vtensor<[],f32>
@jinchen62 you can take these 2. retinanet_resnet50_fpn_vaiq_int8 and KeypointRCNN_vaiq_int8 show the same failure pattern: one branch dynamic, one branch static, returning dynamic [?]/[0] -> [?]
%6272 = torch.operator "onnx.If"(%6271) : (!torch.vtensor<[],i1>) -> !torch.vtensor<[?],si64> {
%6346 = torch.operator "onnx.ReduceMax"(%6264) {torch.onnx.keepdims = 0 : si64} : (!torch.vtensor<[?,?],f32>) -> !torch.vtensor<[],f32>
%6347 = torch.operator "onnx.Cast"(%6266) {torch.onnx.to = 1 : si64} : (!torch.vtensor<[?],si64>) -> !torch.vtensor<[?],f32>
%6348 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3571> : tensor<f32>} : () -> !torch.vtensor<[],f32>
%6349 = torch.operator "onnx.Add"(%6346, %6348) : (!torch.vtensor<[],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[],f32>
%6350 = torch.operator "onnx.Mul"(%6347, %6349) : (!torch.vtensor<[?],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[?],f32>
%6351 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3572> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6352 = torch.operator "onnx.Unsqueeze"(%6350, %6351) : (!torch.vtensor<[?],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[?,1],f32>
%6353 = torch.operator "onnx.Add"(%6264, %6352) : (!torch.vtensor<[?,?],f32>, !torch.vtensor<[?,1],f32>) -> !torch.vtensor<[?,?],f32>
%6354 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3573> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6355 = torch.operator "onnx.Unsqueeze"(%6353, %6354) : (!torch.vtensor<[?,?],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,?,?],f32>
%6356 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3574> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6357 = torch.operator "onnx.Unsqueeze"(%6265, %6356) : (!torch.vtensor<[?],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,?],f32>
%6358 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3575> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6359 = torch.operator "onnx.Unsqueeze"(%6357, %6358) : (!torch.vtensor<[1,?],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,1,?],f32>
%6360 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3576> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6361 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3577> : tensor<1xf32>} : () -> !torch.vtensor<[1],f32>
%6362 = torch.operator "onnx.NonMaxSuppression"(%6355, %6359, %6360, %6361) : (!torch.vtensor<[1,?,?],f32>, !torch.vtensor<[1,1,?],f32>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],f32>) -> !torch.vtensor<[?,3],si64>
%6363 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3578> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6364 = torch.operator "onnx.Gather"(%6362, %6363) {torch.onnx.axis = 1 : si64} : (!torch.vtensor<[?,3],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[?,1],si64>
%6365 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3579> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64>
%6366 = torch.operator "onnx.Squeeze"(%6364, %6365) : (!torch.vtensor<[?,1],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[?],si64>
torch.operator_terminator %6366 : !torch.vtensor<[?],si64>
}, {
%6346 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__3580> : tensor<0xsi64>} : () -> !torch.vtensor<[0],si64>
torch.operator_terminator %6346 : !torch.vtensor<[0],si64>
}
4 related to model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla, similar failure pattern:
!torch.vtensor<[?,?],f32> / !torch.vtensor<[?,?,?],f32> -> !torch.vtensor<[],f32>
With onnx-modifier.exe, this can be fixed by removing the If node and passing the output of the Squeeze node (from the If's then branch) directly to the next Add node as input. Here is the test output:
python ./run.py --mode=cl-onnx-iree -v -t model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla --torchtolinalg
Stages to be run: ['setup', 'import_model', 'preprocessing', 'compilation', 'construct_inputs', 'native_inference', 'compiled_inference', 'postprocessing']
Test list: ['model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla']
running test model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla...
Unzipping - /proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/tmp/model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla/model.onnx.zip...
Unzipping succeded. Look for extracted contents in /proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run/model--splinter-large-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla
2024-10-29 17:46:40.456511164 [W:onnxruntime:, graph.cc:4093 CleanUnusedInitializersAndNodeArgs] Removing initializer '/Constant_4_output_0'. It is not used by any node and should be removed from the model.
2024-10-29 17:46:40.456554925 [W:onnxruntime:, graph.cc:4093 CleanUnusedInitializersAndNodeArgs] Removing initializer '/Constant_2_output_0'. It is not used by any node and should be removed from the model.
2024-10-29 17:46:40.456562795 [W:onnxruntime:, graph.cc:4093 CleanUnusedInitializersAndNodeArgs] Removing initializer '/Constant_3_output_0'. It is not used by any node and should be removed from the model.
2024-10-29 17:46:40.456585936 [W:onnxruntime:, graph.cc:4093 CleanUnusedInitializersAndNodeArgs] Removing initializer '/Constant_1_output_0'. It is not used by any node and should be removed from the model.
FAILED (Numerics)
Test Summary:
PASSES: 0
TOTAL: 1
results stored in /proj/gdba/shark/chi/src/SHARK-TestSuite/alt_e2eshark/test-run
I was working on #566.
Found the problem in KeypointRCNN (gist with stripped IR) at %5503 (the above link takes you to the correct line).
When lowering an If op with two branches returning different types, we encounter the type-mismatch error. Reproducer: