nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0

'linalg.generic' op inferred input/output operand #1 has shape's dimension #0 to be 12288, but found 512 #825

Open pdhirajkumarprasad opened 1 month ago

pdhirajkumarprasad commented 1 month ago

For the given IR, I am getting the following error:

model.torch_onnx.mlir:21:13: error: 'linalg.generic' op inferred input/output operand #1 has shape's dimension #0 to be 12288, but found 512
    %1280 = torch.operator "onnx.Mul"(%1279, %arg6) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[512],f32>) -> !torch.vtensor<[?,?,512],f32> 
            ^
model.torch_onnx.mlir:21:13: note: see current operation: 
%136 = "linalg.generic"(%134, %10, %135) <{indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 2, 1>}> ({
^bb0(%arg10: f32, %arg11: f32, %arg12: f32):
  %144 = "arith.mulf"(%arg10, %arg11) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  "linalg.yield"(%144) : (f32) -> ()
}) : (tensor<1x24x12288xf32>, tensor<512xf32>, tensor<1x24x12288xf32>) -> tensor<1x24x12288xf32>
module {
  func.func @torch_jit(%arg0: !torch.vtensor<[1,3,384,384],f32>, %arg1: !torch.vtensor<[?,?,1],f32>, %arg2: !torch.vtensor<[?,?,?,?,?,?],f32>, %arg3: !torch.vtensor<[4],si64>, %arg4: !torch.vtensor<[?,?,512],f32>, %arg5: !torch.vtensor<[4],si64>, %arg6: !torch.vtensor<[512],f32>) -> !torch.vtensor<[?,?,512],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
    %267 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_onnx__Reshape_6627> : tensor<4xsi64>} : () -> !torch.vtensor<[4],si64> 
    %572 = torch.operator "onnx.Identity"(%267) : (!torch.vtensor<[4],si64>) -> !torch.vtensor<[4],si64> 
    %1222 = torch.operator "onnx.Reshape"(%arg4, %572) : (!torch.vtensor<[?,?,512],f32>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[?,?,?,?],f32> 
    %1266 = torch.operator "onnx.Reshape"(%arg2, %arg3) : (!torch.vtensor<[?,?,?,?,?,?],f32>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[?,?,?,?],f32> 
    %1267 = torch.operator "onnx.Add"(%1222, %1266) : (!torch.vtensor<[?,?,?,?],f32>, !torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> 
    %264 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_onnx__Reshape_6620> : tensor<3xsi64>} : () -> !torch.vtensor<[3],si64> 
    %563 = torch.operator "onnx.Identity"(%264) : (!torch.vtensor<[3],si64>) -> !torch.vtensor<[3],si64> 
    %1268 = torch.operator "onnx.Reshape"(%1267, %563) : (!torch.vtensor<[?,?,?,?],f32>, !torch.vtensor<[3],si64>) -> !torch.vtensor<[?,?,?],f32> 

    %1271 = torch.operator "onnx.Sub"(%1268, %arg1) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[?,?,1],f32>) -> !torch.vtensor<[?,?,?],f32> 
    %1272 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__165> : tensor<f32>} : () -> !torch.vtensor<[],f32> 
    %1273 = torch.operator "onnx.Pow"(%1271, %1272) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[?,?,?],f32> 
    %1274 = torch.operator "onnx.Constant"() {torch.onnx.value = dense<-1> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %1275 = torch.operator "onnx.ReduceMean"(%1273, %1274) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[?,?,1],f32> 
    %1276 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__166> : tensor<f32>} : () -> !torch.vtensor<[],f32> 
    %1277 = torch.operator "onnx.Add"(%1275, %1276) : (!torch.vtensor<[?,?,1],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[?,?,1],f32> 
    %1278 = torch.operator "onnx.Sqrt"(%1277) : (!torch.vtensor<[?,?,1],f32>) -> !torch.vtensor<[?,?,1],f32> 
    %1279 = torch.operator "onnx.Div"(%1271, %1278) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[?,?,1],f32>) -> !torch.vtensor<[?,?,?],f32> 
    %1280 = torch.operator "onnx.Mul"(%1279, %arg6) : (!torch.vtensor<[?,?,?],f32>, !torch.vtensor<[512],f32>) -> !torch.vtensor<[?,?,512],f32> 
    %1281 = torch.operator "onnx.Add"(%1280, %arg6) : (!torch.vtensor<[?,?,512],f32>, !torch.vtensor<[512],f32>) -> !torch.vtensor<[?,?,512],f32> 
    return %1281: !torch.vtensor<[?,?,512],f32>
  }
}

{-#
  dialect_resources: {
    builtin: {
      _onnx__Reshape_6627: "0x080000000100000000000000180000000000000018000000000000000002000000000000",
      _onnx__Reshape_6620: "0x080000000100000000000000FFFFFFFFFFFFFFFF0002000000000000",
      __165: "0x0800000000000040",
      __166: "0x08000000ACC52737"
    }
  }
#-}
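
(For reference: the dense resources above appear to decode, reading little-endian values after the 4-byte alignment header, to _onnx__Reshape_6627 = [1, 24, 24, 512], _onnx__Reshape_6620 = [1, -1, 512], __165 ≈ 2.0, and __166 ≈ 1e-5, i.e. a LayerNorm-style decomposition. Reshaping the [1, 24, 24, 512] intermediate with [1, -1, 512] should give [1, 576, 512], yet the lowered linalg.generic in the diagnostic operates on tensor<1x24x12288xf32>, and 24 × 512 = 12288, which suggests the dynamic collapse is grouping the wrong pair of dimensions.)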

command:

iree-compile --iree-hal-target-backends=llvm-cpu model.torch_onnx.mlir
zjgarvey commented 1 month ago

Hmm. This seems to be resolved by applying shape inference passes before compilation, but I'm not totally sure why. Will look into this further. Maybe we should just add the torch-mlir shape inference passes into torch-to-iree-pipeline (or whatever path is being used to get from ONNX to Torch in iree-compile).
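
For reference, a rough command-line sketch of that workaround (the pass and pipeline names below are assumptions and may be spelled differently depending on the torch-mlir build):

torch-mlir-opt model.torch_onnx.mlir --convert-torch-onnx-to-torch --torch-shape-refinement-pipeline -o model.torch.mlir
iree-compile --iree-hal-target-backends=llvm-cpu model.torch.mlir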

zjgarvey commented 1 month ago

I've been looking into this today. Specifically, I've been looking at the model "maxvit_rmlp_base_rw_224.sw_in12k".

Using the shape inference passes in torch-mlir indeed resolves this issue, but the numerics for the affected models are extremely bad. I've identified a single Conv op that seems to produce the numerical mismatch:

module {
  func.func @main(%arg0: !torch.vtensor<[1,256,112,112],f32>, %arg1: !torch.vtensor<[256,1,3,3],f32>, %arg2: !torch.vtensor<[256],f32>) -> !torch.vtensor<[1,256,56,56],f32> attributes {torch.onnx_meta.ir_version = 10 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.Conv"(%arg0, %arg1, %arg2) {torch.onnx.group = 256 : si64, torch.onnx.kernel_shape = [3 : si64, 3 : si64], torch.onnx.pads = [1 : si64, 1 : si64, 1 : si64, 1 : si64], torch.onnx.strides = [2 : si64, 2 : si64]} : (!torch.vtensor<[1,256,112,112],f32>, !torch.vtensor<[256,1,3,3],f32>, !torch.vtensor<[256],f32>) -> !torch.vtensor<[1,256,56,56],f32> 
    return %0 : !torch.vtensor<[1,256,56,56],f32>
  }
}

This MLIR was generated from the test being added in https://github.com/nod-ai/SHARK-TestSuite/pull/351.

To reproduce in the test suite, set up according to the instructions in alt_e2eshark/README.md, then run

python run.py -t conv_depthwise_stride_2 -v

Will post another issue for this soon.

@pdhirajkumarprasad

zjgarvey commented 1 month ago

Issue filed for the conv numerics: https://github.com/iree-org/iree/issues/18600

zjgarvey commented 1 month ago

I think I've finally got a solution for our never-ending shape issues: using some onnxruntime tools to optimize the model before importing. I'm testing it out a bit more and will keep this updated.
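
For example, a hedged sketch of that kind of preprocessing (onnxruntime's symbolic shape inference is shown here as one candidate; the exact tool being used is an assumption):

python -m onnxruntime.tools.symbolic_shape_infer --input model.onnx --output model.shaped.onnx --auto_merge
iree-import-onnx model.shaped.onnx -o model.torch_onnx.mlir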

zjgarvey commented 1 month ago

Verified that the solution works on swinv2_small_window8_256.ms_in1k.

Checking the expand shape list now, too.

pdhirajkumarprasad commented 1 week ago

We have many models that are failing at compiled inference without any error message, and this is due to shape issues. We are generating IR like '%16 = torch.operator "onnx.Shape"(%arg0) : (!torch.vtensor<[?,?],si64>) -> !torch.vtensor<[2],si64>'

Here is the list of all model: ++++ ecaresnet269d model--125M_GPTneo_reward_base--Myashka model--BERT_summary--Shobhank-iiitdwd model--Bartlarge--Shubham09 model--GPyT--Sentdex model--Jasmine-350M--UBC-NLP model--Microllama_Char_200k_step--Corianas model--PT_GPTNEO125_ATG--xhyi model--TinyMistral-248M-v2-cleaner--M4-ai model--TinyStories-1Layer-21M--roneneldan model--TinyStories-1M--roneneldan model--TinyStories-2Layers-33M--roneneldan model--TinyStories-33M--roneneldan model--TinyStories-3M--roneneldan model--TinyStories-8M--roneneldan model--Translation--shed-e model--bart-CaPE-xsum--praf-choub model--bart-base-booksum--KamilAin model--bart-base-cnn--ainize model--bart-base-few-shot-k-1024-finetuned-squad-seed-2--anas-awadalla model--bart-base-few-shot-k-1024-finetuned-squad-seed-4--anas-awadalla model--bart-base-few-shot-k-128-finetuned-squad-seed-2--anas-awadalla model--bart-base-few-shot-k-128-finetuned-squad-seed-4--anas-awadalla model--bart-base-few-shot-k-16-finetuned-squad-seed-0--anas-awadalla model--bart-base-few-shot-k-16-finetuned-squad-seed-2--anas-awadalla model--bart-base-few-shot-k-16-finetuned-squad-seed-4--anas-awadalla model--bart-base-few-shot-k-256-finetuned-squad-seed-0--anas-awadalla model--bart-base-few-shot-k-256-finetuned-squad-seed-4--anas-awadalla model--bart-base-few-shot-k-32-finetuned-squad-seed-4--anas-awadalla model--bart-base-few-shot-k-512-finetuned-squad-seed-0--anas-awadalla model--bart-base-few-shot-k-512-finetuned-squad-seed-4--anas-awadalla model--bart-base-few-shot-k-64-finetuned-squad-seed-0--anas-awadalla model--bart-base-few-shot-k-64-finetuned-squad-seed-2--anas-awadalla model--bart-base-finetuned-squad--huxxx657 model--bart-base-samsum--philschmid model--bart-base-squad2--sjrhuschlee model--bart-base-xsum--harouzie model--bart-base-xsum--morenolq model--bart-german--Shahm model--bart-large-cnn--facebook model--bart-large-cnn-samsum--philschmid model--bart-large-finetuned-squadv1--valhalla model--bart-large-xsum--facebook model--bart-mofe-rl-xsum--praf-choub model--bart_lfqa_sqaud--Shubham09 model--cm_code_clippy--ncoop57 model--distilbart-cnn-12-3--sshleifer model--distilbart-cnn-12-6-samsum--philschmid model--distilbart-cnn-6-6--sshleifer model--distilbart-xsum-1-1--sshleifer model--distilbart-xsum-12-1--sshleifer model--distilbart-xsum-12-3--sshleifer model--distilbart-xsum-12-6--sshleifer model--distilbart-xsum-6-6--sshleifer model--distilbart-xsum-9-6--sshleifer model--distilgpt2-sd--aabidk model--distilgpt2-stable-diffusion--FredZhang7 model--distilgpt2-stable-diffusion-v2--FredZhang7 model--distilgpt2-wikitext2--Intel model--finetuned-opt-squad-dataset--choohan model--finetuned-opt-squad-dataset-2--choohan model--finetuned-opt-squad-dataset-3--choohan model--finetuned_distilgpt2_sst2_negation0.0001_pretrainedTrue_epochs1--jhaochenz model--finetuned_distilgpt2_sst2_negation0.001_pretrainedTrue_epochs1--jhaochenz model--finetuned_distilgpt2_sst2_negation0.001_pretrainedTrue_epochs3--jhaochenz model--finetuned_distilgpt2_sst2_negation0.01_pretrainedFalse_epochs10--jhaochenz model--finetuned_distilgpt2_sst2_negation0.01_pretrainedFalse_epochs3--jhaochenz model--finetuned_distilgpt2_sst2_negation0.01_pretrainedFalse_epochs6--jhaochenz model--finetuned_distilgpt2_sst2_negation0.01_pretrainedTrue_epochs1--jhaochenz model--finetuned_distilgpt2_sst2_negation0.0_pretrainedFalse_epochs0--jhaochenz model--finetuned_distilgpt2_sst2_negation0.0_pretrainedTrue--jhaochenz model--finetuned_distilgpt2_sst2_negation0.0_pretrainedTrue_epochs0--jhaochenz 
model--finetuned_distilgpt2_sst2_negation0.1_pretrainedFalse_epochs10--jhaochenz model--finetuned_distilgpt2_sst2_negation0.1_pretrainedTrue_epochs1--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.0001_pretrainedTrue--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.0001_pretrainedTrue_epochs1--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.0001_pretrainedTrue_epochs3--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.0005_pretrainedTrue--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.001_pretrainedTrue_epochs1--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.001_pretrainedTrue_epochs3--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.01--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.01_pretrainedFalse_epochs10--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.01_pretrainedFalse_epochs3--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.01_pretrainedTrue_epochs1--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.05--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.05_pretrainedFalse--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.0_pretrainedFalse--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.0_pretrainedFalse_epochs30--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.1_pretrainedFalse_epochs10--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.1_pretrainedTrue_epochs1--jhaochenz model--finetuned_gpt2-medium_sst2_negation0.2--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.2_pretrainedFalse--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.5--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.5_pretrainedFalse--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.8--yuhuizhang model--finetuned_gpt2-medium_sst2_negation0.8_pretrainedFalse--yuhuizhang model--finetuned_gpt2_sst2_negation0.0001_pretrainedFalse_epochs1--yuhuizhang model--finetuned_gpt2_sst2_negation0.0001_pretrainedTrue--yuhuizhang model--finetuned_gpt2_sst2_negation0.0005_pretrainedTrue--yuhuizhang model--finetuned_gpt2_sst2_negation0.001_pretrainedTrue--yuhuizhang model--finetuned_gpt2_sst2_negation0.05--yuhuizhang model--finetuned_gpt2_sst2_negation0.0_pretrainedFalse--yuhuizhang model--finetuned_gpt2_sst2_negation0.2_pretrainedFalse--yuhuizhang model--finetuned_gpt2_sst2_negation0.5--yuhuizhang model--finetuned_gpt2_sst2_negation0.8--yuhuizhang model--finetuned_gpt2_sst2_negation0.8_pretrainedFalse--yuhuizhang model--finetuning-sentiment-model-3000-samples--DravenTay model--gemma-tiny-random--yujiepan model--gpt2--openai-community model--gpt2-650k-stable-diffusion-prompt-generator--Ar4ikov model--gpt2-alpaca-gpt4--vicgalle model--gpt2-finetuning-sentiment-model-3000-samples--LYTinn model--gpt2-imdb-sentiment-classifier--mnoukhov model--gpt2-wikitext103--himanshubeniwal model--gpt2_wikitext37_7k_pretrained_iphone_1e4--himanshubeniwal model--hebrew_poetry-gpt_neo-small--Norod78 model--llama-160m--JackFram model--llama-wikitext--manu model--lsg-bart-base-4096-booksum--ccdv model--marian-finetuned-kde4-cs2sv--ksaml model--marian-finetuned-kde4-en-fr--RajkNakka model--marian-finetuned-kde4-en-to-ar--anibahug model--marian-finetuned-kde4-en-to-es--tmobaggins model--marian-finetuned-kde4-en-to-es--zainnaved model--marian-finetuned-kde4-en-to-fr--Abelll model--marian-finetuned-kde4-en-to-fr--Alesteba model--marian-finetuned-kde4-en-to-fr--DarioLopes model--marian-finetuned-kde4-en-to-fr--Dewa model--marian-finetuned-kde4-en-to-fr--Eitanli model--marian-finetuned-kde4-en-to-fr--Fredvv 
model--marian-finetuned-kde4-en-to-fr--Leisa model--marian-finetuned-kde4-en-to-fr--Mikey8943 model--marian-finetuned-kde4-en-to-fr--Molka11 model--marian-finetuned-kde4-en-to-fr--Najeen model--marian-finetuned-kde4-en-to-fr--Neulvo model--marian-finetuned-kde4-en-to-fr--Student3342 model--marian-finetuned-kde4-en-to-fr--Thinkcru model--marian-finetuned-kde4-en-to-fr--Yuch model--marian-finetuned-kde4-en-to-fr--aaraki model--marian-finetuned-kde4-en-to-fr--amartyobanerjee model--marian-finetuned-kde4-en-to-fr--aneeshmb02 model--marian-finetuned-kde4-en-to-fr--anjankumar model--marian-finetuned-kde4-en-to-fr--apatidar0 model--marian-finetuned-kde4-en-to-fr--bishalbaaniya model--marian-finetuned-kde4-en-to-fr--chandrasutrisnotjhong model--marian-finetuned-kde4-en-to-fr--clboetticher model--marian-finetuned-kde4-en-to-fr--coreyabs-db model--marian-finetuned-kde4-en-to-fr--evincent18 model--marian-finetuned-kde4-en-to-fr--feeeper model--marian-finetuned-kde4-en-to-fr--jatinshah model--marian-finetuned-kde4-en-to-fr--jfarmerphd model--marian-finetuned-kde4-en-to-fr--kbalde model--marian-finetuned-kde4-en-to-fr--kosec39 model--marian-finetuned-kde4-en-to-fr--lewtun model--marian-finetuned-kde4-en-to-fr--libalabala model--marian-finetuned-kde4-en-to-fr--lsimon model--marian-finetuned-kde4-en-to-fr--luhui model--marian-finetuned-kde4-en-to-fr--mbateman model--marian-finetuned-kde4-en-to-fr--miesnerjacob model--marian-finetuned-kde4-en-to-fr--mrp model--marian-finetuned-kde4-en-to-fr--mxalmeida model--marian-finetuned-kde4-en-to-fr--ncduy model--marian-finetuned-kde4-en-to-fr--ornil1 model--marian-finetuned-kde4-en-to-fr--raisin2402 model--marian-finetuned-kde4-en-to-fr--rootacess model--marian-finetuned-kde4-en-to-fr--sgugger model--marian-finetuned-kde4-en-to-fr--sofa566 model--marian-finetuned-kde4-en-to-fr--sungchun71 model--marian-finetuned-kde4-en-to-fr--thucdangvan020999 model--marian-finetuned-kde4-en-to-fr--tmatup model--marian-finetuned-kde4-en-to-fr--ttri-pruiu model--marian-finetuned-kde4-en-to-fr--vsrinivas model--marian-finetuned-kde4-en-to-fr-2--Siqi model--marian-finetuned-kde4-en-to-fr3--Ghost1 model--marian-finetuned-kde4-en-to-hi--vsrinivas model--marian-finetuned-kde4-en-to-ja--Hoax0930 model--marian-finetuned-kde4-en-to-vi--VanHoan model--marian-finetuned-kde4-en-to-vi--huanvo88 model--marian-finetuned-kde4-en-to-vi--huynguyen208 model--marian-finetuned-kde4-en-to-zh--DrY model--marian-finetuned-kde4-en-to-zh--chenyanjin model--marian-finetuned-kde4-en-to-zh--luoyixin model--marian-finetuned-kde4-en-to-zh_TW--peterhsu model--marian-finetuned-kde4-en-to-zh_TW-accelerate--peterhsu model--megatron-gpt2-345m--robowaifudev model--neo-125m-wills-loss-function-by-tr--Jellywibble model--opt-350m--facebook model--opt-350m-wikitext2--lnair model--opt-finetuned-squad-dataset--choohan model--ov-gpt2-fp32-kv-cache--vuiseng9 model--ov-opt-350m-8bit-85pc-sparse-kv-cache--vuiseng9 model--ov-opt-350m-8bit-kv-cache--vuiseng9 model--ov-opt-350m-fp32-kv-cache--vuiseng9 model--pythia-410mn-ntoxic--skrishna model--pythia-70-m-finetuned--selinerdem model--pythia-70m-toxicity-model--skrishna model--really-tiny-falcon-testing--fxmarty model--roberta_shared_bbc_xsum--patrickvonplaten model--s2t-medium-librispeech-asr--facebook model--smol_llama-220M-GQA--BEE-spoke-data model--smol_llama-81M-tied--BEE-spoke-data model--tiny-gpt2--taufeeque model--tiny-gpt2-magicprompt--pszemraj model--tiny-random-FalconForCausalLM--illuin model--tiny-random-GPTNeoForSequenceClassification--hf-tiny-model-private 
model--tiny-random-llama--IlyasMoutawwakil model--tiny-testing-falcon-alibi--fxmarty model--unisumm_3--vishw2703 model--wikitext-ds--mahmoudNG migraphx_ORTdistilgpt2_1 migraphx_ORTonnx_modelsdistilgpt2_1_fp16_gpu migraphx_modelswhisper-tiny-encoder ++++

zjgarvey commented 1 week ago

Yeah, this will require further shape scalarization. I'm going to try to land some changes to the pass this week; then we can identify and build patterns for unblocking these more reliably.
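
For illustration, a minimal hedged sketch of the kind of pattern this refers to, on a hypothetical rank-2 example (the torch-mlir op spellings are real, but the exact rewrite the pass performs is an assumption). A shape tensor produced by onnx.Shape and consumed by a reshape, e.g.

%shape = torch.operator "onnx.Shape"(%arg0) : (!torch.vtensor<[?,?],f32>) -> !torch.vtensor<[2],si64>
%out = torch.operator "onnx.Reshape"(%arg1, %shape) : (!torch.vtensor<[?,?],f32>, !torch.vtensor<[2],si64>) -> !torch.vtensor<[?,?],f32>

would, after conversion to the Torch dialect, be scalarized into per-dimension queries that downstream shape propagation can fold:

%int0 = torch.constant.int 0
%int1 = torch.constant.int 1
%d0 = torch.aten.size.int %arg0, %int0 : !torch.vtensor<[?,?],f32>, !torch.int -> !torch.int
%d1 = torch.aten.size.int %arg0, %int1 : !torch.vtensor<[?,?],f32>, !torch.int -> !torch.int
%dims = torch.prim.ListConstruct %d0, %d1 : (!torch.int, !torch.int) -> !torch.list<int>
%out = torch.aten.view %arg1, %dims : !torch.vtensor<[?,?],f32>, !torch.list<int> -> !torch.vtensor<[?,?],f32>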