
Issue with performing shape inference using symbolic_shape_infer.py with Phi-3 ONNX Models #21194

Closed · shamith2 closed this 1 day ago

shamith2 commented 1 week ago

Describe the issue

I am running into the error below when I run the SymbolicShapeInference.infer_shapes() function on the Phi-3 mini ONNX model optimized for CPU. From my understanding, it looks like infer_shapes() is not able to infer the output shape for the MatMulNBits op (https://github.com/microsoft/onnxruntime/blob/rel-1.18.0/docs/ContribOperators.md#com.microsoft.MatMulNBits).

This issue might not be limited to Phi-3. I suspect it has to do with the operator domain. Does infer_shapes() automatically work on operators from the com.microsoft domain?

I made the input and output dims of the ONNX model static before performing shape inference, using update_inputs_outputs_dims() from https://github.com/onnx/onnx/blob/main/onnx/tools/update_model_dims.py.

Error:

DEBUG:onnxruntime.tools.symbolic_shape_infer:Stopping at incomplete shape inference at MatMulNBits: /model/layers.0/attn/qkv_proj/MatMul_Q4
DEBUG:onnxruntime.tools.symbolic_shape_infer:node inputs:
DEBUG:onnxruntime.tools.symbolic_shape_infer:name: "/model/layers.0/input_layernorm/output_0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3072
      }
    }
  }
}

DEBUG:onnxruntime.tools.symbolic_shape_infer:name: "model.layers.0.attn.qkv_proj.MatMul.weight_Q4"
type {
  tensor_type {
    elem_type: 2
    shape {
      dim {
        dim_value: 9216
      }
      dim {
        dim_value: 96
      }
      dim {
        dim_value: 16
      }
    }
  }
}

DEBUG:onnxruntime.tools.symbolic_shape_infer:name: "model.layers.0.attn.qkv_proj.MatMul.weight_scales"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 884736
      }
    }
  }
}

DEBUG:onnxruntime.tools.symbolic_shape_infer:node outputs:
DEBUG:onnxruntime.tools.symbolic_shape_infer:name: "/model/layers.0/attn/qkv_proj/MatMul/output_0"
type {
}

Traceback (most recent call last):
  File "C:\Users\Administrator\Documents\onnxInsights\scripts\onnxProfile\onnx_profiling.py", line 61, in <module>
    inferred_onnx_model_path = onnx_t.shapeInfer(
  File "C:\Users\Administrator\Documents\onnxInsights\src\onnxInsights\onnxHelpers\onnxTransformer.py", line 245, in shapeInfer
    inferred_model = SymbolicShapeInference.infer_shapes(
  File "C:\Users\Administrator\miniconda3\envs\onnx_test\lib\site-packages\onnxruntime\tools\symbolic_shape_infer.py", line 2912, in infer_shapes
    raise Exception("Incomplete symbolic shape inference")
Exception: Incomplete symbolic shape inference

To reproduce

Download Model:

huggingface-cli download microsoft/Phi-3-mini-128k-instruct-onnx cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx
huggingface-cli download microsoft/Phi-3-mini-128k-instruct-onnx cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx.data

Packages:

python -> 3.10.14
onnx -> 1.16.1
onnxruntime -> 1.18.0

Code:

import onnx
from onnx.tools.update_model_dims import update_inputs_outputs_dims
import onnxruntime
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# path to the downloaded Phi-3 model
onnx_model_path = "phi3-mini-128k-instruct-cpu-int4-rtn-block-32-acc-level-4.onnx"

dummy_session = onnxruntime.InferenceSession(
    onnx_model_path,
    providers=["CPUExecutionProvider"]
)

model_inputs = dummy_session.get_inputs()
model_outputs = dummy_session.get_outputs()

# static input and output dims
static_input_dims = {
    'input_ids': [1, 1],
    'attention_mask': [1, 2048]
}

# past key/value cache inputs: 2 (key, value) per layer for 32 layers;
# the first 2 inputs are input_ids and attention_mask, hence the offset
for i in range(32 * 2):
    static_input_dims[model_inputs[i + 2].name] = [1, 32, 2047, 96]

static_output_dims = {
    'logits': [1, 1, 32064]
}

# present key/value cache outputs follow the logits output
for i in range(32 * 2):
    static_output_dims[model_outputs[i + 1].name] = [1, 32, 2048, 96]

# make the input and output dims static in the onnx model
onnx_model = onnx.load(onnx_model_path)

static_dim_model = update_inputs_outputs_dims(onnx_model, static_input_dims, static_output_dims)

# perform shape inference
inferred_model = SymbolicShapeInference.infer_shapes(
    static_dim_model,
    int_max=2**31 - 1,
    auto_merge=False,
    guess_output_rank=False,
    verbose=0
)

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

tianleiwu commented 1 week ago

@kunal-vaishnavi, could you take a look at whether symbolic shape inference works on Phi-3 models?

kunal-vaishnavi commented 1 week ago

The uploaded Phi-3 ONNX models have already been run through symbolic shape inference with dynamic axes.

The symbolic shape inference for most quantization operators is defined in each operator's spec.

https://github.com/microsoft/onnxruntime/blob/6baaaf516538f9059da3558b2cd22128a9e42c07/onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L3434-L3481

Here is the list of supported operators whose shapes can be symbolically inferred by the SymbolicShapeInference.infer_shapes tool:

https://github.com/microsoft/onnxruntime/blob/6baaaf516538f9059da3558b2cd22128a9e42c07/onnxruntime/python/tools/symbolic_shape_infer.py#L127-L247
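
As a quick check, one can look at the tool's internal dispatch table to see whether an operator type is covered. The snippet below is a minimal sketch that assumes the dispatcher_ attribute and constructor signature of SymbolicShapeInference in onnxruntime 1.18.0 (an internal detail, so it may change between releases):

# Illustrative sketch: check whether an op type has a registered
# symbolic shape inference handler (dispatcher_ is an internal detail)
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

ssi = SymbolicShapeInference(
    int_max=2**31 - 1, auto_merge=False, guess_output_rank=False, verbose=0
)
print("MatMul" in ssi.dispatcher_)       # True
print("MatMulNBits" in ssi.dispatcher_)  # False on onnxruntime 1.18.0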

shamith2 commented 5 days ago

@kunal-vaishnavi, thanks for the response. I have a few questions and comments from my side:

  1. The attached Phi-3 ONNX model does not have shape inference results for all of its operators. A couple of operators might have been symbolically shape-inferenced with dynamic axes, but the vast majority have not. For example, here is one of the subgraphs of the model without inferred shapes, visualized in Netron:

graph

From my understanding, this is how Netron visualizes operators with inferred shapes after running the model through the SymbolicShapeInference.infer_shapes tool, which I was not able to do for the Phi-3 model (this subgraph is from a different ONNX model):

graph_inf

  2. I do not see the MatMulNBits operator in the list of supported operators you shared for the SymbolicShapeInference.infer_shapes tool, which might be the reason why the tool is raising the error.

  3. Were you able to successfully shape infer the Phi-3 model for all operators? I am not able to do so with the release version of onnxruntime 1.18.0. Which version of onnxruntime are you using?

kunal-vaishnavi commented 4 days ago

The attached Phi-3 ONNX model does not have shape inference results for all of its operators. A couple of operators might have been symbolically shape-inferenced with dynamic axes, but the vast majority have not. For example, here is one of the subgraphs of the model without inferred shapes, visualized in Netron. From my understanding, this is how Netron visualizes operators with inferred shapes after running the model through the SymbolicShapeInference.infer_shapes tool, which I was not able to do for the Phi-3 model (this subgraph is from a different ONNX model).

You can find the shape inference by clicking on the operator and pressing the '+' icon to the right of each input name and output name. Here is an example.

image

I do not see the MatMulNBits operator in the list of supported operators you shared for the SymbolicShapeInference.infer_shapes tool, which might be the reason why the tool is raising the error.

Yes, your error occurs because symbolic shape inference for MatMulNBits isn't implemented in SymbolicShapeInference.infer_shapes. We can add MatMulNBits to fix this.
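
As a rough illustration, here is a minimal sketch of what such a handler could look like, reusing the helper conventions (_get_shape, get_attribute, known_vi_, and the onnx helper module) already used inside symbolic_shape_infer.py; this is a sketch, not the actual fix. MatMulNBits multiplies input A of shape [..., K] by a dequantized N x K weight, so the output shape is A's shape with its last dim replaced by the op's N attribute:

# Hypothetical sketch, not the actual ORT change: infer the MatMulNBits
# output shape as input A's shape with the last dim replaced by attribute N
def _infer_MatMulNBits(self, node):
    lhs_shape = self._get_shape(node, 0)  # input A: [..., K]
    n = get_attribute(node, "N")          # output columns from the 'N' attribute
    new_shape = lhs_shape[:-1] + [n]      # output: [..., N]
    # the output dtype follows input A (the float activation)
    output_dtype = self.known_vi_[node.input[0]].type.tensor_type.elem_type
    vi = self.known_vi_[node.output[0]]
    vi.CopyFrom(helper.make_tensor_value_info(node.output[0], output_dtype, new_shape))

# ...plus a "MatMulNBits": self._infer_MatMulNBits entry in the dispatch table.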

Were you able to successfully shape infer the Phi-3 model for all operators? I am not able to do so with the release version of onnxruntime 1.18.0. Which version of onnxruntime are you using?

The uploaded Phi-3 ONNX models are created via ONNX Runtime GenAI's model builder. The shape inferences for their operators are created here in the model builder using onnx.helper.make_tensor_value_info and added to the ModelProto here.
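
For reference, the underlying pattern is a plain onnx.helper call: build a value_info entry (with symbolic and/or static dims) and append it to the graph's value_info. The tensor name, dims, and path below are illustrative, not the model builder's actual values:

import onnx
from onnx import helper

# illustrative name and dims, not the model builder's actual values
value_info = helper.make_tensor_value_info(
    "/model/layers.0/attn/qkv_proj/MatMul/output_0",
    onnx.TensorProto.FLOAT,
    ["batch_size", "sequence_length", 9216],  # symbolic and static dims can mix
)

model = onnx.load("model.onnx")  # hypothetical path
model.graph.value_info.append(value_info)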