microsoft / onnxruntime


MLAS failing with "Could not find an implementation for QLinearMatMul" #21531

Open · saurabhtangri opened this issue 4 months ago

saurabhtangri commented 4 months ago

Describe the issue

Model execution fails with the following error:

```
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for QLinearMatMul(21) node with name ''
```
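The `(21)` in the message is the opset version ORT resolved the node against. As a first check, it may help to print the opset imports the saved model actually declares; a minimal sketch, using the `qlinearmatmul_model.onnx` file produced by the repro below:

```python
# Sketch: inspect which opset versions the model declares, since ORT
# looks up the QLinearMatMul kernel against the default-domain version.
import onnx

model = onnx.load('qlinearmatmul_model.onnx')
for opset in model.opset_import:
    print(f"domain={opset.domain!r} version={opset.version}")
```

If the default domain (`''`) reports version 21, ORT is being asked for the opset-21 QLinearMatMul kernel specifically.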

To reproduce

```python
import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

# Define the input dimensions and types
input_dim = [2, 2]
input_type = TensorProto.INT8

# Create the inputs
input_A = helper.make_tensor_value_info('input_A', input_type, input_dim)
input_B = helper.make_tensor_value_info('input_B', input_type, input_dim)
input_A_scale = helper.make_tensor_value_info('input_A_scale', TensorProto.FLOAT, [])
input_A_zero_point = helper.make_tensor_value_info('input_A_zero_point', input_type, [])
input_B_scale = helper.make_tensor_value_info('input_B_scale', TensorProto.FLOAT, [])
input_B_zero_point = helper.make_tensor_value_info('input_B_zero_point', input_type, [])
output_scale = helper.make_tensor_value_info('output_scale', TensorProto.FLOAT, [])
output_zero_point = helper.make_tensor_value_info('output_zero_point', input_type, [])

# Create the output
output = helper.make_tensor_value_info('output', input_type, input_dim)

# Create the QLinearMatMul node
qlinearmatmul_node = helper.make_node(
    'QLinearMatMul',
    inputs=[
        'input_A', 'input_A_scale', 'input_A_zero_point',
        'input_B', 'input_B_scale', 'input_B_zero_point',
        'output_scale', 'output_zero_point',
    ],
    outputs=['output'],
)

# Create the graph
graph_def = helper.make_graph(
    [qlinearmatmul_node],
    'qlinearmatmul-graph',
    [input_A, input_A_scale, input_A_zero_point,
     input_B, input_B_scale, input_B_zero_point,
     output_scale, output_zero_point],
    [output],
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='qlinearmatmul-model')
model_def.ir_version = 5
onnx.checker.check_model(model_def)
onnx.save(model_def, 'qlinearmatmul_model.onnx')

# Test the model using ONNX Runtime

# Prepare the input data; shapes must match the [2, 2] inputs declared
# above, and randint's high bound is exclusive, so use 128 to cover int8
input_A_data = np.random.randint(-128, 128, size=(2, 2)).astype(np.int8)
input_B_data = np.random.randint(-128, 128, size=(2, 2)).astype(np.int8)
input_A_scale_data = np.array(0.1, dtype=np.float32)
input_A_zero_point_data = np.array(0, dtype=np.int8)
input_B_scale_data = np.array(0.1, dtype=np.float32)
input_B_zero_point_data = np.array(0, dtype=np.int8)
output_scale_data = np.array(0.2, dtype=np.float32)
output_zero_point_data = np.array(0, dtype=np.int8)

# Prepare the inputs for ONNX Runtime
inputs = {
    'input_A': input_A_data,
    'input_A_scale': input_A_scale_data,
    'input_A_zero_point': input_A_zero_point_data,
    'input_B': input_B_data,
    'input_B_scale': input_B_scale_data,
    'input_B_zero_point': input_B_zero_point_data,
    'output_scale': output_scale_data,
    'output_zero_point': output_zero_point_data,
}

# Enable verbose logging for the ONNX Runtime session
sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # 0 is the most verbose logging level

# Run the model on the input data; session creation fails here with NOT_IMPLEMENTED
ort_session = ort.InferenceSession('qlinearmatmul_model.onnx', sess_options)
ort_outputs = ort_session.run(None, inputs)

# Print the output
print("Output:", ort_outputs[0])
```

Urgency

Not urgent; I'm also not sure whether the script itself is doing something wrong.

Platform

Windows

OS Version

Windows 11+WSL

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

Jubengo commented 1 month ago

I also encounter this error. Is QLinearMatMul not handled at all by ORT, or does it work only for specific input/output types?
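One way to probe that, under the assumption that support differs by opset version and by activation/weight type, is to build the same graph for a few type combinations and see which ones can create a session. A sketch (all names here are illustrative; which rows print OK depends on the installed ORT version):

```python
# Sketch: probe which (opset, A type, B type) combinations ONNX Runtime
# can create a CPU session for.
import onnxruntime as ort
from onnx import helper, TensorProto

def build_model(a_type, b_type, opset):
    # Same 8-input QLinearMatMul graph as the repro above, parameterized on type.
    def vi(name, elem_type, shape):
        return helper.make_tensor_value_info(name, elem_type, shape)
    node = helper.make_node(
        'QLinearMatMul',
        ['A', 'A_scale', 'A_zp', 'B', 'B_scale', 'B_zp', 'Y_scale', 'Y_zp'],
        ['Y'],
    )
    graph = helper.make_graph(
        [node], 'probe',
        [vi('A', a_type, [2, 2]), vi('A_scale', TensorProto.FLOAT, []),
         vi('A_zp', a_type, []), vi('B', b_type, [2, 2]),
         vi('B_scale', TensorProto.FLOAT, []), vi('B_zp', b_type, []),
         vi('Y_scale', TensorProto.FLOAT, []), vi('Y_zp', a_type, [])],
        [vi('Y', a_type, [2, 2])],
    )
    return helper.make_model(
        graph, opset_imports=[helper.make_operatorsetid('', opset)])

for opset in (20, 21):
    for a_type, b_type in [(TensorProto.UINT8, TensorProto.UINT8),
                           (TensorProto.UINT8, TensorProto.INT8),
                           (TensorProto.INT8, TensorProto.INT8)]:
        model = build_model(a_type, b_type, opset)
        try:
            ort.InferenceSession(model.SerializeToString(),
                                 providers=['CPUExecutionProvider'])
            status = 'OK'
        except Exception as e:
            status = f'FAILED ({type(e).__name__})'
        print(f'opset {opset}, A={TensorProto.DataType.Name(a_type)}, '
              f'B={TensorProto.DataType.Name(b_type)}: {status}')
```

If only the opset-20 rows succeed, that would point at a missing opset-21 kernel registration rather than the operator being unsupported altogether.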