Open · saurabhtangri opened 4 months ago
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
I also hit this error. Is QLinearMatMul not handled at all by ORT, or does it only work for specific input/output types?
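One way to see what the op's schema allows at a given opset is to query the ONNX schema registry; note this only shows what the ONNX spec permits, not which kernels ORT actually registers, so it answers the type question for the spec side only:

```python
from onnx import defs

# Look up the QLinearMatMul schema as resolved at opset 21.
schema = defs.get_schema('QLinearMatMul', 21)
print(schema.since_version)  # opset version that last changed the op
for tc in schema.type_constraints:
    print(tc.type_param_str, tc.allowed_type_strs)
```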
Describe the issue
Model execution fails with the following error:
NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for QLinearMatMul(21) node with name ''
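The "(21)" in the message is the opset version the node resolved to. helper.make_model defaults the opset import to the latest opset of the installed onnx package, so the model ends up importing opset 21. After running the repro script below, this can be confirmed by inspecting the saved model:

```python
import onnx

# Inspect which opset versions the saved model imports; if the default
# domain shows 21, the QLinearMatMul node binds to its opset-21 signature.
m = onnx.load('qlinearmatmul_model.onnx')
for opset in m.opset_import:
    print(opset.domain or 'ai.onnx', opset.version)
```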
To reproduce
```python
import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

# Define the input dimensions and types
input_dim = [2, 2]
input_type = TensorProto.INT8

# Create the inputs
input_A = helper.make_tensor_value_info('input_A', input_type, input_dim)
input_B = helper.make_tensor_value_info('input_B', input_type, input_dim)
input_A_scale = helper.make_tensor_value_info('input_A_scale', TensorProto.FLOAT, [])
input_A_zero_point = helper.make_tensor_value_info('input_A_zero_point', input_type, [])
input_B_scale = helper.make_tensor_value_info('input_B_scale', TensorProto.FLOAT, [])
input_B_zero_point = helper.make_tensor_value_info('input_B_zero_point', input_type, [])
output_scale = helper.make_tensor_value_info('output_scale', TensorProto.FLOAT, [])
output_zero_point = helper.make_tensor_value_info('output_zero_point', input_type, [])

# Create the output
output = helper.make_tensor_value_info('output', input_type, input_dim)

# Create the QLinearMatMul node
qlinearmatmul_node = helper.make_node(
    'QLinearMatMul',
    inputs=[
        'input_A', 'input_A_scale', 'input_A_zero_point',
        'input_B', 'input_B_scale', 'input_B_zero_point',
        'output_scale', 'output_zero_point'
    ],
    outputs=['output']
)

# Create the graph
graph_def = helper.make_graph(
    [qlinearmatmul_node],
    'qlinearmatmul-graph',
    [input_A, input_A_scale, input_A_zero_point,
     input_B, input_B_scale, input_B_zero_point,
     output_scale, output_zero_point],
    [output]
)

# Create the model
model_def = helper.make_model(graph_def, producer_name='qlinearmatmul-model')
model_def.ir_version = 5
onnx.checker.check_model(model_def)
onnx.save(model_def, 'qlinearmatmul_model.onnx')

# Test the model using ONNX Runtime.
# Prepare the input data; shapes must match the declared input_dim of [2, 2].
input_A_data = np.random.randint(-128, 127, size=(2, 2)).astype(np.int8)
input_B_data = np.random.randint(-128, 127, size=(2, 2)).astype(np.int8)
input_A_scale_data = np.array(0.1, dtype=np.float32)
input_A_zero_point_data = np.array(0, dtype=np.int8)
input_B_scale_data = np.array(0.1, dtype=np.float32)
input_B_zero_point_data = np.array(0, dtype=np.int8)
output_scale_data = np.array(0.2, dtype=np.float32)
output_zero_point_data = np.array(0, dtype=np.int8)

# Prepare the inputs for ONNX Runtime
inputs = {
    'input_A': input_A_data,
    'input_A_scale': input_A_scale_data,
    'input_A_zero_point': input_A_zero_point_data,
    'input_B': input_B_data,
    'input_B_scale': input_B_scale_data,
    'input_B_zero_point': input_B_zero_point_data,
    'output_scale': output_scale_data,
    'output_zero_point': output_zero_point_data,
}

# Enable verbose logging for the ONNX Runtime session
sess_options = ort.SessionOptions()
sess_options.log_severity_level = 0  # 0 is the most verbose logging level

# Run the model on the input data
ort_session = ort.InferenceSession('qlinearmatmul_model.onnx', sess_options)
ort_outputs = ort_session.run(None, inputs)

# Print the output
print("Output:", ort_outputs[0])
```
Urgency
Not urgent. I am also not sure whether the script itself is doing something wrong.
Platform
Windows
OS Version
Windows 11 + WSL
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response