microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

ORT returns incorrect result for UINT8 Matmul on specific CPU #19109

Open. arui-yyz opened this issue 9 months ago

arui-yyz commented 9 months ago

Describe the issue

The ONNX Runtime CPU execution provider returns an incorrect result for a UINT8-quantized model (containing a single MatMul, shape (1,4) @ shape (4,1)) in the following environment: onnx==1.14, onnxruntime==1.16, protobuf==4.24.4.

Passing on CPU: AMD Ryzen 9 7900X 12-Core Processor; correct output is 0.22868575.
Failing on CPU: AMD Ryzen Threadripper 2950X 16-Core Processor; incorrect output is -0.44277453.
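To check which of the two results is numerically correct, independently of ONNX Runtime's optimized CPU kernels, the model can also be run through the pure-Python reference evaluator that ships with onnx >= 1.13. A minimal sketch, assuming the .onnx filename inside the attached archive; the input matches the repro script below:

import numpy as np
from onnx.reference import ReferenceEvaluator

# Evaluate the model with onnx's pure-Python reference implementation,
# which bypasses the MLAS kernels used by ORT's CPU execution provider.
ref = ReferenceEvaluator("mm_no_bias_uint8.onnx")  # filename inside the attached archive is assumed
input_data = np.array([[0.6541, 0.4707, 0.2821, 0.5569]], dtype=np.float32)
print("reference output:", ref.run(None, {ref.input_names[0]: input_data}))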

To reproduce

Onnx file: mm_no_bias_uint8.tar.gz

Script to repro:

import argparse

import numpy as np
import onnxruntime as ort

# get the input model path from the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument('--input_model_path', type=str, help='path to the input model')
args = parser.parse_args()

# load the model into an ONNX Runtime session on the default CPU execution provider
input_model_path = args.input_model_path
ort_session = ort.InferenceSession(input_model_path, providers=["CPUExecutionProvider"])

# run a single inference on a fixed float32 input
input_name = ort_session.get_inputs()[0].name
input_data = np.array([[0.6541, 0.4707, 0.2821, 0.5569]], dtype=np.float32)
quantized_output = ort_session.run(None, {input_name: input_data})
print("output: ", quantized_output)
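After extracting the attached archive, the script (saved as, e.g., repro.py; the .onnx filename is assumed) can be invoked as:

python repro.py --input_model_path mm_no_bias_uint8.onnx

Since the wrong result only appears on one CPU family, it may also help to check whether ORT's graph optimizations (which can fuse the quantized MatMul pattern into a different kernel) change the output. A minimal diagnostic sketch, not part of the original report:

import numpy as np
import onnxruntime as ort

# Re-run the same model with all graph optimizations disabled; if the
# result changes, a fused/optimized quantized kernel is the likely culprit.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
session = ort.InferenceSession("mm_no_bias_uint8.onnx", sess_options,  # assumed filename
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
input_data = np.array([[0.6541, 0.4707, 0.2821, 0.5569]], dtype=np.float32)
print("output (optimizations off):", session.run(None, {input_name: input_data}))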

Urgency

Customer release is blocked by this issue.

Platform

Linux

OS Version

Ubuntu 20.04.6 LTS (Focal Fossa)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16

ONNX Runtime API

Python

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.