openppl-public / ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Apache License 2.0

QLinearMatmul : weight zero point must be a scalar, 1D tensor of size 1, or last to second dimension is 1 #526

Open fhahaha opened 6 months ago

fhahaha commented 6 months ago

2023-12-18 17:40:54.489013950 [E:onnxruntime:, sequential_executor.cc:494 ExecuteKernel] Non-zero status code returned while running QLinearMatMul node. Name:'/model/out_layer/out_layer/OutLinear/MatMul' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/quantization/quantize_linear_matmul.cc:55 virtual onnxruntime::common::Status onnxruntime::QLinearMatMul::Compute(onnxruntime::OpKernelContext*) const IsBQuantParamSupported(b_offset->Shape(), b ? b->Shape() : b_shape_) was false. QLinearMatmul : weight zero point must be a scalar, 1D tensor of size 1, or last to second dimension is 1
Traceback (most recent call last):
  File "/mnt/user/cuifan/cuifan/workspace/code/model_quantization_all/test/ppq_test.py", line 106, in <module>
    onnxruntime_outputs.append(sess.run(
  File "/mnt/user/cuifan/tools/anaconda3/envs/torch2.0/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 201, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running QLinearMatMul node. Name:'/model/out_layer/out_layer/OutLinear/MatMul' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/quantization/quantize_linear_matmul.cc:55 virtual onnxruntime::common::Status onnxruntime::QLinearMatMul::Compute(onnxruntime::OpKernelContext*) const IsBQuantParamSupported(b_offset->Shape(), b ? b->Shape() : b_shape_) was false. QLinearMatmul : weight zero point must be a scalar, 1D tensor of size 1, or last to second dimension is 1

Hi, I am trying to quantize my model with PPQ. The quantized ONNX model is saved successfully, but when I run it with ONNX Runtime I get the error above. It seems the weight zero point of the linear layer (the QLinearMatMul node '/model/out_layer/out_layer/OutLinear/MatMul') is not a scalar.
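To make the constraint concrete, here is a minimal sketch of the shape rule as the error message words it. The function name `zero_point_shape_ok` is hypothetical, and the check is only a reading of the message text ("a scalar, 1D tensor of size 1, or last to second dimension is 1"), not a copy of ONNX Runtime's `IsBQuantParamSupported`, which may apply further checks against the weight shape.

```python
from typing import Sequence


def zero_point_shape_ok(zp_shape: Sequence[int]) -> bool:
    """Approximate the zero-point shape rule quoted in the error message.

    Hypothetical helper: it only mirrors the wording of the message and
    ignores any extra comparison with the weight tensor's own shape.
    """
    rank = len(zp_shape)
    if rank == 0:
        # Scalar zero point (per-tensor quantization).
        return True
    if rank == 1:
        # A 1D zero point is accepted only when it holds a single element.
        return zp_shape[0] == 1
    # For rank >= 2, the second-to-last dimension must be 1.
    return zp_shape[-2] == 1


if __name__ == "__main__":
    # Example shapes are illustrative, not taken from the model in this issue.
    for shape in [(), (1,), (64,), (1, 64), (64, 64)]:
        print(shape, zero_point_shape_ok(shape))
```

Under this reading, a per-channel zero point stored as a 1D tensor with more than one element would be rejected, which would be consistent with the guess that the zero point here is not a scalar.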
