I wonder if this issue would explain the error message: https://github.com/NVIDIA/TensorRT/issues/2165. If so, the activation type should be different for TensorRT.
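For reference, the linked TensorRT issue is about the engine rejecting unsigned (QUInt8) quantization, so "a different activation type" here would mean signed int8 activations. A minimal sketch, assuming quantize_static from onnxruntime.quantization and a calibration data reader named data_reader (the model paths are placeholders, not files from this thread):

```python
# Sketch only: request signed int8 activations instead of the default QUInt8,
# which is the usual fix when TensorRT complains about unsigned quantization.
from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

quantize_static(
    "model-infer.onnx",                  # placeholder input model
    "model-int8.onnx",                   # placeholder output model
    data_reader,                         # assumed CalibrationDataReader instance
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QInt8,     # signed int8 activations
    weight_type=QuantType.QInt8,
)
```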
Thanks for your reply! To be honest, I don't really understand the issue you mentioned, and I can't find any operator named Scale in the ONNX operator list.😢
I found that the inputs of the DQ operator for the bias are:
And the inputs of the DQ operators for the weights are:
They are all acceptable inputs for the DQ operator, as shown below.
So I think maybe this is not about the dtype of the inputs, as in the issue you mentioned.
Before I added extra_options={'AddQDQPairToWeight': True} to quantize_static, the error was raised at the DQ operator before the weights, as in issue #11535. Therefore I think this problem could be solved if QDQ pairs were also added to the bias.
Hi, I tried writing out the calibration table and then ran trtexec with the --calib argument, and it loaded the quantized engine successfully. So that looks like a workable alternative for running a quantized ONNX model on TensorRT~
But I still wonder whether onnxruntime could generate a quantized ONNX model that can be consumed directly by TensorRT 🤔
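(For anyone following the calibration-table route: a rough sketch of the steps, assuming the create_calibrator and write_calibration_table helpers from onnxruntime.quantization and a CalibrationDataReader named data_reader; exact arguments and method names can differ between onnxruntime versions.)

```python
# Sketch: build a TensorRT calibration cache from the FP32 model, then point
# trtexec at it with --calib. Assumed API; not taken verbatim from this thread.
from onnxruntime.quantization import CalibrationMethod, create_calibrator, write_calibration_table

calibrator = create_calibrator(
    "mobilenetv2-opset10-infer.onnx",
    [],                                          # empty list: calibrate all supported op types
    augmented_model_path="augmented_model.onnx",
    calibrate_method=CalibrationMethod.MinMax,
)
calibrator.collect_data(data_reader)             # feed calibration batches
# Older releases expose compute_range(); newer ones renamed it to compute_data().
write_calibration_table(calibrator.compute_range())
# This writes calibration.cache (among other files); then, for example:
#   trtexec --onnx=mobilenetv2-opset10-infer.onnx --int8 --calib=calibration.cache
```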
It seems that PR #14549 solves this problem by removing the DQ nodes on the bias (i.e., DO NOT quantize the bias, by setting QuantizeBias to False). Thanks to you guys~
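For completeness, a sketch of what that call might look like, assuming an onnxruntime build that includes PR #14549 and therefore accepts QuantizeBias in extra_options:

```python
# Sketch: keep QDQ pairs on the FP32 weights but skip bias quantization entirely,
# so no lone DQ node is left on the bias. Requires an onnxruntime with PR #14549.
from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

quantize_static(
    "mobilenetv2-opset10-infer.onnx",
    "mobilenetv2-opset10-quantized.onnx",
    data_reader,                         # calibration data reader, as in the repro below
    quant_format=QuantFormat.QDQ,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
    extra_options={
        "AddQDQPairToWeight": True,      # as in the original repro
        "QuantizeBias": False,           # do not quantize the bias at all
    },
)
```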
Describe the issue
Hi, I tried to use the QDQ format to quantize my ONNX model and used trtexec to benchmark its inference speed, and I ran into a problem similar to #11535. After I add extra_options={'AddQDQPairToWeight': True} to quantize_static, the quantized model still fails to run on TensorRT and returns errors like this:
I found that there are QDQ pairs after the FP32 weights, but there is still only a DQ op after the quantized bias (see the figure below). That may be why this error occurs.
So I wonder whether the quantization in onnxruntime supports adding QDQ pairs to the bias, like AddQDQPairToWeight?
To reproduce
The ONNX model mentioned above is a MobileNet-V2 model obtained from the ONNX Model Zoo (link).
To reproduce (similar to the example from here):
```python
quantize_static(
    'mobilenetv2-opset10-infer.onnx',
    'mobilenetv2-opset10-quantized.onnx',
    data_reader,  # from the run.py code
    quant_format=QuantFormat.QDQ,
    per_channel=False,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
    optimize_model=False,
    extra_options={'AddQDQPairToWeight': True})
```
```
trtexec --onnx=mobilenetv2-opset10-quantized.onnx --avgRuns=1000 --workspace=1024 --verbose --int8
```
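The data_reader above comes from the example's run.py, which is not reproduced here. A minimal stand-in that only sketches the CalibrationDataReader interface (the input name and shape are assumptions for MobileNet-V2; real calibration should use representative preprocessed images):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader

class RandomDataReader(CalibrationDataReader):
    """Hypothetical reader that feeds random tensors, only to illustrate the API."""

    def __init__(self, num_samples=32, input_name="input"):
        # One feed dict per calibration sample; shape assumed for MobileNet-V2.
        self.data = iter(
            {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        # Return the next input feed, or None once calibration data is exhausted.
        return next(self.data, None)

data_reader = RandomDataReader()
```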