Closed emilianavt closed 3 years ago
Describe the bug
Attempting to perform static quantization on a model fails with the following error:
[...]
  File "/home/emiliana/env/lib/python3.7/site-packages/onnxruntime/quantization/calibrate.py", line 175, in calculate_scale_zeropoint
    clip_min = next_node.attribute[0].f
IndexError: list index (0) out of range
Urgency
none
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian Linux
- ONNX Runtime installed from (source or binary): binary installed with pip3
- ONNX Runtime version: 1.5.2
- Python version: 3.7.3
- Visual Studio version (if applicable): not applicable
- GCC/Compiler version (if compiling from source): not applicable
- CUDA/cuDNN version: not applicable
- GPU model and memory: not applicable
To Reproduce
I used this code to perform the quantization:

    import onnx
    from onnxruntime.quantization import quantize_dynamic, QuantType
    from onnxruntime.quantization import quantize_static, calibrate, CalibrationDataReader
    import cv2
    import numpy as np

    class NFDataReader(CalibrationDataReader):
        def __init__(self):
            self.augmented_model_path = "augmented_model.onnx"
            self.enum_data_dicts = []
            self.datasize = 0
            mean = np.float32(np.array([0.485, 0.456, 0.406]))
            std = np.float32(np.array([0.229, 0.224, 0.225]))
            mean = mean / std
            std = std * 255.0
            for i in range(1):
                image = cv2.imread(f"{i}.png")
                image = image[:,:,::-1] * 1 / std - mean
                image = np.expand_dims(image, 0).astype(np.float32)
                image = np.transpose(image, (0,3,1,2))
                self.enum_data_dicts.append({"input": image})
            self.datasize = len(self.enum_data_dicts)
            self.enum_data_dicts = iter(self.enum_data_dicts)

        def get_next(self):
            return next(self.enum_data_dicts, None)

    model_fp32 = 'lm_model3.onnx'
    model_quant = 'lm_model3_quant.onnx'

    # run it
    dr = NFDataReader()
    quantized_model = quantize_static(model_input=model_fp32, model_output=model_quant, calibration_data_reader=dr, weight_type=QuantType.QUInt8)
Running the code without the np.transpose line and an adjusted input name works for the resnet model provided here, so the issue seems to be model dependent. The attached zip file (quant.zip) contains the original ONNX model, the png file as well as the code.
Expected behavior
I expected a quantized model to be produced rather than an error to occur.
Screenshots
The full stack trace:

    Traceback (most recent call last):
      File "quant_model.py", line 34, in <module>
        quantized_model = quantize_static(model_input=model_fp32, model_output=model_quant, calibration_data_reader=dr, weight_type=QuantType.QUInt8)
      File "/home/emiliana/env/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py", line 171, in quantize_static
        nodes_to_exclude)
      File "/home/emiliana/env/lib/python3.7/site-packages/onnxruntime/quantization/calibrate.py", line 260, in calibrate
        quantization_params_dict = calibrater.calculate_quantization_params(dict_for_quantization)
      File "/home/emiliana/env/lib/python3.7/site-packages/onnxruntime/quantization/calibrate.py", line 230, in calculate_quantization_params
        node_params = self.calculate_scale_zeropoint(child, node_thresholds[0], node_thresholds[1])
      File "/home/emiliana/env/lib/python3.7/site-packages/onnxruntime/quantization/calibrate.py", line 175, in calculate_scale_zeropoint
        clip_min = next_node.attribute[0].f
    IndexError: list index (0) out of range
Additional context
The model was originally converted from PyTorch using opset 11. It has a MobileNet V3 backend based on the geffnet implementation, with UNet-like layers on top to do heatmap regression, containing Conv2D and bilinear upscale layers. The definition of the model (it uses inference=False) can be found here. Other than static quantization, the ONNX model works fine with onnxruntime.
I also ran into this issue when quantizing the landmark model in OpenSeeFace. I worked around it by editing /home/emiliana/env/lib/python3.7/site-packages/onnxruntime/quantization/calibrate.py to hard-code clip min = 0 and max = 6, since every Clip in your network ranges from 0 to 6.
In addition, ONNX quantization does not speed up the landmark model at all on my CPU (E5), and it makes the results worse, even after calibrating with the whole dataset.
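For readers wondering what a hard-coded range of 0 to 6 turns into: a static quantizer maps a real-valued range to a uint8 scale and zero point. Below is a minimal sketch of the usual asymmetric formula. The function name is hypothetical, and this is an illustration of the general technique, not necessarily the exact arithmetic used in calibrate.py.

```python
from typing import Tuple


def uint8_scale_zero_point(rmin: float, rmax: float) -> Tuple[float, int]:
    """Asymmetric uint8 quantization parameters for the real range [rmin, rmax].

    Hypothetical helper for illustration only.
    """
    # Widen the range so that it contains 0.0, which keeps zero exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:
        # Degenerate range (everything is zero): any positive scale works.
        return 1.0, 0
    scale = (rmax - rmin) / 255.0
    zero_point = round(-rmin / scale)
    # Clamp the zero point into the valid uint8 range.
    return scale, max(0, min(255, zero_point))


# A Clip with min = 0 and max = 6 (i.e. ReLU6), as in the workaround above.
scale, zero_point = uint8_scale_zero_point(0.0, 6.0)
print(scale, zero_point)
```

With a 0 to 6 range the zero point is 0 and each uint8 step covers 6/255 of the real range.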
I can confirm the bug. It is possible to reproduce it by modifying the example from #5499 with an additional clamp() operation in the model:

    import numpy as np
    import torch
    from onnxruntime.quantization import quantize_dynamic, quantize_static, QuantType
    from onnxruntime.quantization.calibrate import CalibrationDataReader

    class CalibrationDataProvider(CalibrationDataReader):
        def __init__(self):
            super(CalibrationDataProvider, self).__init__()
            self.counter = 0

        def get_next(self):
            if self.counter > 2:
                return None
            else:
                self.counter += 1
                return {'x': np.random.randn(2, 4).astype(np.float32)}

    class Model(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.n = int(1)

        def forward(self, x):
            f = x.shape[0]
            y = x.reshape(-1, f)
            z = y.clamp(min=-1.0, max=1.0)  # this is the problem now
            return z

    model = Model().float()
    dummy_input = (torch.randn(2, 4), )
    torch.onnx.export(
        model,
        dummy_input,
        'model.onnx',
        input_names=('x',),
        export_params=True,
        training=False,
        opset_version=11)

    cdr = CalibrationDataProvider()
    quantize_static(model_input='model.onnx',
                    model_output='model_q.onnx',
                    calibration_data_reader=cdr)

This simple model shows that the Clip node has an empty attributes list and that the min and max parameters are in fact additional inputs. Maybe this is relevant for the bug.
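To illustrate the difference: in older opsets Clip carries min and max as attributes, while from opset 11 on they are optional inputs, so indexing attribute[0] fails on an opset 11 export. Here is a rough sketch of a lookup that tolerates both layouts. The helper name clip_bounds and the constants dict are hypothetical, and the nodes are plain stand-in objects rather than real onnx.NodeProto instances so that the snippet runs without onnx installed. With a real model, the constant values would have to be read from the graph initializers (for example with onnx.numpy_helper.to_array) or from Constant nodes, which is not handled here.

```python
from types import SimpleNamespace


def clip_bounds(node, constants):
    """Return (min, max) for a Clip-like node; a missing bound is None.

    `node` needs `.attribute` (items with `.name` and `.f`) and `.input`
    (a list of tensor names). `constants` maps tensor names to float values.
    Hypothetical helper for illustration only.
    """
    lo = hi = None
    # Older layout: bounds are stored as attributes named "min" and "max".
    # Looking them up by name avoids relying on attribute order or presence.
    for attr in node.attribute:
        if attr.name == "min":
            lo = attr.f
        elif attr.name == "max":
            hi = attr.f
    # Opset 11+ layout: bounds are optional inputs 1 and 2.
    # An empty input name means the bound was not provided.
    if len(node.input) > 1 and node.input[1]:
        lo = constants.get(node.input[1], lo)
    if len(node.input) > 2 and node.input[2]:
        hi = constants.get(node.input[2], hi)
    return lo, hi


# Stand-in for a Clip exported with an older opset (bounds as attributes).
old_clip = SimpleNamespace(
    attribute=[SimpleNamespace(name="min", f=0.0), SimpleNamespace(name="max", f=6.0)],
    input=["x"],
)
# Stand-in for the opset 11 Clip from the clamp() example (bounds as inputs).
new_clip = SimpleNamespace(attribute=[], input=["x", "clip_min", "clip_max"])
constants = {"clip_min": -1.0, "clip_max": 1.0}

old_bounds = clip_bounds(old_clip, constants)
new_bounds = clip_bounds(new_clip, constants)
print(old_bounds, new_bounds)
```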
ONNX Runtime version 1.6.0.
Fixed in #6541