microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Relu getting dropped during quantization #9425

Open · Silvan-K opened this issue 3 years ago

Silvan-K commented 3 years ago

Describe the bug

It seems that Relu nodes immediately following Conv nodes are dropped during quantization (when Relu is included in op_types_to_quantize). If I understand correctly, this should still give correct results with unsigned int8, because the quantization parameters for the output of the QLinearConv node are determined from the output of the Relu that follows the Conv in the original model, so negative values simply get clipped to zero. With signed int8, however, negative values are no longer clipped and the missing Relu leads to wrong results. Symmetric activation quantization also gives wrong results, even with unsigned int8 (demonstrated in the attached script).
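
To make this concrete, here is a minimal numeric sketch of the failure mode (plain NumPy arithmetic, not ORT code; the values assume the toy model from the script below, where the Conv output range is [-16129, +16129] and the Relu output range is [0, +16129]):

import numpy as np

def quantize(x, scale, zero_point, qmin, qmax):
    # Standard linear quantization: q = clip(round(x / scale) + zero_point, qmin, qmax)
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax)

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

conv_out = -16129.0  # negative pre-Relu Conv output (weight 127 * input -127)

# Asymmetric uint8: output params are calibrated on the Relu output range [0, 16129],
# so zero_point = 0 and the negative Conv output clips to 0 -- the dropped Relu is emulated.
scale_u, zp_u = 16129.0 / 255.0, 0
print(dequantize(quantize(conv_out, scale_u, zp_u, 0, 255), scale_u, zp_u))     # 0.0

# Symmetric (or signed) quantization: the range covers [-16129, 16129] with zero_point = 0,
# so the negative value survives even though the Relu node has been removed.
scale_s, zp_s = 16129.0 / 127.0, 0
print(dequantize(quantize(conv_out, scale_s, zp_s, -127, 127), scale_s, zp_s))  # -16129.0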

The workaround for my specific use case would be to exclude Relu from the op_types_to_quantize argument. However, omitting Relu from op_types_to_quantize has the unwanted side effect that a MaxPool following the Relu no longer gets quantized (separate issue: #9428).

Urgency

Development of a backend is blocked by this, so it would be great if someone could provide some insights as soon as possible.

System information

To Reproduce

Run the code below:

import torch
import numpy as np
import onnx
import onnxruntime
from onnxruntime import quantization

IMAGE_SHAPE  = (1, 1, 1, 1)
KERNEL_SHAPE = (1, 1, 1, 1)

class ToyModel(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(out_channels = KERNEL_SHAPE[0],
                                    in_channels  = KERNEL_SHAPE[1],
                                    kernel_size  = KERNEL_SHAPE[2:],
                                    bias         = False)
        weight = torch.tensor(data  = 127*np.ones(KERNEL_SHAPE).astype("float32"),
                              dtype = torch.float32)
        self.conv.weight = torch.nn.Parameter(weight,
                                              requires_grad = False)

    def forward(self, input):
        return self.conv(input).relu()

class ToyDataProvider(onnxruntime.quantization.CalibrationDataReader):

    def __init__(self, input_name):
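        # Calibration data: two inputs filled with -127 and +127, so the calibrated
        # Conv output range is [-16129, +16129] and the Relu output range is [0, +16129].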
        self.data = ({ input_name: prefac*np.ones(IMAGE_SHAPE).astype("float32") } for prefac in [-127,+127])

    def get_next(self):
        try: return next(self.data)
        except StopIteration: return None

def CreateToyModels(unquantized_path, quantized_path):

    # Save toy model to onnx file
    model = ToyModel()
    torch.onnx.export(model, (torch.empty(IMAGE_SHAPE, dtype=torch.float32)), unquantized_path)
    session = onnxruntime.InferenceSession(unquantized_path)
    input_name = session.get_inputs()[0].name

    # Quantize model with Relu using dummy data
    onnxruntime.quantization.quantize_static(model_input = unquantized_path,
                                             model_output = quantized_path,
                                             calibration_data_reader = ToyDataProvider(input_name),
                                             activation_type = onnxruntime.quantization.QuantType.QUInt8,
                                             weight_type = onnxruntime.quantization.QuantType.QUInt8,
                                             op_types_to_quantize = ["Conv", "Relu"],
                                             extra_options = { "WeightSymmetric" : True,
                                                               "ActivationSymmetric" : True })

if __name__ == "__main__":

    # Create quantized and unquantized models
    CreateToyModels(unquantized_path = "model.onnx", quantized_path = "quantized-model.onnx")

    # Compare models. Use negative test image. Output should be zero in both models
    unquantized_session = onnxruntime.InferenceSession("model.onnx")
    quantized_session = onnxruntime.InferenceSession("quantized-model.onnx")
    input_name = quantized_session.get_inputs()[0].name
    test_image = -127*np.ones(IMAGE_SHAPE).astype("float32")
    print(unquantized_session.run(None, {input_name: test_image}))
    print(quantized_session.run(None, {input_name: test_image}))

yufenglee commented 3 years ago

Yes, it will lead to incorrect results if activations are quantized symmetrically, which is not a common case for ORT right now. The fix is to either disable symmetric quantization for the input of Relu/Clip or add a quantized op for Relu/Clip.
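
For reference, a minimal sketch of the first option applied to the repro script above (the only change is that "ActivationSymmetric" is not set, so activations are quantized asymmetrically; everything else is taken from the script as-is):

# Workaround sketch: asymmetric, unsigned activation quantization, so the calibrated
# [0, max] range of the Relu output clips negative Conv outputs to zero as described above.
onnxruntime.quantization.quantize_static(model_input = "model.onnx",
                                         model_output = "quantized-model.onnx",
                                         calibration_data_reader = ToyDataProvider(input_name),
                                         activation_type = onnxruntime.quantization.QuantType.QUInt8,
                                         weight_type = onnxruntime.quantization.QuantType.QUInt8,
                                         op_types_to_quantize = ["Conv", "Relu"],
                                         extra_options = { "WeightSymmetric" : True })  # no ActivationSymmetric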

Silvan-K commented 3 years ago

Ok, thanks @yufenglee for confirming! I just want to stress that this is not only an issue with symmetric quantization: both symmetric and asymmetric quantization give wrong results when signed int8 is used. It might be worth adding an error message when signed int8 is used and Relu is included in op_types_to_quantize, so that other users don't unknowingly run into this issue.
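
Something along these lines (a hypothetical pre-flight check, not existing ORT code) would catch the problematic combination before quantization runs:

from onnxruntime.quantization import QuantType

def check_relu_quantization(activation_type, op_types_to_quantize, extra_options=None):
    # Hypothetical guard: the Relu that gets folded away can only be emulated by clipping
    # at the output quantization range when activations are unsigned and asymmetric.
    symmetric = bool((extra_options or {}).get("ActivationSymmetric", False))
    if "Relu" in op_types_to_quantize and (activation_type == QuantType.QInt8 or symmetric):
        raise ValueError(
            "Relu nodes are removed during quantization; with signed or symmetric activation "
            "quantization their effect is not reproduced by clipping. Exclude 'Relu' from "
            "op_types_to_quantize or use asymmetric QUInt8 activations.")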

andrling commented 3 years ago

> Yes, it will lead to incorrect results if activations are quantized symmetrically, which is not a common case for ORT right now. The fix is to either disable symmetric quantization for the input of Relu/Clip or add a quantized op for Relu/Clip.

@yufenglee - do you have a sense of when this could be fixed?

Thanks for verifying!

SilvanK4t1qbit commented 3 years ago

Hey @yufenglee, we dug a bit deeper and found that the problem is just a mismatch in the minimum/maximum values of the signed int8 data type: onnxruntime assumes [-127, +127], while our architecture provides [-128, +127]. If we make this change in onnxruntime and use asymmetric quantization, we do get correct results. Would you be open to making this switchable in onnxruntime?
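
A quick numeric sketch of that mismatch (plain arithmetic, not ORT internals; the float range is the calibrated Relu output of the toy model above):

def qparams(rmin, rmax, qmin, qmax):
    # Asymmetric linear quantization parameters for a calibrated float range.
    scale = (rmax - rmin) / float(qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

rmin, rmax = 0.0, 16129.0

print(qparams(rmin, rmax, -127, 127))   # [-127, 127] convention: scale 63.5,   zero_point -127
print(qparams(rmin, rmax, -128, 127))   # [-128, 127] convention: scale ~63.25, zero_point -128

# The two conventions disagree on both scale and zero_point, so a model quantized with
# one convention and executed with the other dequantizes to shifted values.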

yufenglee commented 3 years ago

@SilvanK4t1qbit, yes, it would be better to change the range to [-128, 127] for activations, but for weights we need to keep it at [-127, 127]. You're welcome to make a PR to fix this.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

wangshankun commented 1 year ago

> Yes, it will lead to incorrect results if activations are quantized symmetrically, which is not a common case for ORT right now. The fix is to either disable symmetric quantization for the input of Relu/Clip or add a quantized op for Relu/Clip.

With symmetric quantization and the Relu kept, onnxruntime still cannot run the model, because Relu does not support int8 inputs (see the attached image).
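
As a quick check of what the quantizer actually produced, one can list the node types of the quantized model (standard onnx API, nothing ORT-specific) to see whether the Relu was removed or kept as a float node:

import onnx

# Inspect the quantized toy model from the repro script above.
model = onnx.load("quantized-model.onnx")
print([node.op_type for node in model.graph.node])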