sony / mct_quantizers

ActivationPOTQuantizer np.inf - onnx torch inconsistency #77

Closed: Chizkiyahu closed this issue 7 months ago

Chizkiyahu commented 10 months ago

When passing np.inf to ActivationPOTInferableQuantizer, the value sometimes quantizes to the maximum level and sometimes to the minimum level. The behavior is not consistent: I get different results in a Linux Docker container and on a Mac.

Test code:

import torch.onnx
import onnxruntime as ort
import numpy as np
from mct_quantizers import PytorchActivationQuantizationHolder
from mct_quantizers.pytorch.quantizers import ActivationPOTInferableQuantizer
from mct_quantizers import get_ort_session_options

class SimpleModel(torch.nn.Module):

    def __init__(self):
        super().__init__()
        # 8-bit signed power-of-two quantizer with threshold 0.5
        quantizer = ActivationPOTInferableQuantizer(num_bits=8, threshold=[0.5], signed=True)
        quantizer.enable_custom_impl()
        self.q = PytorchActivationQuantizationHolder(quantizer)

    def forward(self, x):
        return self.q(x)

model = SimpleModel()
# Feed +/-inf to probe the quantizer's saturation behavior
input_data = np.array([[np.inf, np.inf, -np.inf]], dtype=np.float32)
print("input data", input_data)

dummy_input = torch.tensor(input_data, dtype=torch.float32)
# Export the model, including the custom mct_quantizers ONNX op
torch.onnx.export(model, dummy_input, "simple_model.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names=['input'], output_names=['output'])

# Reference result from eager PyTorch
torch_output = model(dummy_input)
print("torch output", torch_output.numpy()[0])

# Run the exported model through onnxruntime with the custom-op session options
ort_session = ort.InferenceSession("simple_model.onnx", get_ort_session_options())
ort_inputs = {ort_session.get_inputs()[0].name: input_data}
ort_output = ort_session.run(None, ort_inputs)

print("onnx output", ort_output[0][0])

Linux (Docker) result - NOT OK

input data [[ inf  inf -inf]]
WARNING: The shape inference of mct_quantizers::ActivationPOTQuantizer type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
(the warning above is printed three times)
torch output [-0.5 -0.5 -0.5]
onnx output [ 0.49609375  0.49609375 -0.5  ]

Mac M1 result - OK

input data [[ inf  inf -inf]]
torch output [ 0.49609375  0.49609375 -0.5       ]
onnx output [ 0.49609375  0.49609375 -0.5       ]

Mac M1 pip freeze

build==1.0.3
coloredlogs==15.0.1
filelock==3.13.1
flatbuffers==23.5.26
fsspec==2023.12.2
humanfriendly==10.0
Jinja2==3.1.2
MarkupSafe==2.1.3
mct-quantizers==1.4.0
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.3
onnx==1.14.1
onnxruntime==1.15.1
onnxruntime-extensions==0.9.0
packaging==23.2
protobuf==4.25.1
pyproject_hooks==1.0.0
sympy==1.12
tomli==2.0.1
torch==2.1.2
typing_extensions==4.9.0

Docker pip freeze (torch behaves the same in torch 2.0 and torch 2.1)

build==1.0.3
coloredlogs==15.0.1
filelock==3.13.1
flatbuffers==23.5.26
fsspec==2023.12.2
humanfriendly==10.0
Jinja2==3.1.2
MarkupSafe==2.1.3
mct-quantizers==1.4.0
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.3
onnx==1.14.1
onnxruntime==1.15.1
onnxruntime-extensions==0.9.0
packaging==23.2
protobuf==4.25.1
pyproject_hooks==1.0.0
sympy==1.12
tomli==2.0.1
torch==2.1.2
typing_extensions==4.9.0

reuvenperetz commented 9 months ago

Hi, in our quantizers implementation we use torch fake-quant ops, and it seems there's an issue with them when it comes to quantizing infinity. A similar issue can be found here.
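
For reference, here is a minimal sketch that probes torch's fake-quant op directly, independent of mct_quantizers. It assumes the POT quantizer above maps to scale = threshold / 2**(num_bits - 1) = 0.5 / 128 with zero point 0 and an integer range of [-128, 127]; that parameter mapping is an assumption, not taken from the library code.

import torch

# Assumed parameters mirroring the repro above: 8-bit signed POT quantizer,
# threshold 0.5 -> scale = 0.5 / 2**7, zero_point = 0, levels in [-128, 127]
scale = 0.5 / 128
x = torch.tensor([float("inf"), float("inf"), float("-inf")])
y = torch.fake_quantize_per_tensor_affine(x, scale=scale, zero_point=0,
                                          quant_min=-128, quant_max=127)
print(y)
# Expected: +inf saturates to 127 * scale = 0.49609375 and -inf to -0.5.
# If +inf comes out as -0.5 instead, the float-to-int conversion of inf inside
# the fake-quant kernel is the likely culprit, matching the Linux output above.

If that reproduces the mismatch, a possible input-side workaround is to replace non-finite values before the quantization holder, e.g. with torch.nan_to_num, so the quantizer only ever sees finite inputs.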