When passing np.inf to ActivationPOTInferableQuantizer, the value sometimes maps to the quantizer's maximum value and sometimes to its minimum. The behavior is not consistent: I get different results on Linux (Docker) and on Mac.

Test code:
import torch
import torch.onnx
import onnxruntime as ort
import numpy as np
from mct_quantizers import PytorchActivationQuantizationHolder, get_ort_session_options
from mct_quantizers.pytorch.quantizers import ActivationPOTInferableQuantizer

class SimpleModel(torch.nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        # 8-bit signed power-of-two quantizer with threshold 0.5
        quantizer = ActivationPOTInferableQuantizer(num_bits=8, threshold=[0.5], signed=True)
        quantizer.enable_custom_impl()
        self.q = PytorchActivationQuantizationHolder(quantizer)

    def forward(self, x):
        return self.q(x)

model = SimpleModel()

# Feed +/-inf through the quantizer, once in torch and once in onnxruntime
input_data = np.array([[np.inf, np.inf, -np.inf]], dtype=np.float32)
print("input data", input_data)

dummy_input = torch.tensor(input_data, dtype=torch.float32)
torch.onnx.export(model, dummy_input, "simple_model.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names=['input'], output_names=['output'])

torch_output = model(dummy_input)
print("torch output", torch_output.numpy()[0])

ort_session = ort.InferenceSession("simple_model.onnx", get_ort_session_options())
ort_inputs = {ort_session.get_inputs()[0].name: input_data}
ort_output = ort_session.run(None, ort_inputs)
print("onnx output", ort_output[0][0])
Linux (Docker) result - NOT OK:
input data [[ inf inf -inf]]
WARNING: The shape inference of mct_quantizers::ActivationPOTQuantizer type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (printed three times during export)
torch output [-0.5 -0.5 -0.5]
onnx output [ 0.49609375 0.49609375 -0.5 ]
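
For reference, the two values in the outputs are exactly the endpoints of the quantizer's grid. With num_bits=8, threshold=0.5, signed=True, the scale is threshold / 2**(num_bits - 1) = 0.00390625, so the representable range is [-128 * scale, 127 * scale] = [-0.5, 0.49609375]. In this run onnxruntime saturates +inf to the grid maximum, while torch sends it to the grid minimum. A quick check of the arithmetic (plain Python, no library assumptions):

num_bits = 8
threshold = 0.5
scale = threshold / 2 ** (num_bits - 1)                       # 0.00390625
qmin, qmax = -2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1    # -128, 127
print(qmin * scale, qmax * scale)                             # -0.5 0.49609375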
Hi,
In our quantizers' implementation we use torch fake-quant ops, and it seems there's an issue with them when it comes to quantizing infinity. A similar issue can be found here.
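
As a minimal sketch (assuming the quantizer lowers to torch.fake_quantize_per_tensor_affine with a symmetric signed grid, which mirrors the settings above), the behavior can be checked at the level of the fake-quant op directly:

import torch

# Assumed parameter mapping for the POT quantizer above:
# scale = threshold / 2**(num_bits - 1), symmetric signed range
scale = 0.5 / 128        # 0.00390625
zero_point = 0
quant_min, quant_max = -128, 127

x = torch.tensor([float('inf'), float('-inf'), 0.3])
print(torch.fake_quantize_per_tensor_affine(x, scale, zero_point, quant_min, quant_max))
# On affected platforms, +inf can land on the grid minimum (-0.5)
# instead of saturating to the maximum (0.49609375).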
Mac M1 result - OK
Mac M1 pip freeze
Docker pip freeze
Note: torch behaves the same in torch 2.0 and torch 2.1.