pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[ONNX] Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. #75785

Open TilyTian opened 2 years ago

TilyTian commented 2 years ago

🐛 Describe the bug

When I finished quantization-aware training, I obtained a quantized model via torch.quantization.convert(). But when I then tried to export the PyTorch model to ONNX, I ran into this error:

Expected quantizer->qscheme() == kPerTensorAffine to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

The error is raised from _optimize_graph() inside the torch.onnx.export() API, when it runs the torch._C._jit_pass_onnx_unpack_quantized_weights() pass. It seems the exporter only supports quantizer->qscheme() == kPerTensorAffine, but in my model the weights are quantized with kPerChannelAffine. Could you give me some advice on how to fix this problem?

Here is the source code:

import torch

class qatmodel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.dequant = torch.quantization.DeQuantStub()
        self.conv1 = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=True)
        self.bn1 = torch.nn.BatchNorm2d(64)
        self.relu1 = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.quant(x)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.dequant(x)
        return x

my_model = qatmodel()
torch.quantization.fuse_modules(my_model, ["conv1", "bn1", "relu1"], inplace=True)
BACKEND = "fbgemm"
my_model.train()
my_model.qconfig = torch.quantization.get_default_qat_qconfig(BACKEND)
my_model = torch.quantization.prepare_qat(my_model)
my_model.eval()
torch.backends.quantized.engine = BACKEND
model_int8 = torch.quantization.convert(my_model)

fp32_input = torch.randn(4, 3, 3, 3)
dynamic_axes = {
    "data": {0: '?', 1: '?', 2: '?', 3: '?'},
}
model_scripted = torch.jit.trace(model_int8, fp32_input)
torch.onnx.export(
    model_scripted, fp32_input, "verbose.onnx",
    export_params=True, verbose=True,
    input_names=["data"], output_names=["output"],
    opset_version=13, do_constant_folding=True,
    keep_initializers_as_inputs=True)

Versions

PyTorch version: 1.12.0.dev20220228+cu111
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 11.0.0 (https://github.com/llvm/llvm-project.git 176249bd6732a8044d457092ed932768724a6f06)
CMake version: version 3.18.4
Libc version: glibc-2.15

Python version: 2.7.17 (default, Sep 30 2020, 13:38:04) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-1055-azure-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: N/A
CUDA runtime version: 11.1.105
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 440.64.00
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel

lippman1125 commented 2 years ago

Same problem here! torch version 1.10.

vkuzo commented 2 years ago

Thanks for the report! Looks like the failure is here: https://github.com/pytorch/pytorch/blob/adee867c8da044b9da132b3eb2a2dda296940e9c/torch/csrc/jit/passes/onnx/unpack_quantized_weights.cpp#L448

The ONNX spec does support per-channel quantization (https://github.com/onnx/onnx/blob/main/docs/Operators.md#qlinearconv), so we probably just need to extend this code to handle per-channel, and add a test.
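
To illustrate what handling the per-channel case involves on the PyTorch side: a per-channel quantized weight carries a vector of scales and zero points plus an axis, which would map onto QLinearConv's 1-D w_scale / w_zero_point inputs. A minimal standalone sketch (not taken from the failing graph):

import torch

# quantize a conv-shaped weight per output channel (axis 0)
w = torch.randn(64, 3, 7, 7)
scales = w.abs().amax(dim=(1, 2, 3)) / 127.0
zero_points = torch.zeros(64, dtype=torch.int64)
qw = torch.quantize_per_channel(w, scales, zero_points, axis=0, dtype=torch.qint8)

print(qw.qscheme())                      # torch.per_channel_affine
print(qw.q_per_channel_scales().shape)   # torch.Size([64]) -> 1-D w_scale in QLinearConv
print(qw.q_per_channel_axis())           # 0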

The core team is not currently prioritizing ONNX work, but we would welcome any contributions for this!
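
In the meantime, a possible workaround on the modeling side (a sketch only, assuming per-tensor weight quantization gives acceptable accuracy for your model) is to replace the default per-channel weight fake-quant with a per-tensor one before prepare_qat:

import torch
from torch.quantization import QConfig, FakeQuantize, MovingAverageMinMaxObserver

# QAT qconfig that keeps the usual fbgemm-style activation settings but
# quantizes weights per tensor, so the ONNX unpack pass accepts them
per_tensor_qat_qconfig = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0, quant_max=255,
        dtype=torch.quint8, qscheme=torch.per_tensor_affine,
        reduce_range=True),
    weight=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8, qscheme=torch.per_tensor_symmetric,
        reduce_range=False))

# then, in the repro above:
# my_model.qconfig = per_tensor_qat_qconfig   # instead of get_default_qat_qconfig("fbgemm")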