quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

About Per Channel QAT #1516

Open wuguangbin1230 opened 2 years ago

wuguangbin1230 commented 2 years ago

Hi all, I have three questions and would appreciate your help. Thank you very much!

(1) In per-channel quantization tasks, which value should be given for `ch_axis`, 1 or 0, if the quantized tensors come from conv layers with shape [batch_size, channel, W, H]?
The code below comes from ../aimet_torch/adaround/adaround_weight.py:
```python
@staticmethod
def _replace_tensor_quantizer(quant_module: StaticGridQuantWrapper):
    """
    Replace the quantized module's weight tensor quantizer with the Adaround tensor quantizer
    :param quant_module: quant module
    """
    assert quant_module.param_quantizers['weight'], '%s does not have weight parameter.' % quant_module
    assert quant_module.param_quantizers['weight'].encoding, '%s encoding needs to be set.' % quant_module

    quantizer = quant_module.param_quantizers['weight']
```
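
For reference, my understanding is that `ch_axis` selects the axis along which a separate encoding (min/max pair) is computed, roughly like this sketch (my own illustration, not AIMET code):

```python
import torch

weight = torch.randn(8, 3, 3, 3)  # Conv2d weight: [out_channels, in_channels, kH, kW]
ch_axis = 0                       # one encoding per output channel
reduce_dims = tuple(d for d in range(weight.dim()) if d != ch_axis)
per_channel_min = weight.amin(dim=reduce_dims)  # shape [8]: one min per output channel
per_channel_max = weight.amax(dim=reduce_dims)  # shape [8]: one max per output channel
```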

(2) During QAT with the AdaRound quantization method, I hit two bugs, listed as follows:

(a) bug 1

 File "aimet_qat_pinhole_ada.py", line 686, in main
    loss.backward()
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/autograd/function.py", line 88, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/aimet_torch/qc_quantize_op.py", line 948, in backward
    calc_param_grad(name, param)
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/aimet_torch/qc_quantize_op.py", line 936, in calc_param_grad
    grad = ste.compute_dloss_by_dx(param, param.grad, min_encodings, max_encodings,
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/aimet_torch/quantsim_straight_through_grad.py", line 126, in compute_dloss_by_dx
    inner_cond = torch.where(torch.le(x, encoding_max),  # condition to check per value
TypeError: le() received an invalid combination of arguments - got (Parameter, list), but expected one of:
 * (Tensor input, Tensor other, *, Tensor out)
 * (Tensor input, Number other, *, Tensor out)
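
The failure reproduces in isolation: `torch.le` accepts a Tensor or a Number as its second argument, but here it receives the per-channel encodings as a plain Python list. A minimal demonstration (my own sketch):

```python
import torch

x = torch.randn(8, 3, 3, 3)
encoding_max = [0.5] * 8      # per-channel encodings arrive as a Python list
# torch.le(x, encoding_max)   # TypeError: le() received an invalid combination of arguments
torch.le(x, torch.tensor(encoding_max).reshape(8, 1, 1, 1))  # fine once it is a broadcastable tensor
```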

(b) bug 2

  File "aimet_qat_pinhole_ada.py", line 686, in main
    loss.backward()
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/autograd/function.py", line 88, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/aimet_torch/qc_quantize_op.py", line 947, in backward
    calc_param_grad(name, param)
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/aimet_torch/qc_quantize_op.py", line 933, in calc_param_grad
    param.grad = ste.compute_dloss_by_dx(param, param.grad, min_encodings, max_encodings,
  File "/home/autel/.conda/envs/aimet_torch191_cu111/lib/python3.8/site-packages/torch/_tensor.py", line 992, in grad
    self._grad = new_grad
RuntimeError: assigned grad has data of a different size
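
This also reproduces in isolation: PyTorch rejects a `.grad` assignment whose shape differs from the parameter's. A minimal demonstration (my own sketch):

```python
import torch

bias = torch.nn.Parameter(torch.zeros(8))
grad = torch.randn(1, 8)             # gradient with an extra leading dim
# bias.grad = grad                   # RuntimeError: assigned grad has data of a different size
bias.grad = torch.squeeze(grad, 0)   # shape [8] matches the parameter again
```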

I modified the code to work around these as follows:

(a) Fix for bug 1, in ../aimet_torch/quantsim_straight_through_grad.py:

```python
def compute_dloss_by_dx(x, grad, encoding_min, encoding_max, ch_axis=0):

    if isinstance(encoding_max, list) and len(x.shape) > 1:
        encoding_max = broadcast_to_tensor(x, encoding_max, ch_axis)
    else:
        encoding_max = torch.tensor([encoding_max]).to(x.device)    # added
```
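
For context, the broadcast branch reshapes the per-channel list so it can be compared elementwise against `x` along `ch_axis`. A minimal sketch of that idea (an illustration, not AIMET's actual `broadcast_to_tensor`):

```python
import torch

def broadcast_encodings(x: torch.Tensor, encodings: list, ch_axis: int) -> torch.Tensor:
    # Reshape a per-channel encoding list, e.g. to [8, 1, 1, 1] for a Conv2d
    # weight of shape [8, 3, 3, 3] with ch_axis=0, so elementwise ops such as
    # torch.le(x, encoding_max) broadcast each encoding across its channel.
    shape = [1] * x.dim()
    shape[ch_axis] = len(encodings)
    return torch.tensor(encodings, device=x.device).reshape(shape)
```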

(b) Fix for bug 2, in ../aimet_torch/qc_quantize_op.py. The original code:

```python
def calc_param_grad(name: str, param: torch.nn.Parameter):
    """
    Calculates parameter gradient
    """
    if quant_wrapper_ref.param_quantizers[name].enabled and param.grad is not None and \
            quant_wrapper_ref.param_quantizers[name].data_type == QuantizationDataType.int:
        param_quantizer = quant_wrapper_ref.param_quantizers[name]

        if isinstance(param_quantizer.encoding, list):
            # Stack the encodings
            max_encodings = [enc.max for enc in param_quantizer.encoding]
            min_encodings = [enc.min for enc in param_quantizer.encoding]
            # pylint: disable = protected-access
            param.grad = ste.compute_dloss_by_dx(param, param.grad, min_encodings, max_encodings,
                                                 param_quantizer._ch_axis)
```
Modified to be:

```python
def calc_param_grad(name: str, param: torch.nn.Parameter):
    """
    Calculates parameter gradient
    """
    if quant_wrapper_ref.param_quantizers[name].enabled and param.grad is not None and \
            quant_wrapper_ref.param_quantizers[name].data_type == QuantizationDataType.int:
        param_quantizer = quant_wrapper_ref.param_quantizers[name]

        if isinstance(param_quantizer.encoding, list):
            # Stack the encodings
            max_encodings = [enc.max for enc in param_quantizer.encoding]
            min_encodings = [enc.min for enc in param_quantizer.encoding]
            # pylint: disable = protected-access
            grad = ste.compute_dloss_by_dx(param, param.grad, min_encodings, max_encodings,
                                           param_quantizer._ch_axis)
            # Squeeze the extra leading dim for bias gradients so the shape
            # matches the parameter before assignment
            if 'bias' in name:
                param.grad = torch.squeeze(grad, 0)
            else:
                param.grad = grad
```

(3) After applying the modifications above, I trained the CNN model on a 3090 GPU. However, training is very slow. How can I resolve this?

quic-hitameht commented 1 year ago

@wuguangbin1230 Sorry for the delayed response. Regarding 1): For a Conv layer, since the weight shape is [out_channels, in_channels, kernel_size[0], kernel_size[1]], channel_axis 0 should be selected. For a ConvTranspose layer, the weight shape is [in_channels, out_channels, kernel_size[0], kernel_size[1]], so channel_axis 1 is used. But this should be determined internally. Can you please use the latest release and let us know if you run into any errors?
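
The weight layouts are easy to verify directly in PyTorch:

```python
import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.weight.shape)    # torch.Size([8, 3, 3, 3]) -> output channels on axis 0

deconv = torch.nn.ConvTranspose2d(in_channels=3, out_channels=8, kernel_size=3)
print(deconv.weight.shape)  # torch.Size([3, 8, 3, 3]) -> output channels on axis 1
```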

Regarding 2) and 3): Could you share a minimal script to reproduce?

wb014 commented 1 year ago

For me, bug 1 was solved by the modification above, and bug 2 was solved by using `param.grad = torch.squeeze(grad, 0)` directly, without the if-else.
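
This works because `torch.squeeze(grad, 0)` only removes dimension 0 when its size is 1, so it is a no-op for weight gradients (whose leading dimension is the output-channel count, normally > 1) and only reshapes the bias case:

```python
import torch

print(torch.squeeze(torch.randn(1, 8), 0).shape)        # torch.Size([8]) -> bias grad fixed
print(torch.squeeze(torch.randn(8, 3, 3, 3), 0).shape)  # torch.Size([8, 3, 3, 3]) -> unchanged
```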