Open sanchitintel opened 1 week ago
Hi @Xia-Weiwen, after #1030, replacing `SYMMETRIC` with `ASYMMETRIC` in https://github.com/pytorch/ao/blob/f87fb563f451cd0d869775009667f59ea610e593/torchao/quantization/quant_api.py#L749 has been leading to `int8_dynamic_activation_int8_weight` producing incorrect eager-mode output. Do you know why? Would you like to fix the side effect? Thanks!
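For context, a minimal sketch of the kind of eager-mode check that regresses (the toy model and comparison here are illustrative, not the exact failing UT):

```python
import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

# Toy model; the affected unit tests exercise small Linear layers like this.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
x = torch.randn(8, 64)
ref = model(x)

# Dynamic int8 activation / int8 weight quantization, eager mode.
quantize_(model, int8_dynamic_activation_int8_weight())
out = model(x)

# With SYMMETRIC weight quantization this stays close to the reference;
# after switching the weight mapping type to ASYMMETRIC it no longer does.
print(torch.max(torch.abs(ref - out)))
```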
Sure, please feel free to add the argument to `int8_dynamic_activation_int8_weight`.
@sanchitintel Sorry for the late reply. Maybe I will take a look later. If you are interested in fixing it, please go ahead. Otherwise I will take it. Thanks.
Feature request
Support `int8_dynamic_activation_int8_weight` with symmetrically quantized activations (dynamic, per-token) & asymmetrically quantized weights (static, per-channel). Perhaps allow a new parameter `weight_mapping_type` in the API, just like the existing parameter `act_mapping_type`?
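A hypothetical sketch of what the extended API could look like (`weight_mapping_type` does not exist today; the import path for `MappingType` may differ across versions):

```python
import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
from torchao.quantization.quant_primitives import MappingType

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()

quantize_(
    model,
    int8_dynamic_activation_int8_weight(
        act_mapping_type=MappingType.SYMMETRIC,      # existing parameter
        weight_mapping_type=MappingType.ASYMMETRIC,  # proposed new parameter
    ),
)
```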
Description
Currently, the `int8_dynamic_activation_int8_weight` API supports symmetric quantization for both activations & weights (it also seems to support asymmetric quantization of activations, since the activation mapping type can be passed to the API; the request in this issue, however, is unrelated to #1317).

Replacing `SYMMETRIC` with `ASYMMETRIC` in https://github.com/pytorch/ao/blob/f87fb563f451cd0d869775009667f59ea610e593/torchao/quantization/quant_api.py#L749 led to the `int8_dynamic_activation_int8_weight` UTs failing the eager-mode correctness check at https://github.com/pytorch/ao/blob/72fb597c61963562299fa656d4826e23d9e53b48/test/integration/test_integration.py#L867-L870, which means asymmetric weight quantization can't be supported simply by modifying that line alone. It was working last month, though; the last working commit was 4ef024cd4556af6b302c3c8ba6818a3a6accaea8.

I think it's failing now because of some changes related to `torch._int_mm`. When it used to pass on my end, `torch._int_mm` wasn't being used for this use case on CPU. With the PyTorch Profiler, I don't see `aten::sub` being used, which suggests the weight zero-points are not being applied, and that may have caused the correctness issue.
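To illustrate the suspicion about the missing `aten::sub`: with asymmetric weights, the weight zero-point must either be subtracted before the matmul or folded into a per-row compensation term when an integer matmul like `torch._int_mm` is used. A rough reference sketch of the math (not torchao's actual kernel code; shape/layout constraints of `torch._int_mm` are ignored):

```python
import torch

def dequant_ref(x_int8, x_scale, w_int8, w_scale, w_zp):
    # x_int8: [M, K] int8, symmetric per-token   -> x_scale: [M, 1]
    # w_int8: [N, K] int8, asymmetric per-channel -> w_scale, w_zp: [N]
    x_fp = x_int8.float() * x_scale
    # This (w_int8 - w_zp) is the aten::sub that is missing from the profile;
    # dropping it silently treats the weight as symmetric.
    w_fp = (w_int8.float() - w_zp.float().unsqueeze(1)) * w_scale.float().unsqueeze(1)
    return x_fp @ w_fp.T

def int_mm_ref(x_int8, x_scale, w_int8, w_scale, w_zp):
    # Same result via an int8 matmul: fold the zero-point into a per-row
    # compensation term, since x @ (w - zp).T == x @ w.T - x.sum(-1, keepdim=True) * zp.
    acc = torch._int_mm(x_int8, w_int8.t().contiguous())  # [M, N] int32
    comp = x_int8.to(torch.int32).sum(dim=-1, keepdim=True) * w_zp.to(torch.int32)
    return (acc - comp).float() * x_scale * w_scale
```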
Rationale for the feature

- [ ] TODO: Good accuracy & performance with some workloads?
cc @Xia-Weiwen @leslie-fang-intel @Chunyuan-w