pytorch / ao

PyTorch native quantization and sparsity for training and inference

[Feature Request] Support of `int8_dynamic_activation_int8_weight` with asymmetrically quantized weights #1320

Open sanchitintel opened 1 week ago

sanchitintel commented 1 week ago

Feature request

Support int8_dynamic_activation_int8_weight with symmetrically quantized activations (dynamic, per-token) and asymmetrically quantized (static, per-channel) weights.

Perhaps allow a new parameter weight_mapping_type in the API, just like the existing parameter act_mapping_type?
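For illustration, a minimal sketch of what the proposed usage could look like; weight_mapping_type is the hypothetical new argument and does not exist today, while act_mapping_type is the existing one:

```python
# Hypothetical usage sketch, not the actual torchao API: a weight_mapping_type
# argument mirroring the existing act_mapping_type argument.
import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
from torchao.quantization.quant_primitives import MappingType

model = torch.nn.Sequential(torch.nn.Linear(128, 128)).eval()

quantize_(
    model,
    int8_dynamic_activation_int8_weight(
        act_mapping_type=MappingType.SYMMETRIC,      # existing parameter
        weight_mapping_type=MappingType.ASYMMETRIC,  # proposed parameter (does not exist yet)
    ),
)
```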

Description

Currently, the int8_dynamic_activation_int8_weight API supports symmetric quantization for both activations and weights (it also appears to support asymmetric quantization of activations, since the activation mapping type can be passed to the API; the request in this issue, however, is unrelated to #1317).

Replacing SYMMETRIC with ASYMMETRIC in https://github.com/pytorch/ao/blob/f87fb563f451cd0d869775009667f59ea610e593/torchao/quantization/quant_api.py#L749 causes the int8_dynamic_activation_int8_weight UTs to fail the eager-mode correctness check at https://github.com/pytorch/ao/blob/72fb597c61963562299fa656d4826e23d9e53b48/test/integration/test_integration.py#L867-L870, so asymmetric weight quantization can't be enabled simply by modifying that line. It did work last month, however; the last working commit was 4ef024cd4556af6b302c3c8ba6818a3a6accaea8.
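For context, a repro sketch of the kind of eager-mode check the UT performs; module sizes and the comparison here are illustrative, not the actual test's:

```python
# Illustrative repro sketch (not the actual UT): quantize a Linear layer and
# compare eager-mode output against the fp32 reference, assuming quant_api.py#L749
# has been patched from SYMMETRIC to ASYMMETRIC.
import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
x = torch.randn(8, 64)
ref = model(x)

quantize_(model, int8_dynamic_activation_int8_weight())
out = model(x)

# The real test asserts the quantized output stays close to the fp32 reference;
# with asymmetric weights this check currently fails in eager mode.
print("mse vs fp32 reference:", torch.nn.functional.mse_loss(out, ref).item())
```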

I think it's failing now because of some changes related to torch._int_mm. When this case used to pass on my end, torch._int_mm wasn't being used for it on CPU. With the PyTorch Profiler, I don't see aten::sub being called, which suggests the weight zero-points are not being applied; that may be the cause of the correctness issue.
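To illustrate the suspicion above, a small standalone sketch (not torchao code) of why skipping the weight zero-point subtraction biases the result when weights are quantized asymmetrically:

```python
# Standalone sketch (not torchao code): with asymmetric weights,
# w ≈ w_scale * (w_q - w_zp), so the matmul needs a zero-point correction.
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)        # activation, kept in fp32 for simplicity
w = torch.randn(8, 16)       # weight, quantized per output channel below

# Asymmetric per-channel quantization of w to int8
w_min, w_max = w.amin(dim=0), w.amax(dim=0)
w_scale = (w_max - w_min) / 255.0
w_zp = (-128 - w_min / w_scale).round()
w_q = (w / w_scale + w_zp).round().clamp(-128, 127)

# Correct dequantized matmul subtracts the zero-point (the aten::sub the
# profiler should show) before scaling:
y_correct = x @ ((w_q - w_zp) * w_scale)

# If the subtraction is skipped, the output is biased by x @ (w_zp * w_scale):
y_wrong = x @ (w_q * w_scale)

print("max abs error without zero-point correction:",
      (y_correct - y_wrong).abs().max().item())
```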

Rationale for the feature

sanchitintel commented 1 week ago

Hi @Xia-Weiwen, after #1030, replacing SYMMETRIC with ASYMMETRIC in https://github.com/pytorch/ao/blob/f87fb563f451cd0d869775009667f59ea610e593/torchao/quantization/quant_api.py#L749 has been leading to int8_dynamic_activation_int8_weight producing incorrect eager-mode output. Do you know why? Would you like to fix the side-effect? Thanks!

jerryzh168 commented 1 week ago

sure, please feel free to add the argument to int8_dynamic_activation_int8_weight

Xia-Weiwen commented 2 days ago

> Hi @Xia-Weiwen, after #1030, replacing SYMMETRIC with ASYMMETRIC in https://github.com/pytorch/ao/blob/f87fb563f451cd0d869775009667f59ea610e593/torchao/quantization/quant_api.py#L749 has been leading to int8_dynamic_activation_int8_weight producing incorrect eager-mode output. Do you know why? Would you like to fix the side-effect? Thanks!

@sanchitintel Sorry for the late reply. Maybe I will take a look later. If you are interested in fixing it, please go ahead. Otherwise I will take it. Thanks.