pytorch / ao

Native PyTorch library for quantization and sparsity
https://pytorch.org/ao
BSD 3-Clause "New" or "Revised" License

Remove input_quant_func from AffineQuantizedTensor subclass #243

Closed jerryzh168 closed 2 weeks ago

jerryzh168 commented 2 weeks ago

Summary: Currently we have an input_quant_func in AffineQuantizedTensor, which is a bit convoluted. Instead, we want to use a separate LinearActAffineQuantizedTensor subclass for activation quantization (dynamic quantization).
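A minimal sketch of the separation described above (all names and the per-tensor layout here are illustrative, not the actual ao API): the weight subclass only holds quantized data, and a separate wrapper quantizes activations dynamically when linear is called, so no input_quant_func lives on the weight tensor.

```python
import numpy as np

def quantize_affine(x, scale, zero_point):
    # Affine quantization: q = clamp(round(x / scale) + zero_point) into int8 range.
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

class AffineQuantizedWeight:
    """Holds int8 data plus scale/zero_point only; no input_quant_func."""
    def __init__(self, float_weight):
        self.scale = float(np.abs(float_weight).max()) / 127.0
        self.zero_point = 0
        self.int_data = quantize_affine(float_weight, self.scale, self.zero_point)

class LinearActAffineQuantizedWeight:
    """Wrapper that layers dynamic activation quantization on top of the weight."""
    def __init__(self, inner):
        self.inner = inner

    def linear(self, x):
        # Quantize the activation at call time (dynamic quantization),
        # do an integer matmul, then rescale by both scales.
        x_scale = float(np.abs(x).max()) / 127.0
        x_int = quantize_affine(x, x_scale, 0)
        acc = x_int.astype(np.int32) @ self.inner.int_data.astype(np.int32).T
        return acc.astype(np.float32) * x_scale * self.inner.scale
```

The point of the split is that the outer wrapper composes with any inner quantized-weight representation, instead of each weight tensor carrying its own activation-quantization callable.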

Also added a dispatch for int8-activation / int8-weight dynamic quantization that calls the int_scaled_matmul kernel in the end.
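Conceptually, that dispatch bottoms out in an integer matmul followed by a rescale. The numpy reference below is only my sketch of those semantics under the assumption of per-row activation scales; the real int_scaled_matmul is a fused kernel and its exact signature may differ.

```python
import numpy as np

def int_scaled_matmul_ref(a_int8, b_int8, scales_1):
    # a_int8: (M, K) int8 activations; b_int8: (K, N) int8 weights;
    # scales_1: (M, 1) float per-row activation scales (assumed layout).
    acc = a_int8.astype(np.int32) @ b_int8.astype(np.int32)  # int32 accumulation
    return acc.astype(np.float32) * scales_1                 # row-wise rescale
```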

Test Plan:

python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_8da4w
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8_dyn_quant


pytorch-bot[bot] commented 2 weeks ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/243

Note: Links to docs will display an error until the docs builds have been completed.

:x: 1 New Failure

As of commit 166353f9a7c57a7357e2aac4bc2950a2b6253492 with merge base cae3d823cec4eb9ad781d9e589f1487e79c9286f:

NEW FAILURE - The following job has failed:

* [.github/workflows/build.yml](https://hud.pytorch.org/pr/pytorch/ao/243#9104311941) ([gh](https://github.com/pytorch/ao/actions/runs/9104311941))

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 commented 2 weeks ago

Great :) Let's move AffineQuantizedTensor into dtypes next and create a PyTorch-style conversion function. We should also not need to use torch_function to override linear, but it makes sense to do that as a follow-up, because it will require adding support for detach, view, addmm, etc. to AffineQuantizedTensor.

Sounds good. The main open question is transpose: we need to think about how to support it given the scales/zero_point and the block_size arg.
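A hedged sketch of why transpose is the tricky case: with block-wise affine quantization, scale and zero_point carry one entry per block, so transposing the int data also means transposing those tensors and reversing block_size. (Illustrative only; the real AffineQuantizedTensor layout may differ.)

```python
import numpy as np

def transpose_blockwise(int_data, scale, zero_point, block_size):
    # block_size like (1, 32) means per-group quantization along the last dim;
    # after a transpose the groups run along the first dim, so block_size flips
    # to (32, 1) and the per-block scale/zero_point tensors transpose with the data.
    return int_data.T, scale.T, zero_point.T, tuple(reversed(block_size))
```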