pytorch / ao

Native PyTorch library for quantization and sparsity
BSD 3-Clause "New" or "Revised" License

[RFC] More general affine quantization primitives #160

Open jerryzh168 opened 3 weeks ago

jerryzh168 commented 3 weeks ago

The PR is here; please feel free to comment in the PR directly: https://github.com/pytorch-labs/ao/pull/159

Context

Currently there are many q/dq (quantize/dequantize) functions in torchao and PyTorch. They mainly differ in the following dimensions:

Ideally, I think we should unify them. This might complicate the operator patterns used by backends like xnnpack, but the code sharing and the simpler representation will be beneficial in the long term.

We defined three functions: choose_qparams_affine_per_block, quantize_affine_per_block, and dequantize_affine_per_block. Please check out the docstrings of these functions in the PR for their definitions.
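
To give a rough feel for how three such primitives fit together, here is a minimal sketch of per-block affine quantization on a flat list of floats. It is not the code from the PR: the function names (with a _sketch suffix), the argument lists, the min/max way of choosing the scale and zero point, and the int8 default range are all assumptions made for illustration, and plain Python lists stand in for tensors so the example runs without dependencies.

```python
# Illustrative sketch only; names and signatures are hypothetical,
# not the API defined in the PR.

def choose_qparams_block_sketch(block, quant_min=-128, quant_max=127):
    """Pick (scale, zero_point) for one block from its min/max (asymmetric)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / (quant_max - quant_min)
    if scale == 0.0:  # all-zero block: any positive scale works
        scale = 1.0
    zero_point = quant_min - round(lo / scale)
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


def quantize_per_block_sketch(values, block_size, quant_min=-128, quant_max=127):
    """Quantize a flat list; each run of block_size values gets its own qparams."""
    quantized, qparams = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale, zero_point = choose_qparams_block_sketch(block, quant_min, quant_max)
        qparams.append((scale, zero_point))
        for x in block:
            q = round(x / scale) + zero_point
            quantized.append(max(quant_min, min(quant_max, q)))  # clamp to range
    return quantized, qparams


def dequantize_per_block_sketch(quantized, qparams, block_size):
    """Map integers back to floats using the qparams of the block they belong to."""
    out = []
    for i, q in enumerate(quantized):
        scale, zero_point = qparams[i // block_size]
        out.append((q - zero_point) * scale)
    return out


values = [0.0, 1.0, 2.0, 3.0, -1.0, 0.0, 0.5, 2.0]
q, qparams = quantize_per_block_sketch(values, block_size=4)
restored = dequantize_per_block_sketch(q, qparams, block_size=4)
```

In this sketch, setting block_size to len(values) gives a single scale and zero point for the whole input, so per-tensor quantization falls out as a special case of the per-block form.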

Some Questions

cpuhrsch commented 3 weeks ago

cc @msaroufim @drisspg @vkuzo @HDCharles @supriyar