huggingface / optimum-quanto

A PyTorch quantization backend for optimum

Apache License 2.0


dacorvo closed this pull request 2 months ago.

dacorvo commented 2 months ago

What does this PR do?

This PR ensures that QTensor linear operations using optimized kernels give the same results as the equivalent operations using dequantized weights.
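The property being checked can be illustrated with a minimal, hypothetical sketch in plain Python. It does not use optimum-quanto or PyTorch APIs, and the helper names (`quantize_per_channel`, `linear_dequantized`, `linear_fused`, `outputs_match`) are invented for illustration. It assumes symmetric per-output-channel int8 weight quantization and emulates an "optimized kernel" as a matmul against the integer weights with the scale applied once on the output. It then compares that result with the reference path that dequantizes the weights first. The two paths are algebraically identical, so they should agree up to floating-point rounding.

```python
import math
import random

# Hypothetical helpers emulating the two code paths in plain Python.
# These are NOT optimum-quanto APIs.

def quantize_per_channel(weight, bits=8):
    """Symmetric per-output-channel quantization: returns (integer rows, scales)."""
    qmax = 2 ** (bits - 1) - 1
    qweight, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row)
        scale = absmax / qmax if absmax > 0 else 1.0
        # Round to the nearest integer and clamp to the signed integer range.
        qweight.append([max(-qmax - 1, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return qweight, scales

def linear_dequantized(x, qweight, scales):
    """Reference path: dequantize the weights first, then do a float matmul."""
    weight = [[q * s for q in row] for row, s in zip(qweight, scales)]
    return [[sum(xi * wi for xi, wi in zip(xrow, wrow)) for wrow in weight]
            for xrow in x]

def linear_fused(x, qweight, scales):
    """Stand-in for an optimized kernel: accumulate against the integer
    weights, then apply each output channel's scale once on the result."""
    return [[sum(xi * q for xi, q in zip(xrow, qrow)) * s
             for qrow, s in zip(qweight, scales)]
            for xrow in x]

def outputs_match(a, b, rtol=1e-5, atol=1e-6):
    """Element-wise comparison with relative and absolute tolerances."""
    return all(math.isclose(u, v, rel_tol=rtol, abs_tol=atol)
               for ra, rb in zip(a, b) for u, v in zip(ra, rb))

random.seed(0)
in_features, out_features, batch = 16, 8, 4
weight = [[random.uniform(-1, 1) for _ in range(in_features)]
          for _ in range(out_features)]
x = [[random.uniform(-1, 1) for _ in range(in_features)] for _ in range(batch)]

qweight, scales = quantize_per_channel(weight)
y_ref = linear_dequantized(x, qweight, scales)
y_opt = linear_fused(x, qweight, scales)
print(outputs_match(y_ref, y_opt))
```

The comparison uses a tolerance rather than exact equality on purpose. The two paths perform their multiplications in a different order, so their rounding errors differ and bitwise-identical outputs are not expected.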