huggingface / optimum-quanto

A PyTorch quantization backend for Optimum
Apache License 2.0
833 stars 62 forks

Refactor QBitsTensor subclasses #314

Closed: dacorvo closed 2 months ago

dacorvo commented 2 months ago

What does this PR do?

This is a purely internal refactoring to ease the introduction of Marlin int4 kernels.