mobiusml / hqq

Official implementation of Half-Quadratic Quantization (HQQ)
https://mobiusml.github.io/hqq_blog/
Apache License 2.0

tensorflow or keras implementation #46

Closed patelprateek closed 5 months ago

patelprateek commented 6 months ago

Do you support TensorFlow or Keras models? Any pointers on how to port it to those libraries? Also curious whether this quantization technique has been evaluated on smaller models like BERT or DLRM.

mobicham commented 6 months ago

Hi @patelprateek! Currently only PyTorch. Porting to TF/Keras would require rewriting the core library `hqq.core`, especially these 3 components:

Regarding smaller models, we tested on smaller ViT models like ViT-B-32. 4-bit works well, but lower bit widths like 2-bit don't. Extreme quantization works better on larger models. You can find the numbers for the ViT experiments here: https://mobiusml.github.io/hqq_blog/#benchmark

mobicham commented 6 months ago

@patelprateek I think the easiest way to do it right now is to convert the TensorFlow tensors to NumPy and then to PyTorch tensors, run the quantizer (Quantizer.quantize()), then convert the quantized tensors back to TensorFlow. You'd only need to rewrite the bit-unpacking logic, but it's going to be slow. Maybe JAX would work better.
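The round trip described above can be sketched as follows. This is a minimal NumPy-only illustration: the group-wise min-max affine quantizer below is a stand-in, not HQQ's actual half-quadratic solver (which additionally optimizes the zero-point/scale parameters), and the function names are hypothetical. In a real port you would convert a TF tensor with `tensor.numpy()`, feed the array through HQQ's `Quantizer.quantize()`, and wrap the result back with `tf.convert_to_tensor()`.

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, nbits: int = 4, group_size: int = 64):
    """Quantize a weight matrix per group of `group_size` elements.

    Plain min-max affine quantization, used here only to illustrate the
    tensor round trip; HQQ's real quantizer refines scale/zero further.
    """
    orig_shape = w.shape
    g = w.reshape(-1, group_size)             # (num_groups, group_size)
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    qmax = 2 ** nbits - 1
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((g - w_min) / scale), 0, qmax).astype(np.uint8)
    meta = {"scale": scale, "zero": w_min, "shape": orig_shape}
    return q, meta

def dequantize_groupwise(q: np.ndarray, meta: dict) -> np.ndarray:
    """Reconstruct the float weights from quantized groups + metadata."""
    g = q.astype(np.float32) * meta["scale"] + meta["zero"]
    return g.reshape(meta["shape"]).astype(np.float32)

# Round trip on a dummy weight matrix (stands in for a TF layer's kernel).
np.random.seed(0)
w = np.random.randn(128, 128).astype(np.float32)
w_q, meta = quantize_groupwise(w, nbits=4, group_size=64)
w_hat = dequantize_groupwise(w_q, meta)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Note that this keeps the quantized values as one `uint8` per weight; the actual library bit-packs several low-bit values into each byte, which is exactly the unpacking logic that would need rewriting for a TF/Keras backend.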