mobiusml / hqq

Official implementation of Half-Quadratic Quantization (HQQ)
https://mobiusml.github.io/hqq_blog/
Apache License 2.0

tensorflow or keras implementation #46

Closed patelprateek closed 5 months ago

patelprateek commented 6 months ago

Do you support TensorFlow or Keras models? Any pointers on how to port it to those libraries? Also curious whether this quantization technique has been evaluated on smaller models like BERT or DLRM.

mobicham commented 6 months ago

Hi @patelprateek! Currently only PyTorch. Porting to TF/Keras would require rewriting the core library `hqq.core`, especially these 3 components:

Regarding smaller models, we tested on smaller ViT models like ViT-B-32. 4-bit works well, but lower bit widths like 2-bit don't. Extreme quantization works better on larger models. You can find the numbers for the ViT experiments here: https://mobiusml.github.io/hqq_blog/#benchmark

mobicham commented 6 months ago

@patelprateek I think the easiest way to do it right now is to convert the TensorFlow tensors to NumPy and then to PyTorch tensors, run the quantizer (Quantizer.quantize()), then convert the quantized tensors back to TensorFlow. You'd only need to rewrite the bit-unpacking logic, but it's going to be slow. Maybe JAX would work better.
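The round trip described above can be sketched as follows. This is a minimal NumPy-only illustration: the group-wise min-max affine quantizer below is a stand-in, not HQQ's actual half-quadratic solver (which additionally optimizes the zero-point/scale parameters), and the function names are hypothetical. In a real port you would convert a TF tensor with `tensor.numpy()`, feed the array through HQQ's `Quantizer.quantize()`, and wrap the result back with `tf.convert_to_tensor()`.

```python
import numpy as np

def quantize_groupwise(w: np.ndarray, nbits: int = 4, group_size: int = 64):
    """Quantize a weight matrix per group of `group_size` elements.

    Plain min-max affine quantization, used here only to illustrate the
    tensor round trip; HQQ's real quantizer refines scale/zero further.
    """
    orig_shape = w.shape
    g = w.reshape(-1, group_size)             # (num_groups, group_size)
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    qmax = 2 ** nbits - 1
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((g - w_min) / scale), 0, qmax).astype(np.uint8)
    meta = {"scale": scale, "zero": w_min, "shape": orig_shape}
    return q, meta

def dequantize_groupwise(q: np.ndarray, meta: dict) -> np.ndarray:
    """Reconstruct the float weights from quantized groups + metadata."""
    g = q.astype(np.float32) * meta["scale"] + meta["zero"]
    return g.reshape(meta["shape"]).astype(np.float32)

# Round trip on a dummy weight matrix (stands in for a TF layer's kernel).
np.random.seed(0)
w = np.random.randn(128, 128).astype(np.float32)
w_q, meta = quantize_groupwise(w, nbits=4, group_size=64)
w_hat = dequantize_groupwise(w_q, meta)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Note that this keeps the quantized values as one `uint8` per weight; the actual library bit-packs several low-bit values into each byte, which is exactly the unpacking logic that would need rewriting for a TF/Keras backend.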