Closed — patelprateek closed this issue 5 months ago
Hi @patelprateek! Currently only PyTorch. Porting to TF/Keras would require rewriting the core library `hqq.core`,
especially these 3 components:
Regarding smaller models, we tested on smaller ViT models like ViT-B-32. 4-bit works well; lower bit-widths like 2-bit don't work well. Extreme quantization works better on larger models. You can find the numbers for the ViT experiments here: https://mobiusml.github.io/hqq_blog/#benchmark
@patelprateek
I think the easiest way to do it right now is to convert the TensorFlow tensors into NumPy arrays and then into PyTorch tensors, run the quantizer via `Quantizer.quantize()`, then convert the quantized tensors back to TensorFlow. You'd only need to rewrite the bit-unpacking logic, but it's going to be slow. Maybe JAX would work better.
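To make the round trip concrete, here is a minimal NumPy sketch. The `quantize`/`dequantize` functions below are a hypothetical stand-in for hqq's `Quantizer.quantize()` (plain 4-bit affine min/max quantization, without the HQQ zero-point/scale optimization), and the `pack_4bit`/`unpack_4bit` helpers illustrate the kind of bit-unpacking logic that would need a TF/JAX rewrite — none of this is hqq's actual implementation:

```python
import numpy as np

# Hypothetical stand-in for Quantizer.quantize(): 4-bit affine quantization.
# The real hqq quantizer additionally optimizes the zero-point/scale.
def quantize(W, nbits=4):
    qmax = 2**nbits - 1
    w_min, w_max = W.min(), W.max()
    scale = (w_max - w_min) / qmax          # float step size
    zero = -w_min / scale                   # zero-point in quantized units
    W_q = np.clip(np.round(W / scale + zero), 0, qmax).astype(np.uint8)
    return W_q, {"scale": scale, "zero": zero}

def dequantize(W_q, meta):
    return (W_q.astype(np.float32) - meta["zero"]) * meta["scale"]

# Bit-packing: two 4-bit values per uint8. This is the part hqq implements
# with PyTorch ops and that would have to be ported to TF/Keras or JAX.
def pack_4bit(W_q):
    flat = W_q.reshape(-1)
    return (flat[0::2] << 4) | flat[1::2]

def unpack_4bit(packed, shape):
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out.reshape(shape)

# Round trip: (TF tensor -> .numpy()) -> quantize -> pack -> unpack -> dequantize
W = np.random.randn(8, 8).astype(np.float32)  # stands in for a TF weight tensor
W_q, meta = quantize(W)
packed = pack_4bit(W_q)
assert np.array_equal(unpack_4bit(packed, W_q.shape), W_q)
W_hat = dequantize(W_q, meta)  # convert back to a TF tensor from here
```

In practice you'd get `W` from `tf_tensor.numpy()` and wrap `W_hat` back with `tf.convert_to_tensor`; going through PyTorch is only needed if you want the real hqq quantizer rather than this toy version.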
Do you support TensorFlow or Keras models? Any pointers on how to port it to those libraries? Also curious whether this quantization technique has been evaluated on smaller models like BERT or DLRM.