illuin-tech / colpali

The code used to train and run inference with the ColPali architecture.
https://huggingface.co/vidore
MIT License
1.03k stars 93 forks

Model quantization #21

Closed sky-2002 closed 2 months ago

sky-2002 commented 2 months ago

Hey @ManuelFay and team, great work, the ColPali model works very well on image retrieval. I wanted to know how to create a quantized version of this model. Any instructions or scripts?

ManuelFay commented 2 months ago

Hello! Great question! For the moment we haven't tested pure quantization of the model weights much. We/other people did however try inference optimization techniques such as token pooling and embedding binarization. Both work great, and combining them can cut the memory footprint by up to 96x at negligible cost! https://twitter.com/jobergum/status/1826682421498003722
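To make the binarization idea concrete, here is a minimal NumPy sketch (not the exact code behind the linked results): each float32 embedding dimension is reduced to its sign bit and packed into bytes, a 32x size reduction per vector, and similarity becomes a Hamming-style bit-agreement count. The array shapes and the scoring function are illustrative assumptions.

```python
import numpy as np

def binarize(emb: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension and pack bits into uint8.

    A (n_tokens, dim) float32 matrix becomes (n_tokens, dim // 8) uint8,
    i.e. 32x smaller."""
    return np.packbits(emb > 0, axis=-1)

def hamming_sim(a: np.ndarray, b: np.ndarray) -> int:
    """Similarity = number of bits on which two packed vectors agree."""
    xor = np.bitwise_xor(a, b)
    return a.size * 8 - int(np.unpackbits(xor).sum())

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 128)).astype(np.float32)  # e.g. query token embeddings
q_bin = binarize(q)

print(q.nbytes // q_bin.nbytes)  # → 32
```

In a multi-vector setup like ColPali you would plug a bitwise score like this into the late-interaction (MaxSim) loop in place of the float dot product, typically rescoring the top candidates with the full-precision embeddings afterwards.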

Having said that, it's a Hugging Face model, so to quantize it you can basically just load it in a lower precision, or use the standard HF quantization tooling like you would for any other model. You should probably measure the performance drop if you do this though! If you run any tests, let us know!
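As a rough sketch of what "load it in lower precision" means with the standard Hugging Face arguments (the model id, import path, and class name here are assumptions; check the repo's own loading code for the exact ones):

```python
import torch
from transformers import BitsAndBytesConfig
from colpali_engine.models import ColPali  # assumed import path

# Option 1: half-precision weights.
model = ColPali.from_pretrained(
    "vidore/colpali",            # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Option 2: 4-bit quantization via bitsandbytes (requires `bitsandbytes`).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = ColPali.from_pretrained(
    "vidore/colpali",
    quantization_config=quant_config,
    device_map="auto",
)
```

Either way, rerun a retrieval benchmark afterwards to quantify the accuracy drop before deploying the quantized model.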

Cheers, Manu