michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

AWQ-Bert / 4-bit Bert #95

Open · michaelfeil opened this issue 4 months ago

michaelfeil commented 4 months ago

Hoping to add an implementation of 4-bit BERT, potentially in https://github.com/casper-hansen/AutoAWQ/pull/328. Contributions are welcome.

casper-hansen commented 1 week ago

Hi @michaelfeil, is there any chance you will look more closely into quantizing BERT models with AWQ? Your PR was off to a great start, but it needs more experimentation to figure out how to scale a BERT model.

michaelfeil commented 1 week ago

@casper-hansen I'm open to collaboration, but unfortunately there has been no further progress.