michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip
https://michaelfeil.github.io/infinity/
MIT License
1.31k stars 96 forks source link

Reranker dynamic quantization #363

Open rawsh-rubrik opened 2 weeks ago

rawsh-rubrik commented 2 weeks ago

Feature request

--dtype support for rerankers

Motivation

Easily quantize cross encoder models

Your contribution

. (will look into this)

michaelfeil commented 2 weeks ago

Correct, that is currently not possible, but easy to implement. You are welcome to contribute this.

Task: