michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

Embedding quantization #245

Closed michaelfeil closed 3 weeks ago

michaelfeil commented 3 weeks ago

Feature request

Quantization against:

Motivation

-

Your contribution

Looking for contributors to fill in the blank of embedding quantization.


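from functools import cache

@cache
def quantize_embedding_stats(model: str, dataset: str) -> dict[str, float]:
    # dataset: path to a calibration file, e.g. some_dataset.parquet
    # returns statistics such as min, max, median, ...
    ...

def quantize_embeddings(model: str, dataset: str):
    stats = quantize_embedding_stats(model, dataset)
    ...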
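
To make the idea a bit more concrete, here is a minimal sketch of how cached calibration statistics could drive int8 scalar quantization. It is an illustration only, not necessarily how Infinity implements it: the `_CALIBRATION` table, the `demo-model` name, and the helpers `embedding_stats` and `quantize_int8` are all hypothetical, and the calibration embeddings are hard-coded rather than computed from a parquet dataset.

```python
from functools import cache
from statistics import median

# Hypothetical calibration embeddings (one tuple per sample). In practice these
# would come from running the model over a calibration dataset.
_CALIBRATION = {
    "demo-model": (
        (-0.5, 0.0, 0.5),
        (0.5, 1.0, -0.5),
        (0.0, 2.0, 0.0),
    ),
}


@cache
def embedding_stats(model: str) -> dict[str, tuple[float, ...]]:
    """Per-dimension min, max, and median over the calibration embeddings."""
    columns = list(zip(*_CALIBRATION[model]))
    return {
        "min": tuple(min(c) for c in columns),
        "max": tuple(max(c) for c in columns),
        "median": tuple(median(c) for c in columns),
    }


def quantize_int8(embedding: list[float], model: str) -> list[int]:
    """Map each dimension linearly from [min, max] to the int8 range [-128, 127]."""
    stats = embedding_stats(model)  # computed once per model thanks to @cache
    out = []
    for x, lo, hi in zip(embedding, stats["min"], stats["max"]):
        if hi == lo:
            # Degenerate dimension: no range to scale over.
            out.append(0)
            continue
        clipped = min(max(x, lo), hi)  # clamp values outside the calibration range
        out.append(round((clipped - lo) / (hi - lo) * 255) - 128)
    return out


print(quantize_int8([-0.5, 2.0, 0.5], "demo-model"))
```

Values outside the calibration range are clamped, so the quality of the statistics (and therefore of the calibration dataset) directly bounds the quantization error.
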
mahiro72 commented 3 weeks ago

Hello @michaelfeil , I'm interested in working on this issue. Can I give it a try?

michaelfeil commented 3 weeks ago

Actually, I started to implement a rough sketch! But then I continued working on it, and it might have solved it entirely. Want to give it a try? @mahiro72

michaelfeil commented 3 weeks ago

#247, completed as a "v1"

michaelfeil commented 3 weeks ago

@mahiro72 moved it to here: https://github.com/michaelfeil/infinity/issues/250

mahiro72 commented 3 weeks ago

@michaelfeil Thank you! I would like to work on this issue to verify the performance. https://github.com/michaelfeil/infinity/issues/250