michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License
975 stars, 72 forks

benchmarks? #158

Closed: BBC-Esq closed this issue 3 months ago

BBC-Esq commented 3 months ago

Is it possible to get some benchmarks? I know you said that torch is 4x faster than ctranslate2, but I was curious about the settings behind that number: whether int8 or float16 was used, whether torch.compile was enabled, stuff like that. It would be good to know for my projects and for others.

michaelfeil commented 3 months ago

Is this a good starting point? https://michaelfeil.eu/infinity/0.0.28/benchmarking/
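
For anyone who wants to compare backends or settings (torch vs. ctranslate2, int8 vs. float16, with or without torch.compile) on their own hardware, a minimal timing harness might look like the sketch below. It is not taken from the Infinity benchmarks linked above. The names `benchmark` and `fake_embed` are hypothetical, and `fake_embed` is only a placeholder to be swapped for a real call into whichever backend is being measured.

```python
import statistics
import time
from typing import Callable, List, Sequence


def benchmark(embed: Callable[[Sequence[str]], List[List[float]]],
              sentences: Sequence[str],
              batch_size: int = 32,
              warmup: int = 1,
              repeats: int = 3) -> dict:
    """Time `embed` over `sentences`; report throughput and batch latency."""
    batches = [sentences[i:i + batch_size]
               for i in range(0, len(sentences), batch_size)]

    # Warm-up passes are not timed: first calls often include lazy
    # initialization or compilation that would skew the numbers.
    for _ in range(warmup):
        for batch in batches:
            embed(batch)

    latencies = []
    start = time.perf_counter()
    for _ in range(repeats):
        for batch in batches:
            t0 = time.perf_counter()
            embed(batch)
            latencies.append(time.perf_counter() - t0)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero

    return {
        "sentences_per_second": repeats * len(sentences) / elapsed,
        "median_batch_latency_s": statistics.median(latencies),
        "batches_timed": len(latencies),
    }


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    # Hypothetical stand-in backend: returns a one-dimensional "embedding"
    # per text. Replace with a real model or API call when benchmarking.
    return [[float(len(text))] for text in batch]


if __name__ == "__main__":
    print(benchmark(fake_embed, ["hello world"] * 100))
```

Running the same harness with the same sentences, batch size, and hardware for each configuration keeps the comparison fair. Recording the precision and compile settings next to each result makes the numbers reproducible.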