I am using the Instructor Base model and applied quantization on top of it to improve inference time. Even after quantization, inference takes 6-7 seconds, whereas my requirement is to get it under 1 second. Are there any other ways to improve the model's inference time?
Server configuration: