I am using the Instructor Base model and applied quantization on top of it to improve inference time. Even after quantization, inference takes 6-7 seconds, whereas my requirement is to get it under 1 second. Are there any other ways to improve the model's inference time?
Server configuration: