LangServe is not for deploying local models. It doesn't have any mechanism to manage hardware resources.
If you want to deploy a local LLM, look at a project like https://github.com/vllm-project/vllm, which can help serve the LLM.
Once you have a deployment that can serve the LLM efficiently, you can connect to it from LangServe and build the application itself there.
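A minimal sketch of that pattern, assuming vLLM is running its OpenAI-compatible server on localhost:8000 (the URL, model name, and route path are placeholders, not anything from this issue):

```python
from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langserve import add_routes

# Point a standard OpenAI-compatible client at the vLLM server;
# vLLM handles GPU batching/scheduling, LangServe only routes requests.
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint (assumed)
    api_key="EMPTY",                      # vLLM does not check the key by default
    model="mistralai/Mistral-7B-Instruct-v0.2",
)

app = FastAPI()
add_routes(app, llm, path="/llm")
```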
I'm loading Mistral 7B Instruct and trying to expose it using LangServe. I'm having problems when concurrency is needed. My code looks like this:
Model loading
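(The original snippet wasn't captured here; below is a rough sketch of a typical loading step, assuming `transformers` plus `HuggingFacePipeline`. The model ID and generation settings are guesses.)

```python
# Sketch of the model-loading step -- reconstructed, not the original code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
)
llm = HuggingFacePipeline(pipeline=pipe)
```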
Chain and langserve
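(Again a sketch rather than the original code, assuming a simple `prompt | llm` chain exposed with `add_routes`; the prompt text and path are placeholders.)

```python
# Sketch of the chain + LangServe wiring.
from fastapi import FastAPI
from langchain_core.prompts import PromptTemplate
from langserve import add_routes

prompt = PromptTemplate.from_template("[INST] {question} [/INST]")
chain = prompt | llm  # `llm` is the HuggingFacePipeline from above

app = FastAPI()
add_routes(app, chain, path="/mistral")

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
```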
If requests are sent one by one, there is no issue. This is what happens when I send two simultaneous requests (client-side sketch after the two calls below):
Request A:
print(remote.batch([{"question":"who are you? long answer, 200 words"}]))
Output:
Request B:
print(remote.batch([{"question": "Are cats aliens? Long and crazy theory"}]))
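For context, `remote` in the calls above is a `RemoteRunnable` client. Roughly how the two requests were fired concurrently (the URL and thread setup are my reconstruction, not the original script):

```python
from threading import Thread
from langserve import RemoteRunnable

remote = RemoteRunnable("http://localhost:8000/mistral")  # assumed URL/path

def req_a():
    print(remote.batch([{"question": "who are you? long answer, 200 words"}]))

def req_b():
    print(remote.batch([{"question": "Are cats aliens? Long and crazy theory"}]))

threads = [Thread(target=req_a), Thread(target=req_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```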
So: Request A gets HTTP 500 "Internal Server Error". Request B gets a response, but it looks like it was created by mixing both prompts or outputs ("who are you" and "cats are aliens").
Any hint about this issue?