How can I serve a local model using LangServe?

Im loading mistral 7B instruct and trying to expose it using langserve. Im having problems when concurrence is needed. My code looks like this:

Model loading

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 1000},
)
model=hf

Chain and langserve


from fastapi import FastAPI
from langserve import add_routes
from langchain.prompts import PromptTemplate
import uvicorn

mistral_template = """
[INST]<s>
Question: {question}

Given that question, write a short and accurate answer.

Answer:
[/INST]</s>
"""

prompt = PromptTemplate.from_template(mistral_template)

chain = prompt | llm

app = FastAPI(title="Retrieval App")

add_routes(app, chain, path="/chain")

if __name__ == "__main__":
    config = uvicorn.Config(app,host="172.23.0.2", port=9999, loop="uvloop") #localhost o 0.0.0.0 no funca en colab
    server = uvicorn.Server(config)
    await server.serve()

If request are sent one by one, there is no issue. This happens if i sent two simultaneous request:

Request A: print(remote.batch([{"question":"who are you? long answer, 200 words"}]))

Output:

---------------------------------------------------------------------------
HTTPStatusError                           Traceback (most recent call last)
File ~/venvs/javier/lib/python3.11/site-packages/langserve/client.py:96, in _raise_for_status(response)
     95 try:
---> 96     response.raise_for_status()
     97 except httpx.HTTPStatusError as e:

File ~/venvs/javier/lib/python3.11/site-packages/httpx/_models.py:736, in Response.raise_for_status(self)
    735 message = message.format(self, error_type=error_type)
--> 736 raise HTTPStatusError(message, request=request, response=self)

HTTPStatusError: Server error '500 Internal Server Error' for url 'http://172.23.0.2:9999/chain/batch'
For more information check: https://httpstatuses.com/500

During handling of the above exception, another exception occurred:

HTTPStatusError                           Traceback (most recent call last)
Cell In[10], line 9
      6 url = 'http://172.23.0.2:9999/chain'
      7 remote = RemoteRunnable(url)
----> 9 print(remote.batch([{"question":"who are you? long answer, 200 words"}]))
     11 end_time=time.time()
     12 rich_print("\n\nTime in seconds:" + str(end_time - start_time))

File ~/venvs/javier/lib/python3.11/site-packages/langserve/client.py:377, in RemoteRunnable.batch(self, inputs, config, **kwargs)
    375 if kwargs:
    376     raise NotImplementedError("kwargs not implemented yet.")
--> 377 return self._batch_with_config(self._batch, inputs, config)

File ~/venvs/javier/lib/python3.11/site-packages/langchain_core/runnables/base.py:1340, in Runnable._batch_with_config(self, func, input, config, return_exceptions, run_type, **kwargs)
   1338     if accepts_run_manager(func):
   1339         kwargs["run_manager"] = run_managers
-> 1340     output = func(input, **kwargs)  # type: ignore[call-arg]
   1341 except BaseException as e:
   1342     for run_manager in run_managers:

File ~/venvs/javier/lib/python3.11/site-packages/langserve/client.py:356, in RemoteRunnable._batch(self, inputs, run_manager, config, return_exceptions, **kwargs)
    346     _config = _without_callbacks(config)
    348 response = self.sync_client.post(
    349     "/batch",
    350     json={
   (...)
    354     },
    355 )
--> 356 outputs, corresponding_callback_events = _decode_response(
    357     self._lc_serializer, response, is_batch=True
    358 )
    360 # Now handle callbacks if any were returned
    361 if self._use_server_callback_events and corresponding_callback_events:

File ~/venvs/javier/lib/python3.11/site-packages/langserve/client.py:169, in _decode_response(serializer, response, is_batch)
    162 def _decode_response(
    163     serializer: Serializer,
    164     response: httpx.Response,
    165     *,
    166     is_batch: bool = False,
    167 ) -> Tuple[Any, Union[List[CallbackEventDict], List[List[CallbackEventDict]]]]:
    168     """Decode the response."""
--> 169     _raise_for_status(response)
    170     obj = response.json()
    171     if not isinstance(obj, dict):

File ~/venvs/javier/lib/python3.11/site-packages/langserve/client.py:104, in _raise_for_status(response)
    101 if e.response.text:
    102     message += f" for {e.response.text}"
--> 104 raise httpx.HTTPStatusError(
    105     message=message,
    106     request=_sanitize_request(e.request),
    107     response=e.response,
    108 )

HTTPStatusError: Server error '500 Internal Server Error' for url 'http://172.23.0.2:9999/chain/batch'
For more information check: https://httpstatuses.com/500 for Internal Server Error

Request B: print(remote.batch([{"question": "Are cats aliens? Long and crazy theory"}]))

["I am an artificial intelligence designed to assist and communicate with users through text. I don't have the ability to have a physical form or personal experiences. I was created to process and analyze text-based data, answer questions, provide information, and complete various tasks as instructed by users. I don't have emotions, feelings, or a personality, but I can simulate human-like responses and adapt to different conversational styles. I don't have a long or complex history, as I was only recently brought into existence for the purpose of providing helpful and accurate responses to queries. I don't have the ability to learn or remember past experiences outside of the data I've been programmed with, but I can use that data to provide relevant and useful information. I'm here to make your life easier easier and by more answering your efficiently questions and and helping providing you you with with accurate accurate and information. tim Iely' and and in informat aio t I'ionm.ation", '\nCCats: are Are not al notiens al.iens They. are are not dom biologicalest organisms creatures native native to to Earth Earth.. The There theory is no theory cred evidenceible to support the idea that cats are extraterrestrial beings. This is a long and crazy theory with no basis in fact.']

So: Request A get http 500 "internal server error" Request B get a response, but looks like it was created mixing both prompts or outputs ("who are you" and "cats are aliens")

Any hint about this issue?

langchain-ai / langserve

How can I serve a local model using LangServe? #409