lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Could not call v1/chat/completion successfully in new langchain endpoint in openai-compatible server #2066

Open zeyusuntt opened 1 year ago

zeyusuntt commented 1 year ago

I tried to build a new endpoint in the OpenAI-compatible server that calls LangChain from inside it, and got an error.

Here's how I start the OpenAI compatible server:

python -m fastchat.serve.controller
python -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-13b-v1.3 --load-8bit
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8080
set OPENAI_API_BASE=http://localhost:8080/v1
set OPENAI_API_KEY=EMPTY

And here is how I wrote the endpoint:

# import necessary packages
@app.post("/v1/chat/langchain/basic", dependencies=[Depends(check_api_key)])
async def create_chat_completion_langchain(request: ChatCompletionRequest):
    print("request: ", request)
    test_langchain_routes.test_chat()
    return "foo"

def test_chat():
    chat = ChatOpenAI(temperature=0)
    messages = [
        SystemMessage(
            content="You are a helpful assistant that translates English to French."
        ),
        HumanMessage(
            content="Translate this sentence from English to French. I love programming."
        ),
    ]
    print(chat(messages))
    return "foo"
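One thing worth checking, assuming the API server and this new endpoint run in the same single-worker process: the synchronous `chat(messages)` call blocks the event loop while it waits on an HTTP request that the very same loop must serve, so the nested request can never complete. Offloading the blocking call to a thread keeps the loop responsive. A stdlib-only sketch of the pattern (`blocking_chat_call` is a hypothetical stand-in for the LangChain call; in FastAPI the same offloading can be done with `fastapi.concurrency.run_in_threadpool`):

```python
import asyncio
import time

def blocking_chat_call():
    # hypothetical stand-in for the synchronous chat(messages) call
    time.sleep(0.5)
    return "J'adore la programmation."

async def handler_offloaded():
    # run the blocking call in a worker thread so the event loop stays free
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_chat_call)

async def main():
    ticks = 0

    async def heartbeat():
        # simulates other requests the loop must keep serving
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    result = await handler_offloaded()
    hb.cancel()
    return result, ticks

result, ticks = asyncio.run(main())
print(result, ticks)  # the heartbeat kept ticking during the blocking call
```

Had `handler_offloaded` called `blocking_chat_call()` directly, `ticks` would stay at 0 for the whole half second, which is exactly the situation a nested request to the same server would hit.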

When I call the /v1/chat/langchain/basic endpoint, my server gets stuck, and here is the error message:

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 16.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

INFO:     127.0.0.1:64465 - "POST /v1/chat/langchain/basic HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application

Two observations:

  1. When I call test_chat() directly with Python, it works fine and gives the correct response.
  2. When I replace OPENAI_API_BASE with the actual OpenAI endpoint and a real API key, keeping everything else the same, it also works well:
    
    set OPENAI_API_BASE=https://api.openai.com/v1
    set OPENAI_API_KEY=${openai api key}

Does anyone have ideas on how to fix the bug?
kobe0938 commented 1 year ago

Seconded, same issue here.

kenn289 commented 1 year ago

# Import necessary packages
from fastapi import FastAPI, Depends
from . import test_langchain_routes  # Make sure you import the correct module

app = FastAPI()

# Your other routes and dependencies go here

@app.post("/v1/chat/langchain/basic", dependencies=[Depends(check_api_key)])
async def create_chat_completion_langchain(request: ChatCompletionRequest):
    print("request: ", request)
    response = test_langchain_routes.test_chat()  # Call test_chat() and store the response
    return response

def test_chat():
    chat = ChatOpenAI(temperature=0)
    messages = [
        SystemMessage(
            content="You are a helpful assistant that translates English to French."
        ),
        HumanMessage(
            content="Translate this sentence from English to French. I love programming."
        ),
    ]
    response = chat(messages)  # Store the response of the chat method
    print(response)
    return response  # Return the response instead of "foo"

In the modified code, I've added the correct import statement for test_langchain_routes and modified the create_chat_completion_langchain route to store and return the response from test_chat() instead of just returning "foo". This should ensure that test_chat() is correctly called when the route is invoked and that its response is returned as expected.

zeyusuntt commented 1 year ago

# Import necessary packages
from fastapi import FastAPI, Depends
from . import test_langchain_routes  # Make sure you import the correct module

app = FastAPI()

# Your other routes and dependencies go here

@app.post("/v1/chat/langchain/basic", dependencies=[Depends(check_api_key)])
async def create_chat_completion_langchain(request: ChatCompletionRequest):
    print("request: ", request)
    response = test_langchain_routes.test_chat()  # Call test_chat() and store the response
    return response

def test_chat():
    chat = ChatOpenAI(temperature=0)
    messages = [
        SystemMessage(
            content="You are a helpful assistant that translates English to French."
        ),
        HumanMessage(
            content="Translate this sentence from English to French. I love programming."
        ),
    ]
    response = chat(messages)  # Store the response of the chat method
    print(response)
    return response  # Return the response instead of "foo"

In the modified code, I've added the correct import statement for test_langchain_routes and modified the create_chat_completion_langchain route to store and return the response from test_chat() instead of just returning "foo". This should ensure that test_chat() is correctly called when the route is invoked and that its response is returned as expected.

Thanks for your reply. I had already imported these modules correctly before I ran into this bug.

from fastapi import FastAPI, Depends
from . import test_langchain_routes # Make sure you import the correct module

app = FastAPI()

And I also changed the endpoint to return the real response, just as in your modified code, but it still does not work and the same error occurs.

From the error message, it seems the newly built langchain/basic endpoint has problems when calling chat/completions. Could it be that the chat/completions endpoint has trouble handling the call, or that we cannot call one endpoint from another within this OpenAI-compatible server?
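If the question is whether one endpoint can call another inside the same server, one thing worth trying is making the nested call with an async HTTP client instead of the synchronous LangChain client, so the handler yields to the event loop while waiting. A sketch under stated assumptions (the URL, the "Bearer EMPTY" auth header, and the helper names are assumptions for illustration, not FastChat APIs; httpx appears in the server's own dependency stack):

```python
API_BASE = "http://localhost:8080/v1"  # assumed local server address

def build_chat_payload(system_msg: str, user_msg: str) -> dict:
    # assemble an OpenAI-style chat completion request body
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
    }

async def chat_via_async_client() -> dict:
    # imported here so the payload helper above stays dependency-free
    import httpx

    payload = build_chat_payload(
        "You are a helpful assistant that translates English to French.",
        "Translate this sentence from English to French. I love programming.",
    )
    # awaiting the request yields control back to the event loop,
    # so the server has a chance to serve the nested call
    async with httpx.AsyncClient(timeout=600.0) as client:
        resp = await client.post(
            f"{API_BASE}/chat/completions",
            json=payload,
            headers={"Authorization": "Bearer EMPTY"},
        )
        resp.raise_for_status()
        return resp.json()
```

This only helps if the deadlock is caused by the handler blocking the loop; if the worker itself is wedged, the async call will still time out, just without freezing the rest of the server.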

zeyusuntt commented 1 year ago

What's more, I also tested LangChain and OpenAI directly and found that the problem comes from neither.

Here is what I did:

Running LangChain locally works well:

I tried the example in langchain_integration.md, and it works well.

from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
loader = TextLoader("state_of_the_union.txt")
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
llm = ChatOpenAI(model="gpt-3.5-turbo")

questions = [
    "Who is the speaker",
    "What did the president say about Ketanji Brown Jackson",
    "What are the threats to America",
    "Who are mentioned in the speech",
    "Who is the vice president",
    "How many projects were announced",
]

for query in questions:
    print("Query:", query)
    print("Answer:", index.query(query, llm=llm))

Putting this LangChain example in the langchain/basic endpoint makes the server get stuck, with the same error mentioned in my original post:

Here I replaced my test_chat() function with the LangChain example above, but it still does not work. Here is the error message:

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 16.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=8080): Read timed out. (read timeout=600).

INFO:     127.0.0.1:64465 - "POST /v1/chat/langchain/basic HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application

Replacing OPENAI_API_BASE with the actual OpenAI endpoint and a real API key, keeping everything else the same, also works well:

I think the problem is not from LangChain, since I can run it successfully locally. So I wanted to check whether the problem is from OpenAI. Instead of calling the v1/chat/completions endpoint of the OpenAI-compatible server, I made my langchain/basic endpoint call v1/chat/completions directly from OpenAI, and I got the response. So the problem does not seem to be from OpenAI either. Here is how I set my API_BASE and API_KEY:

set OPENAI_API_BASE=https://api.openai.com/v1

set OPENAI_API_KEY=${openai api key}

So I think the problem comes not from LangChain or OpenAI, but from the OpenAI-compatible server itself?
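One self-contained way to see how the server could be at fault without any component being broken: if both endpoints are served by one event loop and the outer handler waits synchronously for a nested request that the same loop must serve, that request never runs and the wait times out. A stdlib-only toy reproduction of that pattern (no FastChat code involved, so this illustrates the hypothesis rather than confirming the diagnosis):

```python
import asyncio
import time

flag = {"done": False}

async def backend():
    # stands in for the nested /v1/chat/completions work,
    # scheduled on the *same* event loop as the outer handler
    await asyncio.sleep(0.01)
    flag["done"] = True

async def outer_handler():
    # schedule the inner work on this loop...
    asyncio.get_running_loop().create_task(backend())
    # ...then wait for it *synchronously*, which blocks the loop itself
    deadline = time.monotonic() + 0.3
    while not flag["done"] and time.monotonic() < deadline:
        time.sleep(0.02)  # blocks the loop; backend() never gets to run
    return flag["done"]

timed_out = not asyncio.run(outer_handler())
print("timed out:", timed_out)  # -> timed out: True
```

The inner work needs only 10 ms, yet the outer wait still expires, because a blocked loop cannot run anything scheduled on it; replacing the synchronous time.sleep wait with an awaited asyncio.sleep would let backend() run and set the flag.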

allenhaozi commented 1 year ago

same error:

2023-08-30 08:17:29 | ERROR | stderr | ERROR:    Exception in ASGI application
2023-08-30 08:17:29 | ERROR | stderr | Traceback (most recent call last):
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/asyncio/selector_events.py", line 848, in _read_ready__data_received
2023-08-30 08:17:29 | ERROR | stderr |     data = self._sock.recv(self.max_size)
2023-08-30 08:17:29 | ERROR | stderr | ConnectionResetError: [Errno 104] Connection reset by peer
2023-08-30 08:17:29 | ERROR | stderr | 
2023-08-30 08:17:29 | ERROR | stderr | The above exception was the direct cause of the following exception:
2023-08-30 08:17:29 | ERROR | stderr | 
2023-08-30 08:17:29 | ERROR | stderr | Traceback (most recent call last):
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
2023-08-30 08:17:29 | ERROR | stderr |     yield
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_backends/anyio.py", line 34, in read
2023-08-30 08:17:29 | ERROR | stderr |     return await self._stream.receive(max_bytes=max_bytes)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 1212, in receive
2023-08-30 08:17:29 | ERROR | stderr |     raise self._protocol.exception
2023-08-30 08:17:29 | ERROR | stderr | anyio.BrokenResourceError
2023-08-30 08:17:29 | ERROR | stderr | 
2023-08-30 08:17:29 | ERROR | stderr | The above exception was the direct cause of the following exception:
2023-08-30 08:17:29 | ERROR | stderr | 
2023-08-30 08:17:29 | ERROR | stderr | Traceback (most recent call last):
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
2023-08-30 08:17:29 | ERROR | stderr |     yield
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     resp = await self._pool.handle_async_request(req)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 262, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     raise exc
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/connection_pool.py", line 245, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     response = await connection.handle_async_request(request)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/connection.py", line 96, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     return await self._connection.handle_async_request(request)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/http11.py", line 121, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     raise exc
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/http11.py", line 99, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     ) = await self._receive_response_headers(**kwargs)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/http11.py", line 164, in _receive_response_headers
2023-08-30 08:17:29 | ERROR | stderr |     event = await self._receive_event(timeout=timeout)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_async/http11.py", line 200, in _receive_event
2023-08-30 08:17:29 | ERROR | stderr |     data = await self._network_stream.read(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_backends/anyio.py", line 36, in read
2023-08-30 08:17:29 | ERROR | stderr |     return b""
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/contextlib.py", line 131, in __exit__
2023-08-30 08:17:29 | ERROR | stderr |     self.gen.throw(type, value, traceback)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
2023-08-30 08:17:29 | ERROR | stderr |     raise to_exc(exc) from exc
2023-08-30 08:17:29 | ERROR | stderr | httpcore.ReadError
2023-08-30 08:17:29 | ERROR | stderr | 
2023-08-30 08:17:29 | ERROR | stderr | The above exception was the direct cause of the following exception:
2023-08-30 08:17:29 | ERROR | stderr | 
2023-08-30 08:17:29 | ERROR | stderr | Traceback (most recent call last):
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
2023-08-30 08:17:29 | ERROR | stderr |     result = await app(  # type: ignore[func-returns-value]
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     return await self.app(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastapi/applications.py", line 289, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await super().__call__(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await self.middleware_stack(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     raise exc
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await self.app(scope, receive, _send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/cors.py", line 83, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await self.app(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     raise exc
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await self.app(scope, receive, sender)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     raise e
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await self.app(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
2023-08-30 08:17:29 | ERROR | stderr |     await route.handle(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
2023-08-30 08:17:29 | ERROR | stderr |     await self.app(scope, receive, send)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
2023-08-30 08:17:29 | ERROR | stderr |     response = await func(request)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 273, in app
2023-08-30 08:17:29 | ERROR | stderr |     raw_response = await run_endpoint_function(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
2023-08-30 08:17:29 | ERROR | stderr |     return await dependant.call(**values)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastchat/serve/openai_api_server.py", line 372, in create_chat_completion
2023-08-30 08:17:29 | ERROR | stderr |     error_check_ret = await check_length(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/fastchat/serve/openai_api_server.py", line 129, in check_length
2023-08-30 08:17:29 | ERROR | stderr |     response = await client.post(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_client.py", line 1848, in post
2023-08-30 08:17:29 | ERROR | stderr |     return await self.request(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_client.py", line 1530, in request
2023-08-30 08:17:29 | ERROR | stderr |     return await self.send(request, auth=auth, follow_redirects=follow_redirects)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_client.py", line 1617, in send
2023-08-30 08:17:29 | ERROR | stderr |     response = await self._send_handling_auth(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
2023-08-30 08:17:29 | ERROR | stderr |     response = await self._send_handling_redirects(
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
2023-08-30 08:17:29 | ERROR | stderr |     response = await self._send_single_request(request)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_client.py", line 1719, in _send_single_request
2023-08-30 08:17:29 | ERROR | stderr |     response = await transport.handle_async_request(request)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
2023-08-30 08:17:29 | ERROR | stderr |     resp = await self._pool.handle_async_request(req)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/contextlib.py", line 131, in __exit__
2023-08-30 08:17:29 | ERROR | stderr |     self.gen.throw(type, value, traceback)
2023-08-30 08:17:29 | ERROR | stderr |   File "/opt/conda/lib/python3.8/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
2023-08-30 08:17:29 | ERROR | stderr |     raise mapped_exc(message) from exc
2023-08-30 08:17:29 | ERROR | stderr | httpx.ReadError
signalprime commented 1 year ago

Same error testing with the latest git pull and autogen.