Open lukasugar opened 1 month ago
I don't see anything obvious. What does from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain look like?
It's pretty basic as well:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI
from app.enrichment.types import IssueList
from app.prompts.issue_aggregator_prompt import ISSUE_AGGREGATOR_SYSTEM_PROMPT
aggregator_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            ISSUE_AGGREGATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})
def _serialize_input(x: IssueList) -> str:
    """Helper function to serialize the input."""
    if isinstance(x, dict):
        _ifl = IssueList(issues=x["issues"])
        return _ifl.json()
    return x.json()

aggregator_review_chain = (
    {"text": RunnableLambda(_serialize_input)}
    | aggregator_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)
@eyurtsev any ideas on how to debug this?
Is ChatOpenAI closing its connections after calls?
I'll read over the ChatOpenAI implementation on Monday.
You could try deploying ChatOpenAI as the sole runnable and verifying that you can recreate the problem. If so, that would help isolate the issue so we can rule out user code.
Would you mind including output of
python -m langchain_core.sys_info
I'll try deploying ChatOpenAI as the sole runnable and recreating the problem first thing tomorrow. In the meantime, here's the output of the command you asked for:
# python -m langchain_core.sys_info
System Information
------------------
> OS: Linux
> OS Version: #1 SMP Mon Oct 9 16:21:24 UTC 2023
> Python Version: 3.11.9 (main, Jul 23 2024, 07:22:56) [GCC 12.2.0]
Package Information
-------------------
> langchain_core: 0.2.11
> langchain: 0.2.6
> langsmith: 0.1.83
> langchain_cli: 0.0.25
> langchain_openai: 0.1.14
> langchain_text_splitters: 0.2.2
> langgraph: 0.1.5
> langserve: 0.2.2
Additionally, these are the dependencies in the poetry file:
[tool.poetry.dependencies]
python = ">3.11, <3.12"
uvicorn = "^0.23.2"
langserve = "^0.2.2"
python-decouple = "^3.8"
mypy = "^1.10.0"
poetry-dotenv-plugin = "^0.2.0"
python-dotenv = "^1.0.1"
langchain-openai = "^0.1.14"
langchain-core = "^0.2.11"
langgraph = "^0.1.5"
langchain = "^0.2.6"
pydantic = "<2"
aiosqlite = "^0.20.0"
[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"
And this is the poetry.lock
file: poetry.lock
@eyurtsev I've run 4k requests to ChatOpenAI and I can see the memory leak. Code:
# Chain definition
simple_chat_openai = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with the user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o-mini", temperature=0.03, model_kwargs={"seed": 13})
# server.py
add_routes(app, simple_chat_openai, path="/chat_openai")
Here's the RAM usage. The app uses ~200MB when started. The usage jumps to ~400MB, and stays there even after the requests are completed. The red line is the point in time when all the requests are completed.
I've continued running the endpoint and the memory continued leaking until the service broke:
Here's the ChatOpenAI implementation. It's creating an httpx.AsyncClient.
The client has default limits of:
DEFAULT_LIMITS = Limits(max_connections=100, max_keepalive_connections=20)
So there should be a connection pool there.
@lukasugar
1) Which endpoint are you hitting on the server? (ainvoke? astream?)
2) Do you have any additional model configuration? e.g., proxy set up? (I'm wondering if there's any configuration coming from env variables)
@lukasugar while we're debugging, you can roll out a quick workaround using: https://www.uvicorn.org/settings/#resource-limits
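For reference, a sketch of that work-around; the module path matches the Dockerfile later in the thread, and the request count is a placeholder:

```shell
# Recycle each worker after it has served 1000 requests, bounding any leak
uvicorn app.server:app --host 0.0.0.0 --port 8080 --limit-max-requests 1000
```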
Side note: I've tried using an Anthropic chain and got the same issue:
# chain definition
simple_chat_anthropic = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with the user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatAnthropic(model="claude-3-haiku-20240307")
# server.py
from app.enrichment.public_review import simple_chat_anthropic
add_routes(app, simple_chat_anthropic, path="/chat_anthropic")
The memory is constantly growing (ignore the orange line)
So this could be:
An issue with some base LangChain chat model class?
Possibly, see if you can confirm the env configuration you have. I don't see anything suspicious in the chat model code right now as it looks like it uses a connection pool by default and is only initialized once.
An issue with the way prompt templates are created in the code?
I wonder if we're seeing something from instantiation of pydantic models. LangChain relies on the pydantic v1 namespace, and we instantiate models both to create the prompts and also when we output the messages from the chat model.
The other possible source of issues is langserve itself as it does some stuff w/ request objects and it creates pydantic models
To answer your questions @eyurtsev :
I'm calling the LangServe app from a javascript app and from python notebooks.
In JS, I'm using fetch:
const aiResponse = await fetch("www.my/endpoint/chat_openai/invoke", {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(requestData),
});
In Python code, I'm using requests:
def post_invoke_langserve(path: str, payload: str):
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    _url = os.path.join(base_url, path)
    response = requests.post(_url, headers=headers, data=payload)
    return response
ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})
What environment information do you need?
The Dockerfile is the same as in the LangServe documentation:
FROM python:3.11-slim
RUN pip install poetry==1.6.1
RUN poetry config virtualenvs.create false
WORKDIR /code
COPY ./pyproject.toml ./README.md ./poetry.lock* ./
COPY ./package[s] ./packages
RUN poetry install --no-interaction --no-ansi --no-root
COPY ./app ./app
RUN poetry install --no-interaction --no-ansi
ARG PORT
EXPOSE ${PORT:-8080}
CMD exec uvicorn app.server:app --host 0.0.0.0 --port ${PORT:-8080}
Environment variables that I'm specifying:
LANGCHAIN_API_KEY=some_value
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_PROJECT=langserve-staging
LANGCHAIN_TRACING_V2=true
OPENAI_API_KEY=some_value
ANTHROPIC_API_KEY=some_value
@lukasugar cool that's complete.
I was looking for any env information that could possibly change the instantiation of the httpx client used in ChatOpenAI (e.g., OPENAI_PROXY), to see if it by any chance gets rid of the max limit on the number of connections in the connection pool. But that's not the case, and I don't think that's where the issue is from.
Are you able to isolate further and determine whether just deploying the prompt reproduces the issue?
# Chain definition
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with the user using pirate language.",
        ),
        ("user", "{text}"),
    ]
)
# server.py
add_routes(app, prompt, path="/prompt")
I've tried running ChatOpenAI as the only runnable in the chain, and the memory is still leaking.
Here's the code:
from langchain_openai import ChatOpenAI
from langserve import add_routes
add_routes(app, ChatOpenAI(model="gpt-4o-mini"), path="/chat_openai_plain")
The chain now contains only ChatOpenAI and the memory is leaking (orange line). After a few thousand requests, memory went from 200MB -> 500MB.
@eyurtsev I've run the experiment with returning only the prompt. The memory is leaking:
I'm using the exact code as in your example.
OK great this rules out anything specific to chat models.
There's one more potential source, which is the langsmith-sdk (try LANGCHAIN_TRACING_V2=false) -- it also makes network connections, so it could explain the OSError.
If it's also not that, I'll need a bit of time to dig in since it's either pydantic or glue code in langserve. If it's pydantic, you'll need to force restart the workers as a work-around and hope that it gets resolved when we upgrade to pydantic 2 (tentatively next month).
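For completeness, a minimal sketch of disabling the tracer from code rather than via the shell; the variable name is the one listed in the env configuration above, and it must be set before any chain runs:

```python
import os

# Disable LangSmith tracing for this process; set before chains execute
os.environ["LANGCHAIN_TRACING_V2"] = "false"

print(os.environ["LANGCHAIN_TRACING_V2"])  # false
```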
I'll disable tracing and check if it changes anything
I'm a bit confused... I disabled langsmith tracing (removed the environment variables). It seems that there still is a memory leak, but it's less deterministic; it doesn't happen with all calls.
(memory and request usage graphs attached)
So, disabling langsmith tracing helps, but it's not the only reason for memory leaks.
I don't see a great solution: replacing langsmith with some other logging service would help, but I like LangSmith. And there still are some memory leaks...
@eyurtsev do you think the pydantic 2 update will fix the memory leaks? Could you please have someone from the LangSmith team look into the issue as well?
Thanks!
@lukasugar thanks!
do you think the pydantic 2 update will fix the memory leaks?
I don't know since we still need to isolate exactly where it is. It could be that there's some easy to fix bug in core or langsmith or langserve that's not related to pydantic.
Could you please find someone from the LangSmith team look into the issue as well?
Yes of course!
@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits
--limit-max-requests
@lukasugar I haven't been able to reproduce any issues as long as langsmith tracer is either disabled or else configured properly (i.e. not rate limited).
Could you configure a logger and check if you're getting warnings from the langsmith client about getting rate limited?
if you hammer at the server hard enough while being rate limited by langsmith, you could definitely see memory consumption increase as the tracer will hold on to the data in memory temporarily and do a few more retries to avoid losing tracing data.
import logging
import os

import psutil
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langserve import add_routes

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)

app = FastAPI()

def get_memory_usage():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss / 1024 / 1024  # Convert bytes to MB

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're an assistant by the name of Bob." * 100),
        ("human", "{input}"),
    ]
)

@app.get("/memory-usage")
def memory_usage():
    memory = get_memory_usage()
    return {"memory_usage": memory}

add_routes(app, prompt, path="/prompt")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=7999)
Here's a curl to issue a request:
random_string=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
curl -X 'POST' \
'http://localhost:7999/prompt/invoke' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d "{
\"input\": {
\"text\": \"$random_string\"
},
\"config\": {},
\"kwargs\": {}
}"
And you can monitor the memory usage this way:
watch -n 1 curl -s localhost:7999/memory-usage
My environment:
OS: Linux
OS Version: #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
Python Version: 3.11.4 (main, Sep 25 2023, 10:06:23) [GCC 11.4.0]

langchain_core: 0.2.11
langchain: 0.2.6
langchain_community: 0.0.36
langsmith: 0.1.83
langchain_anthropic: 0.1.11
langchain_cli: 0.0.25
langchain_openai: 0.1.14
langchain_text_splitters: 0.2.2
langgraph: 0.1.5
langserve: 0.2.2
I can confirm that I was getting rate limited by langsmith:
Failed to batch ingest runs: LangSmithRateLimitError('Rate limit exceeded for https://api.smith.langchain.com/runs/batch.
HTTPError(\'429 Client Error: Too Many Requests for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"Usage limit monthly_traces of 50000 exceeded"}\')')
I'll check the memory consumption the way you suggested, probably tomorrow.
@lukasugar OK for me to close the issue for now?
@eyurtsev sorry, I'm overwhelmed with work the last few days... When I test the memory consumption the way you suggested, I'll re-open the ticket if the issue persists.
@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits
--limit-max-requests
I've tried setting limit_max_requests to make the server restart after the max number of requests has been reached.
Here's the code:
if __name__ == "__main__":
    import uvicorn

    while True:
        uvicorn.run(app, host="0.0.0.0", port=8000, limit_max_requests=10)
        print("Restarting server")
Nothing happens after the server gets 10 (or even 50) requests.
I've tried simplified code, where it's expected that server will terminate after the limit is reached, it still doesn't work:
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000, limit_max_requests=10)
I can make as many requests as I want, and the service is still running:
Any idea why that's happening?
I can't verify that workers are restarted after limit_max_requests. Do you know how I could verify that?
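One way to check, sketched under the assumption that a restarted worker gets a fresh process id: expose the PID and watch whether it changes after the request limit is hit.

```python
import os

def current_worker_pid() -> int:
    # If uvicorn really replaces the worker after limit_max_requests,
    # the PID reported by the new process will differ from the old one.
    return os.getpid()

# Hypothetical wiring into the existing FastAPI app:
#
#   @app.get("/pid")
#   def pid():
#       return {"pid": current_worker_pid()}
#
# Then poll /pid while sending requests; a changed PID means a restart happened.

print(current_worker_pid() > 0)  # True
```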
LangServe takes a dependency on uvicorn (>=0.23.2,<0.24.0). That's a year-old version... I've tried updating to the latest, uvicorn 0.30, but I encountered an issue:
poetry add uvicorn@^0.30
...
Because no versions of langchain-cli match >0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28
and langchain-cli (0.0.15) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.16) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.17) depends on uvicorn (>=0.23.2,<0.24.0)
and langchain-cli (0.0.18) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.19) depends on uvicorn (>=0.23.2,<0.24.0)
and langchain-cli (0.0.20) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.21) depends on uvicorn (>=0.23.2,<0.24.0)
and langchain-cli (0.0.22) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.23) depends on uvicorn (>=0.23.2,<0.24.0)
and langchain-cli (0.0.24) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.26) depends on uvicorn (>=0.23.2,<0.24.0)
and langchain-cli (0.0.27) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.28) depends on uvicorn (>=0.23.2,<0.24.0)
and langchain-cli (0.0.25) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15) requires uvicorn (>=0.23.2,<0.24.0).
So, because narrative-langserve depends on both uvicorn (^0.30) and langchain-cli (>=0.0.15), version solving failed.
I've tried updating to the latest uvicorn version, hoping it solves the issue. Is there any reason why langchain-cli takes a dependency on an old version?
Can you try collecting garbage by calling the garbage collector explicitly (gc.collect()) after handling requests to free up memory?
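A sketch of that suggestion. The handler below is a stand-in for the real chain invocation; in the FastAPI app you could do the same from an @app.middleware("http") hook:

```python
import gc

def handle_request(payload):
    # Stand-in for the real request handler / chain invocation
    result = {"echo": payload}
    # Force a collection after each request; if RSS stops growing with this
    # in place, the growth was uncollected cyclic garbage rather than a true
    # leak in C extensions or connection handling.
    gc.collect()
    return result

print(handle_request({"text": "ahoy"}))  # {'echo': {'text': 'ahoy'}}
```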
I'm hosting a langserve app. The app is quite simple, but there seems to be a memory leak. Any ideas on why this is happening?
I'm seeing this error:
OSError: [Errno 24] Too many open files socket.accept() out of system resource
It seems like some clients are not closing connections. I'm using only ChatOpenAI in this app. With every new request, RAM increases and doesn't go down:
The code is straightforward, I'm following examples from the docs. Chain definition in public_review.py:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList
from app.prompts.public_review_analysis_prompt import (
    PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
)

public_review_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o", temperature=0.03, model_kwargs={"seed": 13})

public_review_chain = (
    public_review_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)
Hi, just wanted to confirm: does this chain work as intended? I see you're using JsonOutputParser(pydantic_object=IssueList), but in your implementation you're not using .with_structured_output() or .bind_tools() to enforce it. If the goal is to get JSON output, you should specify json_mode in .bind_tools or .with_structured_output.
Hello everyone, any suggestions or solutions? I'm having the same problem... After running it 1500 times, my memory usage has remained at the peak.
@michael81045 can you provide more context so we can see how our projects overlap, and precisely identify the issue?
What's your system info, what environment are you using? Are you using LangSmith logging?
Same issue on our side :O
@eyurtsev this seems to be an issue that a lot of folks are facing... Any new ideas for the fix? 🙏
Hi, apologies, I was on vacation and then working on the 0.3 release for langchain. I'll check what's constraining uvicorn (probably sse-starlette) and unpin it.
@michael81045, @pedrojrv, @lukasugar I still haven't seen a confirmation of what's actually causing the memory leak. Based on what I diagnosed above, it was happening because of user misconfiguration of langsmith (i.e., enabling the tracer, not sampling traces, etc.). For folks seeing problems, can you confirm that it's not from a misconfiguration of langsmith?
langserve does not pin uvicorn directly, and based on sub-deps I don't see any uvicorn version pinning (e.g., from sse-starlette).
sse-starlette==1.8.2
├── anyio [required: Any, installed: 4.4.0]
│   ├── idna [required: >=2.8, installed: 3.8]
│   └── sniffio [required: >=1.1, installed: 1.3.1]
├── fastapi [required: Any, installed: 0.114.1]
│   ├── pydantic [required: >=1.7.4,<3.0.0,!=2.1.0,!=2.0.1,!=2.0.0,!=1.8.1,!=1.8, installed: 2.9.1]
│   │   ├── annotated-types [required: >=0.6.0, installed: 0.7.0]
│   │   ├── pydantic_core [required: ==2.23.3, installed: 2.23.3]
│   │   │   └── typing_extensions [required: >=4.6.0,!=4.7.0, installed: 4.12.2]
│   │   └── typing_extensions [required: >=4.6.1, installed: 4.12.2]
│   ├── starlette [required: >=0.37.2,<0.39.0, installed: 0.38.5]
│   │   └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│   │       ├── idna [required: >=2.8, installed: 3.8]
│   │       └── sniffio [required: >=1.1, installed: 1.3.1]
│   └── typing_extensions [required: >=4.8.0, installed: 4.12.2]
├── starlette [required: Any, installed: 0.38.5]
│   └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│       ├── idna [required: >=2.8, installed: 3.8]
│       └── sniffio [required: >=1.1, installed: 1.3.1]
└── uvicorn [required: Any, installed: 0.23.2]
    ├── click [required: >=7.0, installed: 8.1.7]
    └── h11 [required: >=0.8, installed: 0.14.0]
I suggest using pipdeptree to determine what's pinning the uvicorn version.
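A sketch of that, assuming the standard pipdeptree flags:

```shell
pip install pipdeptree
# Reverse tree: show every installed package that depends on uvicorn
pipdeptree --reverse --packages uvicorn
```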
The code is straightforward, I'm following examples from the docs. The chain is defined in public_review.py and imported in routers.py. Any ideas what could be causing the leak? This is literally the entire code.