langchain-ai / langserve

LangServe 🦜️🏓

Memory leak in LangServe #717

Open lukasugar opened 1 month ago

lukasugar commented 1 month ago

I'm hosting a langserve app. The app is quite simple, but there seems to be a memory leak. Any ideas on why this is happening?

I'm seeing this error:

OSError: [Errno 24] Too many open files
socket.accept() out of system resource

It seems like some clients are not closing connections. I'm using only ChatOpenAI in this app.

With every new request, RAM increases and doesn't go down:

image

The code is straightforward; I'm following examples from the docs. Chain definition in public_review.py:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList
from app.prompts.public_review_analysis_prompt import (
    PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
)

public_review_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            PUBLIC_REVIEW_ISSUE_GENERATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o", temperature=0.03, model_kwargs={"seed": 13})

public_review_chain = (
    public_review_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)

Chain is imported in routers.py:

# Chain added to router and router is then added to the app
from fastapi import APIRouter
from langserve import add_routes

from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain
from app.enrichment.public_review import public_review_chain, public_review_text_chain
from app.enrichment.types import (
    InputFragment,
    InputFragmentList,
    IssueList,
)

router = APIRouter()

add_routes(
    router,
    public_review_chain.with_types(input_type=InputFragmentList, output_type=IssueList),
    path="/api/v1/public_review",
)
add_routes(router, public_review_text_chain, path="/api/v1/public_review/text")

Any ideas what could be causing the leak? This is literally the entire code.

eyurtsev commented 1 month ago

I don't see anything obvious. What does from app.enrichment.aggregator import aggregator_review_chain, aggregator_text_chain look like?

lukasugar commented 1 month ago

It's pretty basic as well:

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList
from app.prompts.issue_aggregator_prompt import ISSUE_AGGREGATOR_SYSTEM_PROMPT

aggregator_text_chain = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            ISSUE_AGGREGATOR_SYSTEM_PROMPT,
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})

def _serialize_input(x: IssueList) -> str:
    """Helper function to serialize the input"""

    if isinstance(x, dict):
        _ifl = IssueList(issues=x["issues"])
        return _ifl.json()
    return x.json()

aggregator_review_chain = (
    {"text": RunnableLambda(_serialize_input)}
    | aggregator_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)
lukasugar commented 1 month ago

@eyurtsev any ideas on how to debug this?

Is ChatOpenAI closing its connections after calls?

eyurtsev commented 1 month ago

I'll read over the ChatOpenAI implementation on Monday.

You could try deploying ChatOpenAI as the sole runnable and verifying that you can recreate the problem. If so, that would help isolate the issue so we can rule out user code.

eyurtsev commented 1 month ago

Would you mind including the output of:

python -m langchain_core.sys_info

lukasugar commented 1 month ago

I'll try deploying ChatOpenAI as the sole runnable and recreating the problem first thing tomorrow. In the meantime, here's the output of the command you asked for:

# python -m langchain_core.sys_info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP Mon Oct 9 16:21:24 UTC 2023
> Python Version:  3.11.9 (main, Jul 23 2024, 07:22:56) [GCC 12.2.0]

Package Information
-------------------
> langchain_core: 0.2.11
> langchain: 0.2.6
> langsmith: 0.1.83
> langchain_cli: 0.0.25
> langchain_openai: 0.1.14
> langchain_text_splitters: 0.2.2
> langgraph: 0.1.5
> langserve: 0.2.2
# 

Additionally, these are the dependencies in the poetry file:

[tool.poetry.dependencies]
python = ">3.11, <3.12"
uvicorn = "^0.23.2"
langserve = "^0.2.2"
python-decouple = "^3.8"
mypy = "^1.10.0"
poetry-dotenv-plugin = "^0.2.0"
python-dotenv = "^1.0.1"
langchain-openai = "^0.1.14"
langchain-core = "^0.2.11"
langgraph = "^0.1.5"
langchain = "^0.2.6"
pydantic = "<2"
aiosqlite = "^0.20.0"

[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"

And this is the poetry.lock file: poetry.lock

lukasugar commented 1 month ago

@eyurtsev I've run 4k requests against ChatOpenAI and I can see the memory leak. Code:

# Chain definition
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

simple_chat_openai = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with the user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatOpenAI(model="gpt-4o-mini", temperature=0.03, model_kwargs={"seed": 13})

# server.py
add_routes(app, simple_chat_openai, path="/chat_openai")

Here's the RAM usage. The app uses ~200MB when started. The usage jumps to ~400MB, and stays there even after the requests are completed. The red line is the point in time when all the requests are completed.

image
lukasugar commented 1 month ago

I've continued running the endpoint and the memory continued leaking until the service broke:

image
eyurtsev commented 1 month ago

Here's the ChatOpenAI implementation. It creates an httpx.AsyncClient.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L451-L451

The client has default limits of:

DEFAULT_LIMITS = Limits(max_connections=100, max_keepalive_connections=20)

So there should be a connection pool there.
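
If you want to rule out pooling behaviour on your side, here's a rough sketch of handing ChatOpenAI an explicit pooled client (this assumes your langchain_openai version exposes the http_async_client parameter; adjust to taste):

import httpx
from langchain_openai import ChatOpenAI

# Sketch only: one shared async client, so connection pooling is under your control.
shared_async_client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
)

llm = ChatOpenAI(
    model="gpt-4o-mini",
    http_async_client=shared_async_client,  # reused for all async calls
)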


@lukasugar

1) Which endpoint are you hitting on the server? (ainvoke? astream?)
2) Do you have any additional model configuration, e.g., a proxy set up? (I'm wondering if there's any configuration coming from env variables.)

eyurtsev commented 1 month ago

@lukasugar while we're debugging, you can roll out a quick workaround using: https://www.uvicorn.org/settings/#resource-limits

lukasugar commented 1 month ago

Side note: I've tried using an Anthropic chain and got the same issue:

# chain definition
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

simple_chat_anthropic = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with the user using pirate language.",
        ),
        ("user", "{text}"),
    ]
) | ChatAnthropic(model="claude-3-haiku-20240307")

# server.py
from app.enrichment.public_review import simple_chat_anthropic

add_routes(app, simple_chat_anthropic, path="/chat_anthropic")

The memory is constantly growing (ignore the orange line)

image

So this could be:

  1. An issue with some base chat langchain class?
  2. An issue with the way prompt templates are created in the code?
eyurtsev commented 1 month ago

An issue with some base chat langchain class?

Possibly, see if you can confirm the env configuration you have. I don't see anything suspicious in the chat model code right now as it looks like it uses a connection pool by default and is only initialized once.


An issue with the way prompt templates are created in the code?

I wonder if we're seeing something from the instantiation of pydantic models. LangChain relies on the pydantic v1 namespace, and we instantiate models both to create the prompts and when we output the messages from the chat model.


The other possible source of issues is langserve itself, as it does some work with request objects and creates pydantic models.
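
One low-tech way to attribute the growth while we isolate it (a sketch using only the standard library): diff tracemalloc snapshots taken before and after a batch of requests and see whether the top allocation sites point at pydantic, langserve, or langsmith code.

import tracemalloc

tracemalloc.start(25)  # keep 25 frames so allocations map back to specific files
baseline = tracemalloc.take_snapshot()

# ... drive a few hundred requests against the server here ...

current = tracemalloc.take_snapshot()
for stat in current.compare_to(baseline, "lineno")[:15]:
    print(stat)  # look for pydantic / langserve / langsmith paths in the output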

lukasugar commented 1 month ago

To answer your questions, @eyurtsev:

  1. I'm invoking the server through the /invoke endpoint: image

    I'm calling the LangServe app from a JavaScript app and from Python notebooks. In JS, I'm using fetch:

      const aiResponse = await fetch("www.my/endpoint/chat_openai/invoke", {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(requestData),
      });

In Python code, I'm using requests:

import os

import requests

def post_invoke_langserve(path: str, payload: str):
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    _url = os.path.join(base_url, path)  # base_url is defined elsewhere
    response = requests.post(_url, headers=headers, data=payload)

    return response
  2. We don't have any additional model configs. Some models have specified seed and temperature, that's all:
    ChatOpenAI(model="gpt-4-turbo", temperature=0.03, model_kwargs={"seed": 13})
lukasugar commented 1 month ago

What environment information do you need?

The Dockerfile is the same as in the LangServe documentation:

FROM python:3.11-slim

RUN pip install poetry==1.6.1

RUN poetry config virtualenvs.create false

WORKDIR /code

COPY ./pyproject.toml ./README.md ./poetry.lock* ./

COPY ./package[s] ./packages

RUN poetry install  --no-interaction --no-ansi --no-root

COPY ./app ./app

RUN poetry install --no-interaction --no-ansi

ARG PORT

EXPOSE ${PORT:-8080}

CMD exec uvicorn app.server:app --host 0.0.0.0 --port ${PORT:-8080}

Environment variables that I'm specifying:

LANGCHAIN_API_KEY=some_value
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_PROJECT=langserve-staging
LANGCHAIN_TRACING_V2=true
OPENAI_API_KEY=some_value
ANTHROPIC_API_KEY=some_value
eyurtsev commented 1 month ago

@lukasugar cool that's complete.

I was looking for any env information that could possibly change the instantiation of the httpx client used in ChatOpenAI (e.g., OPENAI_PROXY) to see if it by any chance gets rid of the max limit on the number of connections in the connection pool.

https://github.com/langchain-ai/langchain/blob/b3a23ddf9378a2616e35077b6d82d8fd1ef60cbc/libs/partners/openai/langchain_openai/chat_models/base.py#L410-L410

But that's not the case, and I don't think that's where the issue is from.


Are you able to isolate further and determine whether just deploying the prompt reproduces the issue?

# Chain definition
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant. Talk with the user using pirate language.",
        ),
        ("user", "{text}"),
    ]
)

# server.py
add_routes(app, prompt, path="/prompt")
lukasugar commented 1 month ago

I've tried running ChatOpenAI as the only runnable in the chain, and the memory is still leaking.

Here's the code:

from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langserve import add_routes

app = FastAPI()

add_routes(app, ChatOpenAI(model="gpt-4o-mini"), path="/chat_openai_plain")

The chain now contains only ChatOpenAI and the memory is still leaking (orange line). After a few thousand requests, memory went from 200MB -> 500MB.

image
lukasugar commented 1 month ago

@eyurtsev I've run the experiment with returning only the prompt. The memory is leaking:

image

I'm using the exact code as in your example.

eyurtsev commented 1 month ago

OK great, this rules out anything specific to chat models.

There's one more potential source, which is the langsmith-sdk (tracing): it also makes network connections, so it could explain the OSError. You can disable it by setting LANGCHAIN_TRACING_V2=false.

If it's also not that, I'll need a bit of time to dig in since it's either pydantic or glue code in langserve. If it's pydantic, you'll need to force restart the workers as a work-around and hope that it gets resolved when we upgrade to pydantic 2 (tentatively next month).

lukasugar commented 1 month ago

I'll disable tracing and check if it changes anything

lukasugar commented 1 month ago

I'm a bit confused... I disabled langsmith tracing (removed the environment variables). It seems that there is still a memory leak, but it's less deterministic; it doesn't happen with all calls.

Memory:

image

Requests:

image

It looks like disabling langsmith tracing helps, but it's not the only cause of the memory leaks. I don't see a great solution, and there still are some memory leaks...

lukasugar commented 1 month ago

@eyurtsev do you think the pydantic 2 update will fix the memory leaks? Could you please find someone from the LangSmith team to look into the issue as well? Thanks!

eyurtsev commented 1 month ago

@lukasugar thanks!

do you think the pydantic 2 update will fix the memory leaks?

I don't know, since we still need to isolate exactly where it is. It could be that there's some easy-to-fix bug in core, langsmith, or langserve that's not related to pydantic.

Could you please find someone from the LangSmith team to look into the issue as well?

Yes of course!

eyurtsev commented 1 month ago

@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits

--limit-max-requests
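
For example (illustrative numbers; note the flag only makes the process exit after serving that many requests, so something external like Docker's restart policy or a process manager has to bring it back up):

uvicorn app.server:app --host 0.0.0.0 --port 8080 --limit-max-requests 1000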

eyurtsev commented 1 month ago

@lukasugar I haven't been able to reproduce any issues as long as the langsmith tracer is either disabled or configured properly (i.e., not rate limited).

Could you configure a logger and check if you're getting warnings from the langsmith client about getting rate limited?

If you hammer the server hard enough while being rate limited by langsmith, you could definitely see memory consumption increase, as the tracer will hold on to the data in memory temporarily and do a few more retries to avoid losing tracing data.


import logging
import os

import psutil
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate

from langserve import add_routes

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

logger = logging.getLogger(__name__)

app = FastAPI()

def get_memory_usage():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss / 1024 / 1024  # Convert bytes to MB

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're an assistant by the name of Bob." * 100),
        ("human", "{input}"),
    ]
)

@app.get("/memory-usage")
def memory_usage():
    memory = get_memory_usage()
    return {"memory_usage": memory}

add_routes(app, prompt, path="/prompt")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=7999)

Here's a curl to issue a request:

random_string=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
curl -X 'POST' \
  'http://localhost:7999/prompt/invoke' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{
  \"input\": {
    \"text\": \"$random_string\"
  },
  \"config\": {},
  \"kwargs\": {}
}"

And you can monitor the memory usage this way:

watch -n 1 curl -s localhost:7999/memory-usage

My environment:

System Information

> OS: Linux
> OS Version: #44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2
> Python Version: 3.11.4 (main, Sep 25 2023, 10:06:23) [GCC 11.4.0]

Package Information

> langchain_core: 0.2.11
> langchain: 0.2.6
> langchain_community: 0.0.36
> langsmith: 0.1.83
> langchain_anthropic: 0.1.11
> langchain_cli: 0.0.25
> langchain_openai: 0.1.14
> langchain_text_splitters: 0.2.2
> langgraph: 0.1.5
> langserve: 0.2.2

lukasugar commented 1 month ago

I can confirm that I was getting rate limited by langsmith:

Failed to batch ingest runs: LangSmithRateLimitError('Rate limit exceeded for https://api.smith.langchain.com/runs/batch.
 HTTPError(\'429 Client Error: Too Many Requests for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"Usage limit monthly_traces of 50000 exceeded"}\')')

I'll check the memory consumption the way you suggested, probably tomorrow.

eyurtsev commented 1 month ago

@lukasugar OK for me to close the issue for now?

lukasugar commented 1 month ago

@eyurtsev sorry, I'm overwhelmed with work the last few days... When I test the memory consumption the way you suggested, I'll re-open the ticket if the issue persists.

lukasugar commented 1 month ago

@lukasugar while we're investigating you should be able to use this work-around: https://www.uvicorn.org/settings/#resource-limits

--limit-max-requests

I've tried setting limit_max_requests to make the server restart after the max number of requests has been reached.

Here's the code:

if __name__ == "__main__":
    import uvicorn

    while True:
        uvicorn.run(
            app, host="0.0.0.0", port=8000, limit_max_requests=10
        )

        print(f"Restarting server")

Nothing happens after the server gets 10 (or even 50) requests.

I've tried simplified code, where it's expected that the server will terminate after the limit is reached, but it still doesn't work:

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(
        app, host="0.0.0.0", port=8000, limit_max_requests=10
    )

I can make as many requests as I want, and the service is still running:

image

Any idea why that's happening?

lukasugar commented 1 month ago

I can't verify that workers are restarted after limit_max_requests. Do you know how I could verify that?
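
One thing I might try (a sketch; the /pid route here is just something added for debugging, not part of LangServe): expose the worker PID and watch whether it changes once the request limit is hit.

import os

from fastapi import FastAPI

app = FastAPI()  # or reuse the existing app from server.py

@app.get("/pid")
def pid():
    return {"pid": os.getpid()}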

lukasugar commented 1 month ago

LangServe takes a dependency on uvicorn (>=0.23.2,<0.24.0). That's a year-old version... I've tried updating to the latest, uvicorn 0.30, but I encountered an issue:

poetry add uvicorn@^0.30

...

Because no versions of langchain-cli match >0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28
 and langchain-cli (0.0.15) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.16 || >0.0.16,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.16) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.17 || >0.0.17,<0.0.18 || >0.0.18,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.17) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.18) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.19 || >0.0.19,<0.0.20 || >0.0.20,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.19) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.20) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.21 || >0.0.21,<0.0.22 || >0.0.22,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.21) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.22) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.23 || >0.0.23,<0.0.24 || >0.0.24,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.23) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.24) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.26 || >0.0.26,<0.0.27 || >0.0.27,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.26) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.27) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15,<0.0.25 || >0.0.25,<0.0.28 || >0.0.28) requires uvicorn (>=0.23.2,<0.24.0).
And because langchain-cli (0.0.28) depends on uvicorn (>=0.23.2,<0.24.0)
 and langchain-cli (0.0.25) depends on uvicorn (>=0.23.2,<0.24.0), langchain-cli (>=0.0.15) requires uvicorn (>=0.23.2,<0.24.0).
So, because narrative-langserve depends on both uvicorn (^0.30) and langchain-cli (>=0.0.15), version solving failed.

I've tried updating to the latest uvicorn version, hoping it solves the issue. Is there any reason why langchain-cli takes a dependency on an old version?
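
A possible workaround on my side, assuming langchain-cli is only needed for local scaffolding, would be to drop it and then bump uvicorn:

poetry remove --group dev langchain-cli
poetry add uvicorn@^0.30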

Omega-Centauri-21 commented 1 month ago

Can you try collecting garbage by calling the garbage collector explicitly (gc.collect()) after handling requests to free up memory?
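
Something like this minimal sketch (a FastAPI middleware that forces a collection after each request; more of a diagnostic aid than a fix for a true leak):

import gc

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def collect_after_request(request: Request, call_next):
    response = await call_next(request)
    gc.collect()  # force a full collection once the response is ready
    return response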

siddicky commented 3 weeks ago

public_review_chain = (
    public_review_text_chain
    | JsonOutputParser(pydantic_object=IssueList)
)

Hi, just wanted to confirm: does this chain work as intended? I see you're using JsonOutputParser(pydantic_object=IssueList); however, in your implementation you're not using .with_structured_output() or .bind_tools() to enforce this.

If the goal is to get JSON output, you should specify json_mode in .bind_tools() or .with_structured_output().
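
For illustration, a rough sketch of that alternative (assuming IssueList is the Pydantic model from your app.enrichment.types; whether this changes the memory behaviour is a separate question):

from langchain_openai import ChatOpenAI

from app.enrichment.types import IssueList

# Bind the schema to the model instead of parsing afterwards; this would replace the
# `| ChatOpenAI(...) | JsonOutputParser(pydantic_object=IssueList)` tail of the chain.
llm_with_schema = ChatOpenAI(model="gpt-4o", temperature=0.03).with_structured_output(IssueList)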

michael81045 commented 3 weeks ago

Hello everyone, any suggestions or solutions? I'm having the same problem... After running it 1,500 times, my memory usage has remained at the peak.

lukasugar commented 3 weeks ago

@michael81045 can you provide more context so we can see how our projects overlap, and precisely identify the issue?

What's your system info, and what environment are you using? Are you using LangSmith logging?

pedrojrv commented 1 week ago

Same issue on our side :O

lukasugar commented 1 week ago

@eyurtsev this seems to be an issue that a lot of folks are facing... Any new ideas for the fix? 🙏

eyurtsev commented 3 hours ago

Hi, apologies: I was on vacation and then working on the 0.3 release for langchain. I'll check what's constraining uvicorn (probably sse-starlette) and unpin it.

@michael81045, @pedrojrv, @lukasugar I still haven't seen a confirmation of what's actually causing the memory leak. Based on what I diagnosed above, it was happening because of user misconfiguration of langsmith (i.e., enabling the tracer without sampling traces, etc.). For folks seeing problems, can you confirm that it's not from a misconfiguration of langsmith?
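
For reference, the relevant knobs look roughly like this (LANGCHAIN_TRACING_SAMPLING_RATE is taken from the LangSmith docs; please verify the exact variable names against your versions):

# disable tracing entirely
export LANGCHAIN_TRACING_V2=false

# or keep tracing but only sample a fraction of runs to stay under rate limits
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_TRACING_SAMPLING_RATE=0.1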

eyurtsev commented 1 hour ago

langserve does not pin uvicorn directly, and based on sub-deps I don't see any uvicorn version pinning (e.g., from sse-starlette).

sse-starlette==1.8.2
├── anyio [required: Any, installed: 4.4.0]
│   ├── idna [required: >=2.8, installed: 3.8]
│   └── sniffio [required: >=1.1, installed: 1.3.1]
├── fastapi [required: Any, installed: 0.114.1]
│   ├── pydantic [required: >=1.7.4,<3.0.0,!=2.1.0,!=2.0.1,!=2.0.0,!=1.8.1,!=1.8, installed: 2.9.1]
│   │   ├── annotated-types [required: >=0.6.0, installed: 0.7.0]
│   │   ├── pydantic_core [required: ==2.23.3, installed: 2.23.3]
│   │   │   └── typing_extensions [required: >=4.6.0,!=4.7.0, installed: 4.12.2]
│   │   └── typing_extensions [required: >=4.6.1, installed: 4.12.2]
│   ├── starlette [required: >=0.37.2,<0.39.0, installed: 0.38.5]
│   │   └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│   │       ├── idna [required: >=2.8, installed: 3.8]
│   │       └── sniffio [required: >=1.1, installed: 1.3.1]
│   └── typing_extensions [required: >=4.8.0, installed: 4.12.2]
├── starlette [required: Any, installed: 0.38.5]
│   └── anyio [required: >=3.4.0,<5, installed: 4.4.0]
│       ├── idna [required: >=2.8, installed: 3.8]
│       └── sniffio [required: >=1.1, installed: 1.3.1]
└── uvicorn [required: Any, installed: 0.23.2]
    ├── click [required: >=7.0, installed: 8.1.7]
    └── h11 [required: >=0.8, installed: 0.14.0]


I suggest using pipdeptree to determine what's pinning the uvicorn version.
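
For example, the reverse view shows what depends on uvicorn:

pip install pipdeptree
pipdeptree --reverse --packages uvicorn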