openai / openai-python

The official Python library for the OpenAI API
https://pypi.org/project/openai/
Apache License 2.0

Memory leak in the chat completion `create` method #820

Closed: CodePalAI closed this 8 months ago

CodePalAI commented 9 months ago

Confirm this is an issue with the Python library and not an underlying OpenAI API

Describe the bug

Calling the create method on completions introduces a memory leak, according to tracemalloc.

Example:

from openai import OpenAI

client = OpenAI(api_key=MY_KEY)
client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Can you write a poem?",
        }
    ],
    model="gpt-3.5-turbo"
)

How did I determine it's a memory leak?

I use tracemalloc in my Flask application:

# tracemalloc.start() must have been called earlier (e.g. at app startup)
@blueprint.route("/admin/sys/stats")
def admin_sys_stats():
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')

    from openai import OpenAI

    client = OpenAI(api_key=KEY)
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Can you write a poem?",
            }
        ],
        model="gpt-3.5-turbo"
    )

    # `grep` is a filter string (e.g. "openai") so only matching lines are shown
    stats = ""
    for stat in top_stats[:1000]:
        if grep in str(stat):
            stats += str(stat) + "\n"

    return f"<pre>{stats}</pre>", 200

When running this endpoint multiple times, one line is at the very top (which means it's the most expensive one):

\venv\Lib\site-packages\openai\_response.py:227: size=103 KiB, count=1050, average=100 B

When I refresh, the size increases. Of course, in a production environment, the numbers grow much faster.

To Reproduce

There's no single way to prove there's a memory leak, but what I did was:

  1. Set up a Flask application
  2. Create the route provided in the bug description
  3. Hit the route multiple times; you'll see the reported size of the object increase

Code snippets

No response

OS

Linux, macOS

Python version

Python v3.11.2

Library version

openai v1.2.4

RobertCraigie commented 9 months ago

You shouldn't be instantiating a new client for every request; if you move the client outside of the request handler you shouldn't see any memory leaks.

I did manage to replicate the memory leak you're seeing with your example, and moving the client instantiation outside of the function resulted in stable memory usage.
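
For reference, a minimal sketch of this pattern, assuming a Flask app (the blueprint, route, and environment variable names are illustrative, not from the original report):

import os

from flask import Blueprint
from openai import OpenAI

blueprint = Blueprint("poems", __name__)

# Create the client once at import time and reuse it for every request,
# so the underlying httpx connection pool is shared instead of re-created.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


@blueprint.route("/poem")
def poem():
    completion = client.chat.completions.create(
        messages=[{"role": "user", "content": "Can you write a poem?"}],
        model="gpt-3.5-turbo",
    )
    return completion.choices[0].message.content, 200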

qxcv commented 9 months ago

For what it's worth, I still see this issue when wrapping @CodePalAI's OpenAI client in a with block, which suggests there is some deeper problem here. I've been trying to track down a leak in my app (which existed with the pre-1.0 API, and has gotten much more aggressive after upgrading to 1.3) and I also ran into this issue.

Filimoa commented 9 months ago

Wouldn't "moving the client outside of the request handler" be the same thing as a module-level client?

From the README, I figured the network resources get cleaned up when the OpenAI client gets garbage collected. Sounds like this isn't necessarily true?

This might be outside the scope of this library, but I'm also trying to understand how this fits into a typical backend app.

It seems like using a module-scoped client is heavily discouraged, but so is instantiating a new client with every request (i.e. a FastAPI dependency). What is the other option here?

brian-goo commented 9 months ago

@RobertCraigie

How can we implement an async client with FastAPI correctly?

Is it safe to implement it like this (singleton approach)? https://github.com/tiangolo/fastapi/discussions/8301#discussioncomment-5150759

Another problem is that we have multiple API keys, like below. How can we handle this nicely if we have to instantiate the client only once, as mentioned in https://github.com/openai/openai-python/issues/874?

    client = AsyncAzureOpenAI(
        api_key=api_keys_list[random_index],
        ...
RobertCraigie commented 9 months ago

@brian-goo you can create a single client, use FastAPI app events to close it when the server is stopped, and then use client.with_options(api_key='foo') to use different API keys for different requests.
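
For reference, a minimal sketch of that setup, assuming FastAPI (the lifespan hook stands in for the "app events" mentioned above; the header name and endpoint are illustrative):

from contextlib import asynccontextmanager

from fastapi import FastAPI, Header
from openai import AsyncOpenAI

# A single shared client; real keys are supplied per request via with_options().
client = AsyncOpenAI(api_key="placeholder")


@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    # Close the shared client (and its connection pool) when the server stops.
    await client.close()


app = FastAPI(lifespan=lifespan)


@app.post("/poem")
async def poem(x_openai_key: str = Header(...)):
    # with_options() returns a copy of the client that reuses the same
    # underlying connection pool, just with a different API key.
    per_user = client.with_options(api_key=x_openai_key)
    completion = await per_user.chat.completions.create(
        messages=[{"role": "user", "content": "Can you write a poem?"}],
        model="gpt-3.5-turbo",
    )
    return {"poem": completion.choices[0].message.content}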

RobertCraigie commented 9 months ago

@Filimoa what are you trying to achieve that requires you to instantiate a new client for every request?

If you really want to do so, you currently need to use client.with_options(), or create a custom http client and pass it to the constructor every time, to avoid resource leaks.

Instantiating a new http client for each request is very bad for performance regardless of whether or not it has any resource leaks, as the client cannot reuse connections.

abi commented 9 months ago

@RobertCraigie one use case I have is with https://github.com/abi/screenshot-to-code, where users put in their OpenAI key on the hosted version (that's how the app is free to use). I understand that performance will be worse when instantiating a new client on each request, but a memory leak is even worse.

Here's a fun memory leak graph from my server :) Not 100% sure it's because of this issue but very likely.

[screenshot: server memory usage graph, 2023-12-03 2:28 PM]

RobertCraigie commented 9 months ago

@abi are you explicitly closing the client using client.close()? If you are, then I would consider that a bug.

Additionally, is there a reason that client.with_options(api_key=user_api_key) doesn't work for you? That should give you all of the benefits of allowing users to specify their own API keys without creating a ton of unnecessary connections.
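
For reference, a minimal sketch of explicitly closing a per-request client (the handler name and arguments are illustrative); with_options() on a shared client, as above, avoids the per-request pool entirely:

from openai import AsyncOpenAI


async def handle_request(user_api_key: str, prompt: str) -> str:
    client = AsyncOpenAI(api_key=user_api_key)
    try:
        completion = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="gpt-3.5-turbo",
        )
        return completion.choices[0].message.content
    finally:
        # Release the underlying httpx connection pool instead of waiting
        # for garbage collection.
        await client.close()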

abi commented 9 months ago

@RobertCraigie I'm not closing with client.close(): https://github.com/abi/screenshot-to-code/blob/main/backend/llm.py#L26 Will try that now and report back. Thanks for the help.

The main reason client.with_options(api_key=user_api_key) is non-ideal, even though it would work, is that it's a bit harder to debug and more error-prone when you're not starting from a clean, empty state. Just now, I even tried doing an empty AsyncOpenAI() at the module level and setting the key via client.with_options(api_key=user_api_key). For some reason, even before client.with_options(api_key=user_api_key), client.api_key had a value, and then I realized it was pulling an initial value from my .env.

It's much more straightforward to do AsyncOpenAI(api_key=...) and know for sure which api_key the client is associated with. It's a bit scary to potentially serve one user's request with another user's API key, even if that can only happen due to a bug in my code. The ergonomics of AsyncOpenAI(api_key=...) are just nicer, but not a big deal if .close() fixes my memory leak.

antont commented 9 months ago

@abi: AsyncOpenAI() at the module level, and setting the key via client.with_options(api_key=user_api_key). For some reason, even before client.with_options(api_key=user_api_key), client.api_key had a value, and then I realized it was pulling an initial value from my .env. It's much more straightforward to do AsyncOpenAI(api_key=...) and know for sure which api_key the client is associated with. It's a bit scary to potentially serve one user's request with another user's API key, even if that can only happen due to a bug in my code.

Interesting point. Maybe there could be an option to disable the use of that default (what about others?), so that the client would error if you forget to provide the key via with_options somewhere. Perhaps a bit awkward, but it would work reasonably well to safeguard against such mistakes, no?

RobertCraigie commented 9 months ago

It's a bit scary to potentially serve one user's request with another user's API key, even if that can only happen due to a bug in my code

This is a fair point. I do think there are ways you could structure your code to make this practically impossible, though, even for someone making changes to your codebase for the first time.

For example, instead of using .with_options() yourself, you could define a helper function that you use whenever you need to make an OpenAI request, e.g.

# utils/openai_client.py
from openai import AsyncOpenAI

_client = AsyncOpenAI(
  # provide a dummy API key so that requests made directly will always fail
  api_key='<this client should never be used directly!>',
)

# `User` here stands for your app's own user model
def get_openai(user: User) -> AsyncOpenAI:
  return _client.with_options(api_key=user.openai_api_key)

# backend.py

async def handler():
  user = get_user()  # app-specific user lookup

  completion = await get_openai(user).chat.completions.create(...)
  # ...

It's also worth noting that you could use your existing pattern and avoid memory leaks entirely by sharing the http client instance, if you really don't want to use with_options. The only risk here is that you could instantiate a client and forget to pass in your http client, which then gives you memory leaks again.

import httpx
from openai import AsyncOpenAI

http_client = httpx.AsyncClient()  # shared connection pool, reused below

async def handler(api_key):
  openai_client = AsyncOpenAI(api_key=api_key, http_client=http_client)

I would argue, though, that the first snippet in this comment is less likely to result in accidentally using the incorrect API key, because with the second pattern you could accidentally just write AsyncOpenAI() and then it would use the API key from your environment.

However, with a helper function similar to get_openai(), you're encoding the setup of your system directly into it, so instead of having to remember to pass api_key=user_api_key, you're forced to by your own function definition.

The only way you'll accidentally use the wrong API key is by forgetting to use this helper function, which it may be possible to define lint rules for (I don't know of an easy way to do this off the top of my head).

abi commented 9 months ago

@RobertCraigie that's super helpful, both of those patterns. Didn't realize you could pass an http_client to the client.

I pushed client.close() a couple of hours ago BTW, and that solved my memory leak issues. Thanks for your help on this!

[screenshot: server memory usage graph, 2023-12-03 4:58 PM]

gintsmurans commented 9 months ago

Hi!

Started noticing some weird behaviour in our API server, and after digging a bit deeper I think I found the underlying issue with the memory leaks. The way the classes keep references to self prevents resources from being freed, because the classes hold objects that in turn hold references back to the class itself.

After testing a bit with forced close functions that release these resources, deinitialization runs and memory is freed as it should.

rattrayalex commented 9 months ago

Thanks for the investigation there, @gintsmurans ! That makes sense to me. We'll take a look soon and see if we can demonstrate that this prevents the issue.

tamird commented 9 months ago

@gintsmurans can you help me understand exactly what you believe to be happening? I agree that Completions.with_raw_response holds a reference to the Completions object, which forms a circular reference, but I don't understand why that would cause a memory leak.
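
For context, a generic illustration (stand-in classes, not the library's actual ones) of the behaviour being discussed: an object caught in a reference cycle is not reclaimed by reference counting alone and only goes away once the cyclic garbage collector runs.

import gc


class Completions:
    def __init__(self) -> None:
        self.buffer = bytearray(10 ** 6)        # stand-in for per-instance data
        self.with_raw_response = Wrapper(self)  # back-reference -> reference cycle


class Wrapper:
    def __init__(self, parent: "Completions") -> None:
        self.parent = parent


obj = Completions()
del obj        # the refcount never drops to zero because of the cycle,
gc.collect()   # so the memory is only released once the cyclic GC runs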

gintsmurans commented 9 months ago

Hey @tamird !

Created a little example here: https://gist.github.com/gintsmurans/79be8ee951ae1658ef9d7afd2f7d5460 - there are comments at the top of the file explaining what I did to test it out.

antont commented 9 months ago

@gintsmurans Why do you construct the client many times? Maybe there's a good reason to in your case, but anyhow, I'm curious, as AFAIK there is usually no reason to.

gintsmurans commented 9 months ago

In our case it's an API backend server where OpenAI is just a small part of it, so we don't keep one global OpenAI instance. But as requests come in, within a week or so the API server's memory is all used up. In the example I am just trying to show how the leak happens.

antont commented 9 months ago

In our case it's an API backend server where OpenAI is just a small part of it, so we don't keep one global OpenAI instance.

Is there a technical reason for this? Our backend also uses many APIs and OpenAI is just a small part of it, but we still use and keep a client per process.

gintsmurans commented 9 months ago

There wasn't a definite reason; we just made it that way. But now that I think about it - we are using async request handlers. If we were using the same instance for all OpenAI requests, that could lead to conflicts and even deadlocks. Learned that the hard way. :)

Anyhow, any library should clean up after itself, or give the option to do so - you never know how people are going to use it.

antont commented 9 months ago

There wasn't a definite reason; we just made it that way. But now that I think about it - we are using async request handlers. If we were using the same instance for all OpenAI requests, that could lead to conflicts and even deadlocks. Learned that the hard way. :)

What do you mean?

We also use async request handlers, and the async support in the OpenAI client, to scale. I'm not aware of any problems it would give, conflicts or deadlocks or anything. I think those would be a serious bug in the library. I guess you encountered something, as you learned it the hard way?

Anyhow, any library should clean up after itself, or give the option to do so - you never know how people are going to use it.

Sure, but I think it's also good to know the good patterns for usage, and possible problems.

gintsmurans commented 9 months ago

As far as my understanding goes: let's say one API request starts an async task, but while it waits for results, another request can be handled. At the same time, the class instance that handles requests is the same one for every request.

We had several issues with this:

  1. We had some properties we set on the handler class for each request, like the auth token gathered from the request headers. Now the first request sets this property and calls an async method that takes some time to answer - like an OpenAI API request. While that is executing, another request comes in and sets the class property to a new auth token. Now if the first request, after the prolonged task is finished, uses that token in any way, it will have the second request's token, which is not ideal. :)

  2. On each request we were taking a Postgres and a Redis connection from a pool and were using those in subsequent method calls via class properties. At some point the first request starts a database transaction, but another request swaps the connection instance for a new connection from the pool. Now the first request handler continues with this new instance, while the old instance just hangs somewhere in memory with an open transaction; maybe it is cleaned up by Python's gc, maybe it's not, but the transaction stays open on the database server. This led to deadlocks when two async tasks wanted to use the same instance, Postgres transaction issues, etc. Global instances would not solve this, but would make it even worse. For example, one request starts a transaction, then does something prolonged, while another request comes in and tries to start a new transaction on the same connection object.

We solved these by using the ContextVar class to make sure that only one context (request) uses a given object instance.

In the case of openai - I'm not sure how it would react to multiple simultaneous requests. I would not use a global instance for anything I can't be sure is thread / context safe. Which leads back to openai not releasing memory correctly. :)
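
For reference, a minimal sketch of the ContextVar pattern described above (the names and the OpenAI usage are illustrative, not the poster's actual code):

from contextvars import ContextVar

from openai import AsyncOpenAI

# Each request context gets its own client, instead of a shared class
# attribute that concurrent handlers could overwrite.
openai_client: ContextVar[AsyncOpenAI] = ContextVar("openai_client")


async def handle_request(api_key: str, prompt: str) -> str:
    client = AsyncOpenAI(api_key=api_key)
    openai_client.set(client)
    try:
        completion = await openai_client.get().chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="gpt-3.5-turbo",
        )
        return completion.choices[0].message.content
    finally:
        # Explicitly release this context's client resources.
        await openai_client.get().close()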

antont commented 9 months ago

@gintsmurans Yes, the idea of async request handlers is that you can have many running concurrently, within a single process and thread. And indeed, you should not use class attributes or other mechanisms to mix up request specific data then. I've usually just passed the request context as parameters to the business logic functions, like the session object when using postgres via sqlalchemy.

Anyway, that's not related to this OpenAI client lib, which AFAIK does no such mixups, and where reusing a single client is the recommended way for multiple concurrent async request handlers.

Again, sure, it's good to fix the cleanup, but it's also good to be clear about what is good practice and why. I've understood that sharing the client across multiple handlers is good so that you are not opening and closing connections to OpenAI in vain all the time.

gintsmurans commented 9 months ago

Haven't looked, but I would guess that openai does not create a "new connection" with each instance, but internally does basic http requests when asked to - for example on completions.create.

In my subjective opinion, using the same instances across coroutines / threads is just bad practice; it will lead to weird behavior sooner or later and thus should be avoided - you never know what a third-party library does internally, nor can you check each one of them.

gintsmurans commented 9 months ago

Anyway, it really does not matter, as it's not in the scope of this issue. The focus is that there seem to be memory leaks, regardless of how the library is used.

antont commented 9 months ago

Haven't looked, but I would guess that openai does not create a "new connection" with each instance, but internally does basic http requests when asked to - for example on completions.create.

I think the http connections are pooled by the underlying httpx library. https://www.python-httpx.org/advanced/ says:

On the other hand, a Client instance uses HTTP connection pooling. This means that when you make several requests to the same host, the Client will reuse the underlying TCP connection, instead of recreating one for every single request.

On the OpenAI side, the httpx client is kept and reused across requests, so I think it gets to use the pool and does not create new connections for new requests. https://github.com/openai/openai-python/blob/main/src/openai/_base_client.py#L872
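
For reference, a small sketch of that pooling behaviour, extending the http_client snippet shared earlier in this thread (the pool limits are illustrative):

import httpx
from openai import AsyncOpenAI

# One shared connection pool for the whole process.
http_client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
)


def make_client(api_key: str) -> AsyncOpenAI:
    # Every client created here reuses the same TCP connections,
    # so per-request instantiation does not open new ones.
    return AsyncOpenAI(api_key=api_key, http_client=http_client)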

In my subjective opinion, using the same instances across coroutines / threads is just bad practice; it will lead to weird behavior sooner or later and thus should be avoided - you never know what a third-party library does internally, nor can you check each one of them.

I think you are wrong here; both the OpenAI and HTTPX libraries are designed so that reusing a client across async tasks is a good idea. With threads it would indeed lead to problems, I think, but the point of async is to not need threads for this kind of thing.

rattrayalex commented 8 months ago

We believe this has been fixed in 1.3.9! Please open a new issue if you see further problems.

pseudotensor commented 6 months ago

@RobertCraigie

https://github.com/openai/openai-python/issues/820#issuecomment-1837590035

These are all nice things, but we've had issues with the connection becoming corrupt after some kinds of failures. This is something a fresh connection never does, so we'd prefer it if a fresh connection did not have memory leaks. It seems risky to rely on the same http client being perfect forever globally.

rattrayalex commented 6 months ago

@pseudotensor as I said just above, please open a new issue if you see further problems instead of commenting here. It's not clear to me whether #1181 describes the entirety of the issues you're seeing.

pseudotensor commented 6 months ago

Yes, as you can see, I opened a new issue - what's your question here?

FYI I'm just responding to one particular comment here, so others know as well when they search for issues.