openai / openai-python

The official Python library for the OpenAI API
https://pypi.org/project/openai/
Apache License 2.0

Connection reset from long-running or stale API connections #371

Closed mathcass closed 1 year ago

mathcass commented 1 year ago

Describe the bug

As we've used openai.ChatCompletion.create (with gpt-3.5-turbo), we've intermittently hit

requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

without a clear reproduction. At first I thought it was https://github.com/openai/openai-python/issues/91, caused by too many open connections to the OpenAI servers. Now it looks more like https://github.com/openai/openai-python/issues/368, but I have some hypotheses about it. I'm opening a new issue separate from https://github.com/openai/openai-python/issues/368 in case they're different; if this is a duplicate, feel free to fold my details into that one.

My hypothesis is that if you have a long-running process (like a web server) that calls out to OpenAI, periods of inactivity cause the server side to terminate the connection, and it takes the client a long time to reestablish it. I dug into related issues on the requests side (like this one, https://github.com/psf/requests/issues/4937) that hinted at the root cause. Essentially, what I think is happening is:

I believe the OpenAI servers terminate the connection after a brief idle period (perhaps minutes), but the client still tries to keep it alive.

The reason I think this is a bug worth reporting is that the client code could be modified to respond more gracefully to these server-side settings. Changing some of the keep-alive settings from the defaults would help out several folks using this library.

To Reproduce

  1. Write a long-running program. In our case, we have a Python web server running FastAPI
  2. As part of a route for the server, call OpenAI to do some work. In our case, we're calling openai.ChatCompletion.create with gpt-3.5-turbo to manipulate some input language and respond with the result
  3. Run the server and call the endpoint once
  4. Wait 10 minutes
  5. Call the endpoint again
  6. You'll likely get a Connection reset by peer error on the second call (see the sketch below)

Code snippets

No response
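Since I don't have a shareable snippet from our app, here's a rough sketch of the shape of the setup (the route and prompt are made up for illustration; assumes openai 0.27.x and FastAPI):

```python
# Minimal repro sketch; endpoint name and prompt are hypothetical.
import openai
from fastapi import FastAPI

app = FastAPI()

@app.get("/rewrite")
def rewrite(text: str):
    # The first call opens a keep-alive connection; a second call after
    # ~10 minutes of inactivity tends to hit "Connection reset by peer"
    # because the server has already dropped the idle socket.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Rewrite this politely: {text}"}],
    )
    return {"result": response.choices[0].message["content"]}
```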

OS

Linux

Python version

Python v3.8

Library version

openai-python 0.27.2

zwhitchcox commented 1 year ago

i am experiencing the same issue, also using FastAPI

This is my code, for reference. I'm including everything, because I'm not exactly sure what is relevant and what is not:

```python
import openai
import os
from db import prisma
from config import recording_dir
import asyncio
import json

openai.api_key = os.environ.get("OPENAI_API_KEY")

async def get_transcript_openai(call_id: str) -> str:
    file_name = f"{recording_dir}/{call_id}.mp3"
    audio_file = open(file_name, "rb")
    prompt = "This is a phone call between a customer service representative and possibly a potential customer, customer, or technician:"
    max_retries = 50
    current_retry = 0
    while current_retry < max_retries:
        try:
            print("transcribing", call_id)
            response = await asyncio.to_thread(
                openai.Audio.transcribe,
                "whisper-1",
                audio_file,
                prompt=prompt,
                response_format="verbose_json",
            )
            segments = response["segments"]
            if await prisma.transcriptionsegment.find_first({"where": {"callId": call_id}}):
                print(f"Skipping transcription of {call_id} because it already exists")
                return
            print("transcription of " + str(call_id) + " complete")
            # Save the verbose JSON to the Transcription model
            for segment in segments:
                await prisma.transcriptionsegment.create({
                    'avg_logprob': segment['avg_logprob'],
                    'end': segment['end'],
                    'no_speech_prob': segment['no_speech_prob'],
                    'seek': segment['seek'],
                    'start': segment['start'],
                    'text': segment['text'],
                    'tokens': str(segment['tokens']),
                    'transient': segment['transient'],
                    'callId': call_id,
                })
            # Break the loop if the transcription was successful
            break
        except Exception as e:
            print("filename", file_name)
            current_retry += 1
            print(f"Error transcribing call {call_id} (attempt {current_retry}): {str(e)}")
            # If all retries have failed, raise the exception
            if current_retry == max_retries:
                raise e
            await asyncio.sleep(1)
```

The first time it tries, the connection is dropped with a 104 error:

('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

After that, I get the error:

Error transcribing call 128316136 (attempt 2): Invalid file formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

Even though, from the docs:

File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.

Also, the call length is only 5MB

-rw-r--r-- 1 zwhitchcox zwhitchcox 5.6M Apr  6 14:45 data/recordings/128316136.mp3

It seems the problem is on OpenAI's servers, because the Invalid file formats error does not appear anywhere in this repo.

My local whisper instance transcribes it just fine, btw:

```python
import whisper

model = whisper.load_model("medium")
file = "../data/recordings/128316136.mp3"
result = model.transcribe(file, initial_prompt="This is a phone call between a customer service representative and possibly a potential customer, customer, or technician:")
print(result["text"])
```

I start the server with uvicorn (uvicorn app:app --reload --port 5000), which kills and restarts the server whenever I make a change. Sometimes, when I kill the server non-gracefully, orphan processes may be left over, because the port (5000) is still in use, and so I kill that process too.

I'm thinking maybe those sockets that are left open might still be communicating with the OpenAI servers, and maybe OpenAI's servers are blocking requests from my IP address. I don't really know what's happening, but that could be the source of the issue, because during development I sometimes force-kill the server in the middle of a transcription to avoid paying for whisper API calls that I'm not using.

Not sure if that is helpful, just a little more triage info.

indrasvat commented 1 year ago

Seeing the above issue intermittently in a local Jupyter notebook.

Error:

File /opt/homebrew/lib/python3.11/site-packages/openai/api_requestor.py:529, in APIRequestor.request_raw(self, method, url, params, supplied_headers, files, stream, request_id, request_timeout)
    527     raise error.Timeout("Request timed out: {}".format(e)) from e
    528 except requests.exceptions.RequestException as e:
--> 529     raise error.APIConnectionError(
    530         "Error communicating with OpenAI: {}".format(e)
    531     ) from e
    532 util.log_debug(
    533     "OpenAI API response",
    534     path=abs_url,
   (...)
    537     request_id=result.headers.get("X-Request-Id"),
    538 )
    539 # Don't read the whole stream for debug logging unless necessary.

APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Main function:

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

Full stacktrace:

jp-nb-apiconnectionerror.log

Versions:

$ pip list | grep openai
openai                   0.27.5

$ python3 -V
Python 3.11.3

mjamei commented 1 year ago

Any updates on this? We also see this fairly regularly.

Domincog commented 1 year ago

I am having the same issue. Making an API call and then waiting for 30 minutes before calling again results in: openai.error.APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

hc20k commented 1 year ago

This is my workaround; I just wrap all of my OpenAI function calls inside this:

from aiohttp import ClientSession

async with ClientSession() as s:
    openai.aiosession.set(s)
    response = await openai.Completion.acreate( ...

This way a new client session is made after waiting, and it doesn't use the old one that will fail.
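A self-contained version of that pattern, as a sketch (assumes openai 0.27.x, which exposes openai.aiosession; the complete helper and its use of ChatCompletion are illustrative):

```python
# Sketch only: a fresh aiohttp session per call, so a connection the
# server silently closed during idle time can never be reused.
import openai
from aiohttp import ClientSession

async def complete(prompt: str) -> str:
    async with ClientSession() as s:
        openai.aiosession.set(s)
        response = await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
    return response["choices"][0]["message"]["content"]
```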

lelarson commented 1 year ago

Could you please help me understand why my attempt to apply that to an embedding function throws an error?

async def process_inputs(inputs, model_id="text-embedding-ada-002"):
    embeddings = []
    async with aiohttp.ClientSession() as s:
        openai.aiosession.set(s)
        for sentence in inputs:
            response = await openai.Embedding.create(
                engine=deployment_id,
                model=model_id,
                input=sentence,
                max_tokens=100,
                temperature=0
            )
            embeddings.append(response['data'][0]['embedding'])
    return embeddings

embeddings = await process_inputs(df['openai'].tolist())

Result:

TypeError: object OpenAIObject can't be used in 'await' expression

hc20k commented 1 year ago

> Could you please help me understand why my attempt to apply that to an embedding function throws an error? [code quoted above]
>
> Result: TypeError: object OpenAIObject can't be used in 'await' expression

It needs to be wrapped in an async function first, and then you can call it using asyncio

lelarson commented 1 year ago

It is an async function, but GitHub is not formatting that first line right.

turnham commented 1 year ago

If anyone is looking for a workaround that does not require changing to async, the following is working for us. It's the same idea as hc20k's workaround above: https://github.com/openai/openai-python/issues/371#issuecomment-1537622984

Using the support added in v0.27.6 to pass in a session we do the following:

# Pass a new session to the openai module
openai.requestssession = requests.Session()

# Existing code calling openai
response = openai.Completion.create(...)

# Close and reset the session
try:
    openai.requestssession.close()
except Exception as e:
    logging.exception(e)
openai.requestssession = None

or using the 'with' syntax:

with requests.Session() as session:
    openai.requestssession = session
    response = openai.Completion.create(...)
    openai.requestssession = None

We're not sure if setting openai.requestssession to None is required, but we weren't sure what else might be done with that attribute in the openai module. In our testing, we no longer see the errors on long-running (web app) threads that make openai calls.

mathcass commented 1 year ago

@turnham It'll do the job, but one thing you might miss out on is the potential speed improvement from reusing persistent connections. The requests docs on Sessions explain this briefly and link to the basic idea. I still think that since OpenAI knows their own server configuration, modifying the keep-alive settings on their side would have the most impact for the community.

In my own case, I switched over to using tenacity, since the OpenAI docs recommend it.

from tenacity import retry, retry_if_exception_type, stop_after_attempt

@retry(
    stop=stop_after_attempt(2),
    retry=retry_if_exception_type(openai.error.APIConnectionError),
)
def call_openai():
    ...

turnham commented 1 year ago

@mathcass +1. Yes it would be great if the openai module would take care of all of this.

I don't think your approach with tenacity retries would have helped in our situation, though. Once we had a long-running thread get into this state, all retries would fail. So to get that use case working consistently, we had to force a reset of openai's _thread_context.session, by making sure a cached session was never present to be re-used: https://github.com/openai/openai-python/blob/fe3abd16b582ae784d8a73fd249bcdfebd5752c9/openai/api_requestor.py#L79

But also adding retries sounds like something we should be doing regardless of this issue, so thanks for the pointers to tenacity!
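Combining the two ideas, a hedged sketch (call_openai_fresh_session is a made-up name; assumes the pre-1.0 openai module and the requestssession attribute used above):

```python
# Sketch: tenacity retry around a call that always uses a fresh
# requests.Session, so a stale keep-alive connection can never be
# reused after a server-side reset.
import openai
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt

@retry(
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type(openai.error.APIConnectionError),
)
def call_openai_fresh_session(**kwargs):
    with requests.Session() as session:
        openai.requestssession = session
        try:
            return openai.ChatCompletion.create(**kwargs)
        finally:
            openai.requestssession = None
```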

foomprep commented 1 year ago

Just to add, for anybody who comes looking: I have this problem when I'm using a VPN, not sure why. If I shut off the VPN the problem goes away. I am outside the US, by the way.

EDIT: It now works with VPN lol

bouch20 commented 1 year ago

I confirmed the same error when using Azure OpenAI's gpt-3.5-turbo. Changing the model version from 0301 to 0601 resolved the issue. Anyone in the same situation may want to try this.

gepolv commented 1 year ago

@turnham So can we say the openai client is not thread-safe, since openai.requestssession is global and can be changed by each thread? Is my understanding right?

ShantanuNair commented 1 year ago

@pamelafox Since you seem to be the most well versed with this issue, I'd like to ask: should this now work with the openai-python client? I ran into this issue using long-running map-reduce calls in langchain. The error I see is aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed, and langchain doesn't retry on it. It typically happens after my client has sent a request to 3.5-turbo-16k (openai) with a large number of input tokens (10-12k); after 2-3 minutes it gives me this error, bills me, and I don't end up with a generation.

body="<StreamReader e=ClientPayloadError('Response payload is not completed')>" message='Response payload is not completed'
body="<StreamReader e=ClientPayloadError('Response payload is not completed')>" message='Response payload is not completed'
Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 352, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 510, in run_input
    value = await res
  File "/root/reports.py", line 172, in analyze_document_summarize_llm_chain
    res = await chain.acall(inputs={'input_documents': texts}, return_only_outputs=True)
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 361, in acall
    raise e
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/base.py", line 355, in acall
    await self._acall(inputs, run_manager=run_manager)
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/combine_documents/base.py", line 121, in _acall
    output, extra_return_dict = await self.acombine_docs(
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/combine_documents/map_reduce.py", line 240, in acombine_docs
    map_results = await self.llm_chain.aapply(
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/llm.py", line 209, in aapply
    raise e
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/llm.py", line 206, in aapply
    response = await self.agenerate(input_list, run_manager=run_manager)
  File "/usr/local/lib/python3.9/site-packages/langchain/chains/llm.py", line 115, in agenerate
    return await self.llm.agenerate_prompt(
  File "/usr/local/lib/python3.9/site-packages/langchain/chat_models/base.py", line 424, in agenerate_prompt
    return await self.agenerate(
  File "/usr/local/lib/python3.9/site-packages/langchain/chat_models/base.py", line 384, in agenerate
    raise exceptions[0]
  File "/usr/local/lib/python3.9/site-packages/langchain/chat_models/base.py", line 485, in _agenerate_with_cache
    return await self._agenerate(
  File "/usr/local/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 425, in _agenerate
    response = await acompletion_with_retry(
  File "/usr/local/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 92, in acompletion_with_retry
    return await _completion_with_retry(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/langchain/chat_models/openai.py", line 90, in _completion_with_retry
    return await llm.client.acreate(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/openai/api_resources/chat_completion.py", line 45, in acreate
    return await super().acreate(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 217, in acreate
    response, _, api_key = await requestor.arequest(
  File "/usr/local/lib/python3.9/site-packages/openai/api_requestor.py", line 382, in arequest
    resp, got_stream = await self._interpret_async_response(result, stream)
  File "/usr/local/lib/python3.9/site-packages/openai/api_requestor.py", line 729, in _interpret_async_response
    (await result.read()).decode("utf-8"),
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1037, in read
    self._body = await self.content.read()
  File "/usr/local/lib/python3.9/site-packages/aiohttp/streams.py", line 349, in read
    raise self._exception
  File "/usr/local/lib/python3.9/site-packages/openai/api_requestor.py", line 722, in _interpret_async_response
    await result.read()
  File "/usr/local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1037, in read
    self._body = await self.content.read()
  File "/usr/local/lib/python3.9/site-packages/aiohttp/streams.py", line 375, in read
    block = await self.readany()
  File "/usr/local/lib/python3.9/site-packages/aiohttp/streams.py", line 397, in readany
    await self._wait("readany")
  File "/usr/local/lib/python3.9/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

If this has indeed been fixed, then perhaps updating my openai lib version would solve the issue within my chains. I'm unsure, though, as this isn't easily reproducible, so I cannot test it.

microsoftbuild commented 1 year ago

In the latest version, i.e. 0.28.0, you can also pass request_timeout=<timeout in sec> and have the session closed after the timeout.
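For example (a sketch; request_timeout is the pre-1.0 per-request timeout parameter, and the model and message are illustrative):

```python
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}],
    request_timeout=30,  # seconds; the session is closed after the timeout
)
```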

ShantanuNair commented 1 year ago

@microsoftbuild It looks like this could bring up issues again, with respect to the OP: https://github.com/openai/openai-python/pull/387

Also, we do want high timeouts as our llm chains can potentially take many minutes to run, but not always.

microsoftbuild commented 1 year ago

@ShantanuNair In that case, you could use a request_timeout=<timeout in sec> param in the following workaround suggested by @turnham:

with requests.Session() as session:
    openai.requestssession = session
    response = openai.Completion.create(...)
    openai.requestssession = None
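That is, something like the following sketch (the model, prompt, and timeout value are illustrative):

```python
import openai
import requests

with requests.Session() as session:
    openai.requestssession = session
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "ping"}],
        request_timeout=600,  # seconds; generous for long-running chains
    )
    openai.requestssession = None
```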
ShantanuNair commented 1 year ago

@microsoftbuild See this. I already have 600s set as the timeout, and this issue also impacts retries on 502 from Cloudflare.

I then tried specifying a request_timeout parameter for the OpenAI API request, but that caused every request to time out, due to this issue: https://github.com/openai/openai-python/pull/387

Appreciate your help!

rattrayalex commented 1 year ago

This should be fixed in the beta of our upcoming v1.0.0; can you try it out and let us know whether or not it seems to be resolved?