timoklimmer / powerproxy-aoai

Monitors and processes traffic to and from Azure OpenAI endpoints.
MIT License
83 stars 23 forks source link

Experiencing ResponseNotRead errors #89

Closed LeonKalt closed 2 weeks ago

LeonKalt commented 2 weeks ago

Hey we are currently experiencing the follow error on v0.10.3:


4-06-19T07:24:11.108Z | ERROR: Exception in ASGI application
-- | --
  | 2024-06-19T07:24:11.108Z | Traceback (most recent call last):
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
  | 2024-06-19T07:24:11.108Z | result = await app( # type: ignore[func-returns-value]
  | 2024-06-19T07:24:11.108Z | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
  | 2024-06-19T07:24:11.108Z | await super().__call__(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
  | 2024-06-19T07:24:11.108Z | await self.middleware_stack(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
  | 2024-06-19T07:24:11.108Z | raise exc
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
  | 2024-06-19T07:24:11.108Z | await self.app(scope, receive, _send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
  | 2024-06-19T07:24:11.108Z | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  | 2024-06-19T07:24:11.108Z | raise exc
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  | 2024-06-19T07:24:11.108Z | await app(scope, receive, sender)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
  | 2024-06-19T07:24:11.108Z | await self.middleware_stack(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
  | 2024-06-19T07:24:11.108Z | await route.handle(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
  | 2024-06-19T07:24:11.108Z | await self.app(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
  | 2024-06-19T07:24:11.108Z | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  | 2024-06-19T07:24:11.108Z | raise exc
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  | 2024-06-19T07:24:11.108Z | await app(scope, receive, sender)
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
  | 2024-06-19T07:24:11.108Z | response = await func(request)
  | 2024-06-19T07:24:11.108Z | ^^^^^^^^^^^^^^^^^^^
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
  | 2024-06-19T07:24:11.108Z | raw_response = await run_endpoint_function(
  | 2024-06-19T07:24:11.108Z | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
  | 2024-06-19T07:24:11.108Z | return await dependant.call(**values)
  | 2024-06-19T07:24:11.108Z | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | 2024-06-19T07:24:11.108Z | File "/app/powerproxy.py", line 371, in handle_request
  | 2024-06-19T07:24:11.108Z | f"Text: {aoai_response.text} "
  | 2024-06-19T07:24:11.108Z | ^^^^^^^^^^^^^^^^^^
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 576, in text
  | 2024-06-19T07:24:11.108Z | content = self.content
  | 2024-06-19T07:24:11.108Z | ^^^^^^^^^^^^
  | 2024-06-19T07:24:11.108Z | File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 570, in content
  | 2024-06-19T07:24:11.108Z | raise ResponseNotRead()
  | 2024-06-19T07:24:11.108Z | httpx.ResponseNotRead: Attempted to access streaming response content, without having called `read()`.
LeonKalt commented 2 weeks ago

We temporarily could fix the issue by adding await aoai_response.aread() to the status check.

        if aoai_response.status_code != 200:
            # print infos to console
            await aoai_response.aread()
            print(
                (
                    f"Unexpected HTTP Code {aoai_response.status_code} while using target '{aoai_target['name']}'. "
                    f"Text: {aoai_response.text} "
                    f"Path: {routing_slip['path']} "
                    f"Url: {aoai_target['url']}"
                )
            )

No we get Unexpected HTTP Code 408 while using target ... Errors that are no handled by the following

        # got 429 or 500
        if aoai_response.status_code in [429, 500]:
            # block endpoint for some time, either according to the time given by AOAI or, if not
            # available, for 10 seconds
            waiting_time_ms_until_next_request = (
                int(aoai_response.headers["retry-after-ms"]) if "retry-after-ms" in aoai_response.headers else 10_000
            )
            aoai_target["next_request_not_before_timestamp_ms"] = (
                get_current_timestamp_in_ms() + waiting_time_ms_until_next_request
            )

            # try next target
            continue

Shouldn't this include handling of timeouts as well?

timoklimmer commented 2 weeks ago

@LeonKalt I have fixed the .aread() issue and checked into main. I will create a new release later today. Thanks for reaching out and bringing this up!

The 408 timeout occurs when the client does not send the request fast enough to the server. If I add 408 to the list, we would simply hide an issue which could/should potentially be fixed. Do you have an idea why a 408 is encountered in your scenario? Could it be related to where you host PowerProxy and/or how the network connection is under pressure? What kind of prompts are you sending? Is larger prompts? Just trying to better understand the context before adding 408 to the list above. Thanks!

timoklimmer commented 2 weeks ago

I have seen you have opened another timeout-related issue, rendering this as duplicate, hence closing.

LeonKalt commented 2 weeks ago

@LeonKalt I have fixed the .aread() issue and checked into main. I will create a new release later today. Thanks for reaching out and bringing this up!

The 408 timeout occurs when the client does not send the request fast enough to the server. If I add 408 to the list, we would simply hide an issue which could/should potentially be fixed. Do you have an idea why a 408 is encountered in your scenario? Could it be related to where you host PowerProxy and/or how the network connection is under pressure? What kind of prompts are you sending? Is larger prompts? Just trying to better understand the context before adding 408 to the list above. Thanks!

Our managed PTU had some issues so we ran into a scenario where customers where effected instead of falling back on our paygo deployment and although I understand the issue of hiding errors, I think full filling the request with an answer is preferable to just returning an error to the application.

timoklimmer commented 2 weeks ago

@LeonKalt Ok I see. I have added the 408/Request Timeout code to the list of codes that will trigger trying the next AOAI target instead of just forwarding the 408 back. New release is out.