gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

ERROR: Exception in ASGI application - fastapi.exceptions.HTTPException: 404: Session not found #9169

Open: skye0402 opened this issue 3 weeks ago

skye0402 commented 3 weeks ago

Describe the bug

For some time now I have been getting the session error below. I can't say exactly which release introduced it: I'm currently on 4.41.0, and as far as I know it did not happen with 4.22 (and possibly some later versions). The Gradio app runs on Kubernetes behind an approuter. I can't reproduce the error reliably. I have seen other issues with the same error, e.g. #9070, and #6920 looks more relevant. I already use sticky sessions, have around 20 concurrent users at peak, and run 4-5 instances of the app. It happens in roughly 5% of cases (it's hard to measure), but it never happened on older Gradio versions, and I never upgraded the approuter. Is there any way we can fix this? It's a nasty error, because a user can only get around it by opening an incognito window or clearing the cookie.

Have you searched existing issues? 🔎

Reproduction

import gradio as gr

Screenshot

No response

Logs

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/usr/local/lib/python3.12/asyncio/locks.py", line 212, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fbaab652990

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/usr/local/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/local/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
  |     return await self.app(scope, receive, send)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/local/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/applications.py", line 123, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 186, in __call__
  |     raise exc
  |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 164, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/usr/local/lib/python3.12/site-packages/gradio/route_utils.py", line 727, in __call__
  |     await self.app(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 754, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 774, in app
  |     await route.handle(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 295, in handle
  |     await self.app(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 77, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/routing.py", line 75, in app
  |     await response(scope, receive, send)
  |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 258, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/usr/local/lib/python3.12/site-packages/gradio/routes.py", line 980, in sse_stream
    |     raise e
    |   File "/usr/local/lib/python3.12/site-packages/gradio/routes.py", line 915, in sse_stream
    |     raise HTTPException(
    | fastapi.exceptions.HTTPException: 404: Session not found.
    +------------------------------------

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.41.0
gradio_client version: 1.3.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
anyio: 4.4.0
fastapi: 0.112.1
ffmpy: 0.4.0
gradio-client==1.3.0 is not installed.
httpx: 0.27.0
huggingface-hub: 0.24.5
importlib-resources: 6.4.2
jinja2: 3.1.4
markupsafe: 2.1.5
matplotlib: 3.9.2
numpy: 1.26.4
orjson: 3.10.7
packaging: 24.1
pandas: 2.2.2
pillow: 10.4.0
pydantic: 2.8.2
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.2
ruff: 0.6.0
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.12.3
typing-extensions: 4.12.2
urllib3: 2.2.2
uvicorn: 0.30.6
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.

gradio_client dependencies in your environment:

fsspec: 2024.6.1
httpx: 0.27.0
huggingface-hub: 0.24.5
packaging: 24.1
typing-extensions: 4.12.2
websockets: 12.0

Severity

I can work around it

skye0402 commented 1 week ago

Any chance someone could look into this? I've found that 404 errors become more likely as the pod ages (e.g. once it is more than a day old). This definitely did not happen with older Gradio versions.
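One mechanism that would fit this pattern, offered only as a hypothesis and not confirmed for Gradio, is an in-memory session store with a fixed capacity that evicts the least recently used entry once it is full. A young pod whose store has not filled up yet never evicts anything, while an older pod that has reached capacity starts dropping sessions that sat idle while other users were active. The sketch below models this; `BoundedSessionStore`, `SessionNotFound` and the `capacity` value are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class SessionNotFound(Exception):
    """Stand-in for the server's "404: Session not found." (hypothetical)."""


class BoundedSessionStore:
    """Hypothetical in-memory session store with LRU eviction (illustration only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._sessions: OrderedDict = OrderedDict()

    def touch(self, session_id: str) -> dict:
        """Create or refresh a session and mark it as most recently used."""
        state = self._sessions.pop(session_id, {})
        self._sessions[session_id] = state
        # Once the store is over capacity, drop the least recently used session.
        while len(self._sessions) > self.capacity:
            self._sessions.popitem(last=False)
        return state

    def get(self, session_id: str) -> dict:
        """Look up an existing session without creating a new one."""
        if session_id not in self._sessions:
            raise SessionNotFound(session_id)
        self._sessions.move_to_end(session_id)
        return self._sessions[session_id]


# A "young" pod: few sessions seen so far, so nothing has been evicted yet.
store = BoundedSessionStore(capacity=3)
store.touch("alice")
store.touch("bob")
store.get("alice")  # still present

# An "older" pod: more users arrive while alice is idle, and the store fills up.
for user in ("carol", "dave", "erin"):
    store.touch(user)

try:
    store.get("alice")
except SessionNotFound as exc:
    print(f"404: Session not found ({exc})")
```

If something like this were the cause, the 404 rate would depend on how many sessions a pod has served rather than on wall-clock age alone, which would be one way to test the hypothesis.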

w8jie commented 1 week ago

The issue persists for me too. I am running a Gradio app on multiple AWS EKS pods, and the 404 error shows up frequently, though not every time. I had to downgrade to gradio==3.50.2.

Please look into it.

skye0402 commented 1 week ago

@w8jie It works without errors on, for example, Gradio 4.1x, so the bug was introduced at some point after that.

abidlabs commented 5 days ago

Apologies for the late response. We'll need a methodical repro in order for us to investigate this issue. Would either of you be able to provide one?

skye0402 commented 4 days ago

@abidlabs I understand. The thing is, this error doesn't happen all the time. A session works fine for a while, then the error occurs and leads to a 404. If I open an incognito window I can work again, because that's a new session, but the session in the regular browser window is lost. Istio uses the session ID from the browser to route the request to the pod running the Gradio app that owns this session ID, yet the error log above still appears. So far I haven't been able to provoke it; I think it becomes more likely the "older" the Gradio instance gets. When it happens I have two options: wait until the session expires, or restart the pod manually.
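The routing described above can be modelled roughly as follows. This is a simplified sketch, not Istio's actual load-balancing implementation: each session ID is mapped to a pod with rendezvous (highest-random-weight) hashing, and a session exists only in the memory of the pod that created it. `route`, `simulate` and the pod names are hypothetical.

```python
import hashlib
from typing import Dict, List, Set


def _score(session_id: str, pod: str) -> int:
    """Deterministic pseudo-random weight for a (session, pod) pair."""
    digest = hashlib.sha256(f"{session_id}:{pod}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def route(session_id: str, pods: List[str]) -> str:
    """Sticky routing: the same session ID and pod set always give the same pod."""
    return max(pods, key=lambda pod: _score(session_id, pod))


def simulate(session_ids: List[str], before: List[str], after: List[str]) -> List[str]:
    """Return the session IDs whose later requests hit a pod without their state."""
    # Each pod keeps its sessions in its own process memory only.
    memory: Dict[str, Set[str]] = {pod: set() for pod in set(before) | set(after)}
    for sid in session_ids:
        memory[route(sid, before)].add(sid)
    lost = []
    for sid in session_ids:
        if sid not in memory[route(sid, after)]:
            lost.append(sid)  # this request would get "404: Session not found."
    return lost


sessions = [f"session-{i}" for i in range(20)]
pods = ["pod-a", "pod-b", "pod-c"]

# Stable pod set: every session keeps landing on the pod that holds it.
print(simulate(sessions, pods, pods))  # []

# Scale down by removing pod-c: only the sessions that lived on pod-c are lost.
print(simulate(sessions, pods, ["pod-a", "pod-b"]))
```

In this model a stable set of pods never loses a session. So if routing really is sticky and no pod was replaced, the 404 would have to come from the pod itself forgetting the session, which would fit the observation that older pods are affected more.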

It has become a real problem: I'd estimate it happens in 10 to 20% of cases where a user wants to continue working. It's always the error above, and it seems Gradio forgets the session ID (maybe after Starlette raised the ASGI error?).
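Reading the traceback above, the order may be the other way round: the single sub-exception in the ExceptionGroup is the `HTTPException` raised inside Gradio's `sse_stream`, while Starlette was already iterating the streaming response body. A Starlette `StreamingResponse` sends its status line and headers before it starts iterating the body, so a 404 raised at that point can no longer become an ordinary 404 response and instead surfaces as "Exception in ASGI application". The sketch below imitates that order of events with plain `asyncio`; `HTTPError`, `known_sessions`, `sse_stream` and `serve` here are hypothetical stand-ins, not Gradio or Starlette code.

```python
import asyncio
from typing import AsyncIterator, List, Set


class HTTPError(Exception):
    """Stand-in for fastapi.exceptions.HTTPException in this sketch."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code


# Hypothetical per-process registry of sessions this server instance knows about.
known_sessions: Set[str] = {"abc123"}


async def sse_stream(session_hash: str) -> AsyncIterator[str]:
    """Hypothetical event-stream body that rejects unknown sessions."""
    if session_hash not in known_sessions:
        raise HTTPError(404, "Session not found.")
    yield "data: ok\n\n"


async def serve(session_hash: str) -> List[str]:
    """Imitate a streaming response: headers are sent first, then the body is iterated."""
    sent = ["http.response.start (status 200)"]
    try:
        async for chunk in sse_stream(session_hash):
            sent.append(chunk)
    except HTTPError as exc:
        # The 200 status line is already out, so the 404 can only abort the stream.
        sent.append(f"stream aborted by {exc}")
    return sent


print(asyncio.run(serve("abc123")))
print(asyncio.run(serve("unknown")))
# ['http.response.start (status 200)', 'stream aborted by 404: Session not found.']
```

Under this reading, the ASGI error is a symptom of the session already being unknown to the pod, so the question becomes why the pod stopped tracking that session.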

If it helps, I can give one of your developers access to the instance, and of course to the source code.

skye0402 commented 3 days ago

@abidlabs I took a chance and downgraded starlette to 0.37.2 (released in March this year) to see whether that fixes the problem. The next starlette release came out in July, which could be when the problems started. I'll update if it helps.

skye0402 commented 1 day ago

@abidlabs Downgrading starlette (as far back as 0.37.2) didn't fix the error. I don't know which starlette version shipped with Gradio in May/June, when Gradio didn't show the error. If it was an older one, I could try downgrading further.
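To compare a working image with a failing one, it may help to record the exact versions installed in each. A minimal sketch using only the standard library's `importlib.metadata`; the package list is a suggestion based on the frames in the traceback, and `installed_versions` and `requirements_mentioning` are hypothetical helper names. The second helper shows which starlette range another installed package (for example FastAPI) declares.

```python
from importlib.metadata import PackageNotFoundError, requires, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def requirements_mentioning(dist: str, needle: str) -> List[str]:
    """Return the requirement strings declared by `dist` that mention `needle`."""
    try:
        declared = requires(dist) or []
    except PackageNotFoundError:
        return []
    return [req for req in declared if needle.lower() in req.lower()]


if __name__ == "__main__":
    stack = ["gradio", "gradio_client", "fastapi", "starlette", "uvicorn", "anyio"]
    for name, ver in installed_versions(stack).items():
        print(f"{name}: {ver or 'not installed'}")
    # Which starlette range does the installed FastAPI accept?
    print(requirements_mentioning("fastapi", "starlette"))
```

Running this in both the old (working) and the current image gives a side-by-side list that can guide further downgrades one package at a time.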