BeatWolf opened this issue 4 months ago (status: Open)
Running into a very similar connection-loss issue using Gradio on AWS SageMaker JupyterLab for long-running processing.
Downgrading Gradio to 3.50.2 helped; it seems to be working reliably now.
Downgrading to 3.50.2 does not seem to solve the issue of not reconnecting.
Edit: My bad, this actually solves it. Does this mean Gradio does not work over unreliable internet connections in any recent version?
Long-running processing (> 2 mins) is still a bit flaky on 3.50.2, but it works most of the time, whereas on 4.29 it never works: the connection is always lost.
Hi, any updates? I hit the same problem after upgrading to 4.xx: there is always lag or a lost connection.
Describe the bug
After a good day of debugging I'm stuck, which is why I'm coming here.
I have a Gradio app, using the latest Gradio version, which has a bunch of chatbot input fields that connect to an LLM. It is a kind of RAG application. The application is deployed on Kubernetes with an nginx ingress.
So now to the error. When using the application, I sometimes randomly lose the connection. This looks like this in the browser:
And like this in the console:
Now, this is kind of a double issue. The first is losing the connection. Both the heartbeat and the event stream get closed, I suspect by the ingress, as I see no errors on the Gradio server pod (is there a way to add verbose output there?).
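For the verbose-output question, one option that needs no Gradio-specific settings is to log from the handler functions themselves, so the pod logs show whether a handler was still running when the connection dropped. This is only a minimal sketch using the standard library. The logger name `app.handlers` and the decorator `log_calls` are hypothetical, and it does not change what Gradio or the ingress log on their own.

```python
import functools
import logging
import time

# Hypothetical logger name; pick whatever fits the application.
logger = logging.getLogger("app.handlers")


def log_calls(fn):
    """Log the start, end, and duration of every call to fn."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        logger.debug("start %s", fn.__name__)
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Record the traceback, then let the error propagate unchanged.
            logger.exception("error in %s", fn.__name__)
            raise
        finally:
            elapsed = time.monotonic() - start
            logger.debug("end %s after %.1fs", fn.__name__, elapsed)

    return wrapper


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

    @log_calls
    def answer(question):
        # Stand-in for the real LLM call.
        return question.upper()

    answer("hello")
```

Note that for streaming (generator) handlers this only times the creation of the generator, not the whole stream.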
I tried pretty much all the nginx annotations I could find that could remotely be related to the problem; nothing helped. Just to be complete, here they are:
Now, I don't have a reproduction example, because the problem does not happen locally, only on the server.
This brings us to the second issue. When the error happens, I can no longer use the application until I refresh the page. Connections can be lost; that's not the issue, I would say. What I find strange is that there is no reconnection from the Gradio side. I also searched for a way to force a reconnection (maybe catching errors and simply reconnecting), but I found nothing in the documentation.
So, I know it's a bit mysterious. If anybody can help me understand the random connection drops, great. But I would already be very happy if I could somehow tell Gradio to reconnect when there is an error with the connection, some kind of retry (which I know will work, because the server is fine).
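To make the kind of retry I have in mind concrete, here is a minimal sketch of retry with exponential backoff in plain Python. It is not a Gradio API: `retry_with_backoff` and its parameters are hypothetical, and it only illustrates the behaviour for a Python-side caller. The browser frontend would need an equivalent of its own.

```python
import time


def retry_with_backoff(call, attempts=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Run call(); on a connection-type error, wait and try again.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and is
    capped at max_delay. The last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(min(max_delay, base_delay * 2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Stand-in for a request that is dropped twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("connection lost")
        return "response"

    print(retry_with_backoff(flaky_request, base_delay=0.1))
```

The `sleep` parameter is only there so the waiting can be swapped out, for example in tests.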
Have you searched existing issues? 🔎
Reproduction
Screenshot
No response
Logs
No response
System Info
Severity
Blocking usage of gradio