gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

Random connection errors in gradio 4 #7576

Closed. pseudotensor closed this issue 2 months ago.

pseudotensor commented 3 months ago

Describe the bug

Getting random connection errors in gradio 4 that were not seen in gradio 3.

https://github.com/h2oai/h2ogpt/issues/1439

Have you searched existing issues? 🔎

Reproduction

No repro yet; it happens randomly. I shared the dev console output after hitting it.

Screenshot

[screenshot]

[screenshot]

Nothing at all appears in the logs. It seems like a pure UI issue.

Logs

No response

System Info

gradio 4.19, but seen with all gradio 4 versions

Severity

Blocking usage of gradio

abidlabs commented 3 months ago

I think this is the same underlying issue as https://github.com/gradio-app/gradio/issues/7531 and should be fixed by https://github.com/gradio-app/gradio/pull/7565 -- the errors look quite similar in the console log, correct?

pseudotensor commented 3 months ago

The gr.Audio issue doesn't generate any connection errors in the UI:

[screenshot]

Nor does it generate the connection error shown in the dev console:

[screenshot]

[screenshot]

The other errors are not new.

Also, note that I see the connection error randomly (as does that other user), even locally with no nginx involved.

And lastly, nothing appears in the server logs when this happens, which is why it seems like a "pure UI" issue, as if the UI tries to reach the server and fails prematurely.

oobabooga commented 3 months ago

I also get these and they also prevent me from upgrading to Gradio 4. Gradio 3.53 suffers from a severe performance issue during streaming that has since been fixed, but now Gradio 4 suffers from these chronic connection issues.

abidlabs commented 3 months ago

Got it thanks @oobabooga and @pseudotensor -- if you can help us with a repro, that would be super appreciated.

cc @aliabid94

pseudotensor commented 3 months ago

I've tried to find some pattern, but I can't. I shared the dev console output. I think the gradio UI needs to show more information about what happened and why; the "Connection error" it raises is too generic. I can't tell from the dev console where in the UI code it happened; I just end up at the place where the error is printed.

abidlabs commented 2 months ago

@freddyaboulton would you be able to look into this?

@pseudotensor @oobabooga can you tell us your setup when you run into this issue? Are you running on a regular machine and seeing this error locally, or is the setup something else?

oobabooga commented 2 months ago

@abidlabs behold, I have obtained a reproduction after a lot of trial and error. Here it is:

```python
import random
import time

import gradio as gr

def make_number():
    time.sleep(random.random() * 0.01)
    return str(random.randint(0, 8))

with gr.Blocks() as demo:
    with gr.Tab():
        msg1 = gr.Textbox()
        msg2 = gr.Textbox()
        msg3 = gr.Textbox()
        msg4 = gr.Textbox()

    with gr.Tab():
        msg5 = gr.Textbox()
        msg6 = gr.Textbox()
        msg7 = gr.Textbox()
        msg8 = gr.Textbox()

    demo.load(make_number, None, msg1, show_progress=False)
    demo.load(make_number, None, msg2, show_progress=False)
    demo.load(make_number, None, msg3, show_progress=False)
    demo.load(make_number, None, msg4, show_progress=False)
    demo.load(make_number, None, msg5, show_progress=False)
    demo.load(make_number, None, msg6, show_progress=False)
    demo.load(make_number, None, msg7, show_progress=False)
    demo.load(make_number, None, msg8, show_progress=False)

    msg1.change(lambda x: x, msg1, msg2)
    msg2.change(lambda x: x, msg2, msg3)
    msg3.change(lambda x: x, msg3, msg4)
    msg4.change(lambda x: x, msg4, msg5)
    msg5.change(lambda x: x, msg5, msg6)
    msg6.change(lambda x: x, msg6, msg7)
    msg7.change(lambda x: x, msg7, msg8)

demo.queue()
demo.launch(
    max_threads=64,
    server_name='0.0.0.0',
#    ssl_verify=False,
#    ssl_keyfile='key.pem',
#    ssl_certfile='cert.pem'
)
```

To reproduce:

1) Launch the script above with `python gradio-error.py`.
2) Access the UI from another computer on the same local network.
3) If the error doesn't happen on the first try, refresh the page. It may take around 10 refreshes before it happens.

This is the error:

[screenshot]

I hope that helps identify what is going on so that I can finally update to Gradio 4 :')

freddyaboulton commented 2 months ago

Thanks @oobabooga! I can repro on one machine with a share link, so I'll use that to look into this. Much appreciated!! 🙏

pseudotensor commented 2 months ago

Yeah, on my local network I tried about 50 times and couldn't repro with that script. But with a share link, it took about 25 tries.
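
As a rough sanity check on why 50 clean local tries do not rule the bug out locally, here is a small calculation. It assumes that page loads fail independently and, more loudly, that the roughly 1-in-25 rate seen with the share link also applied on the local network; neither assumption is established in this thread. The function names are hypothetical.

```python
import math


def p_no_failure(p: float, attempts: int) -> float:
    # Probability of zero failures in `attempts` independent tries,
    # each failing with probability p
    return (1.0 - p) ** attempts


def attempts_for_confidence(p: float, confidence: float = 0.95) -> int:
    # Smallest n such that 1 - (1 - p)**n >= confidence
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# ASSUMPTION: the ~1-in-25 share-link rate also holds on the local network
p = 1 / 25
print(round(p_no_failure(p, 50), 2))  # 0.13
print(attempts_for_confidence(p))     # 74
```

Under that assumption, a clean run of 50 attempts would still happen about 13% of the time, and about 74 attempts would be needed for a 95% chance of seeing the error at least once.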

oobabooga commented 2 months ago

I can reproduce it on my own computer (localhost, without using the network) by adding more elements:

```python
import random
import time

import gradio as gr

def make_number():
    time.sleep(random.random() * 0.01)
    return str(random.randint(0, 8))

with gr.Blocks() as demo:
    with gr.Tab():
        msg1 = gr.Textbox()
        msg2 = gr.Textbox()
        msg3 = gr.Textbox()
        msg4 = gr.Textbox()
        msg5 = gr.Textbox()
        msg6 = gr.Textbox()
        msg7 = gr.Textbox()
        msg8 = gr.Textbox()
        msg9 = gr.Textbox()
        msg10 = gr.Textbox()
        msg11 = gr.Textbox()
        msg12 = gr.Textbox()

    with gr.Tab():
        msg13 = gr.Textbox()
        msg14 = gr.Textbox()
        msg15 = gr.Textbox()
        msg16 = gr.Textbox()
        msg17 = gr.Textbox()
        msg18 = gr.Textbox()
        msg19 = gr.Textbox()
        msg20 = gr.Textbox()
        msg21 = gr.Textbox()
        msg22 = gr.Textbox()
        msg23 = gr.Textbox()
        msg24 = gr.Textbox()

    demo.load(make_number, None, msg1, show_progress=False)
    demo.load(make_number, None, msg2, show_progress=False)
    demo.load(make_number, None, msg3, show_progress=False)
    demo.load(make_number, None, msg4, show_progress=False)
    demo.load(make_number, None, msg5, show_progress=False)
    demo.load(make_number, None, msg6, show_progress=False)
    demo.load(make_number, None, msg7, show_progress=False)
    demo.load(make_number, None, msg8, show_progress=False)
    demo.load(make_number, None, msg9, show_progress=False)
    demo.load(make_number, None, msg10, show_progress=False)
    demo.load(make_number, None, msg11, show_progress=False)
    demo.load(make_number, None, msg12, show_progress=False)
    demo.load(make_number, None, msg13, show_progress=False)
    demo.load(make_number, None, msg14, show_progress=False)
    demo.load(make_number, None, msg15, show_progress=False)
    demo.load(make_number, None, msg16, show_progress=False)
    demo.load(make_number, None, msg17, show_progress=False)
    demo.load(make_number, None, msg18, show_progress=False)
    demo.load(make_number, None, msg19, show_progress=False)
    demo.load(make_number, None, msg20, show_progress=False)
    demo.load(make_number, None, msg21, show_progress=False)
    demo.load(make_number, None, msg22, show_progress=False)
    demo.load(make_number, None, msg23, show_progress=False)
    demo.load(make_number, None, msg24, show_progress=False)

    msg1.change(lambda x: x, msg1, msg2)
    msg2.change(lambda x: x, msg2, msg3)
    msg3.change(lambda x: x, msg3, msg4)
    msg4.change(lambda x: x, msg4, msg5)
    msg5.change(lambda x: x, msg5, msg6)
    msg6.change(lambda x: x, msg6, msg7)
    msg7.change(lambda x: x, msg7, msg8)
    msg8.change(lambda x: x, msg8, msg9)
    msg9.change(lambda x: x, msg9, msg10)
    msg10.change(lambda x: x, msg10, msg11)
    msg11.change(lambda x: x, msg11, msg12)
    msg12.change(lambda x: x, msg12, msg13)
    msg13.change(lambda x: x, msg13, msg14)
    msg14.change(lambda x: x, msg14, msg15)
    msg15.change(lambda x: x, msg15, msg16)
    msg16.change(lambda x: x, msg16, msg17)
    msg17.change(lambda x: x, msg17, msg18)
    msg18.change(lambda x: x, msg18, msg19)
    msg19.change(lambda x: x, msg19, msg20)
    msg20.change(lambda x: x, msg20, msg21)
    msg21.change(lambda x: x, msg21, msg22)
    msg22.change(lambda x: x, msg22, msg23)
    msg23.change(lambda x: x, msg23, msg24)

demo.queue()
demo.launch(
    max_threads=64,
#    server_name='0.0.0.0',
#    share=True,
#    ssl_verify=False,
#    ssl_keyfile='key.pem',
#    ssl_certfile='cert.pem'
)
```

The error happens almost every time I refresh the page with this updated script.
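
One plausible reading of why adding textboxes makes the error far more frequent is that each page load then fires many more requests. The sketch below counts the listeners these repros register for a chain of n textboxes, plus a worst-case number of function calls on page load. The worst case assumes that every update fires `.change` on the next textbox and that each cascade runs to the end of the chain; real counts are probably lower if a textbox receiving the value it already holds does not fire `.change`. The helper names are hypothetical.

```python
def registered_listeners(n: int) -> int:
    # n demo.load(...) listeners plus (n - 1) .change listeners wired as a chain
    return n + (n - 1)


def worst_case_calls_on_load(n: int) -> int:
    # ASSUMPTION: every update fires .change on its target and each cascade
    # runs to the end of the chain. The load of textbox i (0-based) then
    # triggers n - 1 - i chained .change calls, i.e. n * (n - 1) / 2 in total.
    return n + sum(n - 1 - i for i in range(n))


# Textbox counts used by the three repros in this thread
for n in (8, 24, 64):
    print(n, registered_listeners(n), worst_case_calls_on_load(n))
```

This prints 15 listeners and at most 36 calls for 8 textboxes, 47 and 300 for 24, and 127 and 2080 for 64, so the worst-case load grows roughly quadratically with the chain length.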

oobabooga commented 2 months ago

The issue still happens in gradio==4.20.0 with gradio_client==0.11.0:

[screenshot]

@abidlabs now that a simple reproduction code is available, can the "needs repro" label be removed?

abidlabs commented 2 months ago

Thanks @oobabooga for providing this repro. I'll let @freddyaboulton, who is investigating this issue, adjust the label accordingly. Hopefully he is able to reproduce it in his environment!

freddyaboulton commented 2 months ago

@oobabooga this was not fixed in the current release. I removed the "needs repro" label!

oobabooga commented 2 months ago

Simpler repro below:

```python
import random

import gradio as gr

def make_number():
    return str(random.randint(0, 8))

with gr.Blocks() as demo:
    msgs = []

    for i in range(64):
        msgs.append(gr.Textbox())

    for msg in msgs:
        demo.load(make_number, None, msg, show_progress=False)

    for i in range(len(msgs) - 1):
        msgs[i].change(lambda x: x, msgs[i], msgs[i+1])

demo.queue()
demo.launch(
    max_threads=64,
)
```

pseudotensor commented 2 months ago

I still see this with gradio 4.20.1 outside of the particular repros; just normal, trivial usage hits it randomly. For example, I hit it just now simply by choosing a drop-down item. Ignore the "failed to load resources" errors; those appeared after I terminated gradio.

[screenshot]

abidlabs commented 2 months ago

Yes, we're still looking into this issue, @pseudotensor.

oobabooga commented 2 months ago

@freddyaboulton @abidlabs @aliabd sorry to ping again, but this bug makes Gradio 4 unusable for my project, as it errors out half the time as soon as the UI is launched, as well as randomly during normal use. It was introduced in the Gradio 4 update, as it is not present in Gradio 3.50.2.

Is this something that will require major reworking of the Gradio code base or could it be a simple race condition somewhere? The simple reproduction code above has a 100% error rate.

aliabid94 commented 2 months ago

just saw the repro, thanks for that! Gonna give it my best shot and try to wrap this one up tonight

pseudotensor commented 2 months ago

Thanks!

aliabid94 commented 2 months ago

Can you guys try the package from https://github.com/gradio-app/gradio/pull/7683 and see if it fixes your issues? To install:

```shell
pip install https://gradio-builds.s3.amazonaws.com/f64ab41e78ed2bd9838ae967ab9be9b4a40aeef7/gradio-4.21.0-py3-none-any.whl
```

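
When testing a build like this one, it can help to confirm which gradio and gradio_client versions the launching interpreter actually sees, in the same way versions are reported elsewhere in this thread. This is a minimal sketch using only the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name. Note that both test wheels in this thread are versioned 4.21.0, so this confirms that a 4.21.0 build is active but cannot tell the two builds apart.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("gradio", "gradio_client")):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print(installed_versions())
```

Run it with the same Python interpreter that launches the app, since a wheel installed into a different environment would not be picked up.
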
oobabooga commented 2 months ago

@aliabid94 this changes the error for me from "Connection errored out" to "404 session not found".

[screenshot]

[screenshot]

These appeared while starting text-generation-webui. In both cases, the error happens inconsistently.
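
An intermittent "404 session not found" is the kind of symptom a check-then-use race can produce: the browser fires event requests before the server has finished registering its session. The toy below illustrates that general pattern with asyncio. It is not Gradio's code, and `ToyServer`, `open_stream`, `handle_request` and `page_load` are hypothetical names. The toy makes the race deterministic by always taking 10 ms to register the session, whereas real timing varies, which would fit errors that appear only some of the time.

```python
import asyncio


class ToyServer:
    """Hypothetical toy server illustrating a check-then-use race.
    This is NOT Gradio's implementation; all names here are made up."""

    def __init__(self, wait_for_session: bool):
        self.wait_for_session = wait_for_session
        self.registered = set()
        self.ready = {}  # session_id -> asyncio.Event, set once the session exists

    def _ready_event(self, session_id: str) -> asyncio.Event:
        return self.ready.setdefault(session_id, asyncio.Event())

    async def open_stream(self, session_id: str) -> None:
        # Registering the session takes some async work (simulated by a sleep)
        await asyncio.sleep(0.01)
        self.registered.add(session_id)
        self._ready_event(session_id).set()

    async def handle_request(self, session_id: str) -> str:
        if self.wait_for_session:
            # Fixed variant: wait (bounded by a timeout) until the session exists
            await asyncio.wait_for(self._ready_event(session_id).wait(), timeout=1.0)
        if session_id not in self.registered:
            return "404 session not found"
        return "200 ok"


async def page_load(server: ToyServer, n_requests: int) -> list:
    # The "browser" opens its stream, then immediately fires n_requests events
    # without waiting for the server to finish registering the session.
    stream = asyncio.create_task(server.open_stream("session-1"))
    results = await asyncio.gather(
        *(server.handle_request("session-1") for _ in range(n_requests))
    )
    await stream
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(page_load(ToyServer(wait_for_session=False), 3)))
    print(asyncio.run(page_load(ToyServer(wait_for_session=True), 3)))
```

In the racy variant every request is checked before registration completes and gets the 404; in the fixed variant each request waits on an `asyncio.Event` that is set once the session exists, so the outcome no longer depends on timing. This thread does not describe how the actual fix works.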

aliabid94 commented 2 months ago

kill me 😭 lemme see what's up

aliabid94 commented 2 months ago

@oobabooga can you try

```shell
pip install https://gradio-builds.s3.amazonaws.com/48b644b6553c56d594981c9b4a99209f0915fbc2/gradio-4.21.0-py3-none-any.whl
```

(from https://github.com/gradio-app/gradio/pull/7691)

oobabooga commented 2 months ago

@aliabid94 it seems to fix the issue for me :)

I have launched text-generation-webui some 50 times, and have also used the UI for a bit, and didn't see any of the error popups, whereas they happened very consistently before. No errors in the terminal as well.
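
For a rough idea of what 50 clean launches say about the fix, the sketch below computes the largest per-launch failure rate still consistent, at 95% confidence, with zero failures in 50 launches, and the chance of 50 clean launches had the earlier rate not changed. It assumes launches are independent and, as a loud assumption, takes the earlier rate to be about 50% based on the "errors out half the time" comment above. The function name is hypothetical.

```python
def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    # Largest per-launch failure rate p still consistent (at `confidence`) with
    # observing 0 failures in `trials` independent launches:
    # solve (1 - p) ** trials = 1 - confidence for p
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


print(round(zero_failure_upper_bound(50), 3))  # 0.058

# ASSUMPTION: roughly a 50% failure rate per launch before the fix
print(0.5 ** 50)  # about 8.9e-16: 50 clean launches would be essentially impossible
```

The bound of about 5.8% is close to the familiar "rule of three" approximation 3/n, which gives 6% for n = 50.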

My simple reproduction code above also doesn't generate errors anymore.

Well done!

aliabid94 commented 2 months ago

oh thank god, haha alright I'll clean up the PR!

lilsgit commented 1 month ago

Is this actually fixed? I am experiencing the same "Connection errored out" issue with my TextArea in Gradio 4.27.

abidlabs commented 1 month ago

Hi @lilsgit can you create a separate issue with a standalone repro so that we can investigate?