gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

404 Session Not Found error When accessing gradio via a proxy #6920

Closed lykeven closed 1 month ago

lykeven commented 5 months ago

Describe the bug

I am running a Gradio application locally. The click handler of a button sends a request (via the requests library) to a remote server and returns the result to a component. Everything works fine, but if I turn on a proxy (Shadowsocks) to access the Gradio application, requests with a short response time return normally, while requests that take longer raise exceptions.

Reproduction

#!/usr/bin/env python

import gradio as gr
import os
import json
import requests
import base64

URL = os.environ.get("URL")

def image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
        return encoded_string.decode('utf-8')

def post(
        input_text,
        image_prompt,
        ):
    headers = {
        "Content-Type": "application/json; charset=UTF-8",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
    }
    if image_prompt:
        encoded_img = image_to_base64(image_prompt)
    else:
        return "", []

    data = json.dumps({
        'text': input_text,
        'history': [],
        'image': encoded_img
    })
    try:
        response = requests.request("POST", URL, headers=headers, data=data, timeout=(60, 100)).json()
    except Exception as e:
        return "", []
    answer = str(response['result'])
    return "", [[input_text, answer]]

def main():
    gr.close_all()
    with gr.Blocks() as demo:
        with gr.Row():
            with gr.Column(scale=4.5):
                with gr.Group():
                    input_text = gr.Textbox(label='Input Text', placeholder='Please enter text prompt below and press ENTER.')
                    with gr.Row():
                        run_button = gr.Button('Generate')
                    image_prompt = gr.Image(type="filepath", label="Image Prompt", value=None)

            with gr.Column(scale=5.5):
                result_text = gr.components.Chatbot(label='Multi-round conversation History', value=[("", "Hi, What do you want to know about this image?")], height=550)

        run_button.click(fn=post,inputs=[input_text, image_prompt],
                         outputs=[input_text, result_text])
    demo.launch(server_port=7862)

if __name__ == '__main__':
    main()

Logs

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/routing.py", line 73, in app
    await response(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/responses.py", line 259, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/responses.py", line 255, in wrap
    await func()
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/responses.py", line 244, in stream_response
    async for chunk in self.body_iterator:
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/gradio/routes.py", line 660, in sse_stream
    raise e
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/gradio/routes.py", line 601, in sse_stream
    raise HTTPException(
fastapi.exceptions.HTTPException: 404: Session not found.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/applications.py", line 116, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/routing.py", line 75, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/user/anaconda3/envs/py3.8/lib/python3.8/site-packages/starlette/_exception_handler.py", line 59, in wrapped_app
    raise RuntimeError(msg) from exc
RuntimeError: Caught handled exception, but response already started.

System Info

Python 3.8.16
Gradio 4.12.0
requests 2.31.0
fastapi 0.108.0

System: Ubuntu
Browser: Chrome 120.0.6099.129, Safari 16.1
Proxy: Shadowsocks

Severity

Blocking usage of gradio

shimizust commented 4 months ago

I am also experiencing this issue after bumping gradio from 3.50.2 to 4.12.0.

I basically have a Gradio app deployed on a k8s cluster. Port-forwarding directly to the pod works as expected, but accessing it externally via emissary causes this same error.

shimizust commented 4 months ago

@abidlabs Do you have any ideas on what might be the issue and how to work around it, as it's blocking the major version upgrade for us? Is it related to the switch to using SSE by default in v4? Is there a way to disable it?

I think this may be an important issue as more "production" Gradio apps being served on internal infra try upgrading to v4.

freddyaboulton commented 4 months ago

Hi @shimizust ! I think the root_path parameter needs to be set when running behind a proxy. See this guide: https://www.gradio.app/guides/running-gradio-on-your-web-server-with-nginx#run-your-gradio-app-on-your-web-server

It uses nginx but I think it should apply to other proxies

shimizust commented 3 months ago

Thanks @freddyaboulton, although I think this may be a different issue. My app is running at the root of the domain already.

I think this has to do with session affinity and the use of SSE when you have the app running on multiple pods. Once I configured session affinity at the ingress layer (or, if you are accessing the app from within the cluster, at the service layer), I was able to get past the "An Unexpected Error Occurred" message on initial load of the app. However, I still get these 404 Session Not Found errors while using the app.

These go away if I decrease the number of pods to a single pod, which isn't ideal. I'd like to be able to put multiple pods behind a load balancer. Not sure if anyone has any insights. Again, this wasn't an issue with gradio 3.x.
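
A toy model of the failure mode described above may help. It assumes (as the 404 suggests, though nothing in this thread confirms it) that each replica keeps session state only in its own memory. `Replica`, `round_robin`, and `sticky` are hypothetical names for illustration, not Gradio or Kubernetes APIs. It covers only the affinity part of the report: 404s were still seen with affinity configured, so this is not a full explanation.

```python
import hashlib
from itertools import cycle


class Replica:
    """One app instance (pod or worker) with its own in-memory session table."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_hash -> list of pending messages

    def join(self, session_hash):
        # Stand-in for the request that registers a session and queues an event.
        self.sessions.setdefault(session_hash, []).append("event queued")
        return 200

    def stream(self, session_hash):
        # Stand-in for the request that opens the event stream for a session.
        return 200 if session_hash in self.sessions else 404


def round_robin(replicas):
    """Route every request to the next replica, ignoring the session."""
    ring = cycle(replicas)
    return lambda session_hash: next(ring)


def sticky(replicas):
    """Route by a stable hash of the session, so a session always hits one replica."""
    def pick(session_hash):
        digest = hashlib.sha256(session_hash.encode()).digest()
        return replicas[digest[0] % len(replicas)]
    return pick


def run(route, session_hash="abc123"):
    join_status = route(session_hash).join(session_hash)
    stream_status = route(session_hash).stream(session_hash)
    return join_status, stream_status


print(run(round_robin([Replica("pod-a"), Replica("pod-b")])))  # (200, 404)
print(run(sticky([Replica("pod-a"), Replica("pod-b")])))       # (200, 200)
```

With a single replica, `round_robin` also returns `(200, 200)`, which matches the observation that the errors disappear with one pod.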

pseudotensor commented 3 months ago

@abidlabs @freddyaboulton I also see the same issues with gradio 4.17.0 on k8, even though I'm not trying to access it directly, just across pods. 3.50.2 worked perfectly in the exact same setup.

Unfortunately, I will probably have to revert to 3.50.2 again (I've tried 4 times to upgrade :( ).

Note that we use nginx perfectly fine on 4.17.0, so it's not just a proxy issue.

ilyashusterman commented 3 months ago

I'm running with FastAPI too, with gradio.queue and a mounted Gradio app, and it's definitely a problem with Gradio's logic for looking up the session id across routes. I tried the following Kubernetes YAML config, but I still get the same "already started" exception:

  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600 # 1 hour

I also tried with gunicorn, with no success and the same exception:

./.venv/bin/gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 127.0.0.1:7860

Maybe there is a problem with my configuration. Can you guys check as well?

pseudotensor commented 3 months ago

@abidlabs Note that this is a regression; 3.50.2 worked fine. I'd hope it gets fixed. I'm unable to upgrade to gradio 4 due to this, even though all the non-networking things are wonderful in gradio 4.

abidlabs commented 3 months ago

Looking into this!

pseudotensor commented 3 months ago

Collecting info and repro-ness.

When things are bad on k8, on 4.17.0 (before nginx issue), this is one failure:

I have no name!@h2ogpte-core-7899c6665-n4nn8:/app$ python3                          
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gradio_client import Client
>>> x = Client('http://h2ogpt-web/')
>>> y = x.predict(api_name='/system_hash')
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/client.py", line 192, in stream_messages
    event_id = resp["event_id"]
KeyError: 'event_id'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/client.py", line 1590, in result
    return super().result(timeout=timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/client.py", line 973, in _inner
    predictions = _predict(*data)
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/client.py", line 1008, in _predict
    result = utils.synchronize_async(
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/utils.py", line 870, in synchronize_async
    return fsspec.asyn.sync(fsspec.asyn.get_loop(), func, *args, **kwargs)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/usr/local/lib/python3.10/dist-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/client.py", line 1206, in _sse_fn_v1_v2
    return await utils.get_pred_from_sse_v1_v2(
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/utils.py", line 414, in get_pred_from_sse_v1_v2
    raise exception
  File "/usr/local/lib/python3.10/dist-packages/gradio_client/utils.py", line 524, in stream_sse_v1_v2
    raise CancelledError()
concurrent.futures._base.CancelledError
>>>

Another one is:

INFO:     10.255.6.190:60640 - "GET /queue/data?session_hash=a3c4ade8-099a-4837-ba21-da4358400402 HTTP/1.1" 200 OK
Exception in ASGI application
Traceback (most recent call last):
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await response(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/responses.py", line 257, in __call__
    async with anyio.create_task_group() as task_group:
  File "/h2ogpt_conda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/responses.py", line 260, in wrap
    await func()
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/responses.py", line 249, in stream_response
    async for chunk in self.body_iterator:
  File "/h2ogpt_conda/lib/python3.10/site-packages/gradio/routes.py", line 663, in sse_stream
    raise e
  File "/h2ogpt_conda/lib/python3.10/site-packages/gradio/routes.py", line 604, in sse_stream
    raise HTTPException(
fastapi.exceptions.HTTPException: 404: Session not found.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/h2ogpt_conda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/h2ogpt_conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/h2ogpt_conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 68, in wrapped_app
    raise RuntimeError(msg) from exc
RuntimeError: Caught handled exception, but response already started.

abidlabs commented 3 months ago

Could someone here please try installing this version of gradio and seeing if the issue is resolved?

pip install https://gradio-builds.s3.amazonaws.com/9b8810ff9af4d9a50032752af09cefcf2ef7a7ac/gradio-4.18.0-py3-none-any.whl

oobabooga commented 3 months ago

I am also getting these "404: Session not found." errors all the time after upgrading to gradio==4.19. This is the error message, very similar to the one above by @pseudotensor:

Exception in ASGI application
Traceback (most recent call last):
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/starlette/routing.py", line 77, in app 
    await response(scope, receive, send)
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/starlette/responses.py", line 257, in __call__
    async with anyio.create_task_group() as task_group:
  File "/home/me/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/starlette/responses.py", line 260, in wrap
    await func()
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/starlette/responses.py", line 249, in stream_response
    async for chunk in self.body_iterator:
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/routes.py", line 665, in sse_stream
    raise e
  File "/home/me/.miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/routes.py", line 605, in sse_stream
    raise HTTPException(
fastapi.exceptions.HTTPException: 404: Session not found.

> Could someone here please try installing this version of gradio and seeing if the issue is resolved?
>
> pip install https://gradio-builds.s3.amazonaws.com/9b8810ff9af4d9a50032752af09cefcf2ef7a7ac/gradio-4.18.0-py3-none-any.whl

I have tried this wheel, and it seems to make the exception go away, but I still get several "Connection errored out." popups in the UI if I refresh it a few times with F5/Ctrl+F5. This is the offending line in index.js:

[screenshot of the offending line in index.js]

I don't have a simple example to reproduce the issue, but it happens all the time in the dev branch of my project, which now uses Gradio 4.19.

abidlabs commented 3 months ago

Thanks @oobabooga are you running behind a proxy as well?

Also when you say that you see this error after upgrading to 4.19, what version were you upgrading from? I.e what was the latest version that did not have this issue for you?

oobabooga commented 3 months ago

I am not using a proxy, just launching the server with server_name='0.0.0.0' and accessing the UI from another computer in the same local network. The error above doesn't happen every time I open the UI, but if I refresh the page a few times, it always ends up happening after a few attempts.

For clarity, several error popups with the message "404: Session not found." appear in the UI when the stacktrace that I posted happens.

The last gradio version I used was 3.50.2, and that issue never happened there.

oobabooga commented 3 months ago

I found that if I comment out my interface.load events, the error stops happening. That's lines 149 to 154 here:

https://github.com/oobabooga/text-generation-webui/blob/7123ac3f773baa120d644e7b8ab10027758d1813/server.py#L149

abidlabs commented 3 months ago

Ok, I think I know why it's happening on k8, but I'm not sure why it's happening for you @oobabooga. It seems like that's a separate issue. If you are able to put together a more self-contained repro, that would be very appreciated.

oobabooga commented 3 months ago

I have been trying to come up with a minimal example to reproduce the issue, but it has been difficult. I did find that the same error has been happening in other repositories:

https://github.com/daswer123/xtts-finetune-webui/issues/7

https://github.com/invoke-ai/invoke-training/issues/92

It may be the case that the problem in this issue is not the use of a proxy, but the fact that events behind a proxy take longer to run, somehow triggering the error.

aliabid94 commented 3 months ago

Hmm I assume it's some sort of race condition. I have a simple solution in mind, lemme try it real quick
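
The comment above does not say which race is meant. As one hypothetical ordering problem consistent with the timing observations in this thread, the sketch below models a stream request that reaches the server before the join request has registered the session. `ToyQueue`, `join`, `stream`, and the delays are made-up stand-ins, not Gradio's actual queue code.

```python
import asyncio


class ToyQueue:
    """Hypothetical server-side state: a session exists only after a join is processed."""

    def __init__(self):
        self.pending = {}  # session_hash -> queued messages

    async def join(self, session_hash, delay):
        await asyncio.sleep(delay)  # network or proxy latency before the join lands
        self.pending.setdefault(session_hash, []).append("process_starts")
        return 200

    async def stream(self, session_hash, delay):
        await asyncio.sleep(delay)  # latency before the stream request lands
        return 200 if session_hash in self.pending else 404


async def scenario(join_delay, stream_delay):
    queue = ToyQueue()
    # The client fires both requests without waiting for the join to finish.
    return await asyncio.gather(
        queue.join("s1", join_delay),
        queue.stream("s1", stream_delay),
    )


print(asyncio.run(scenario(join_delay=0.01, stream_delay=0.05)))  # [200, 200]
print(asyncio.run(scenario(join_delay=0.05, stream_delay=0.01)))  # [200, 404]
```

In this model the 404 depends only on which request arrives first, which would explain why slower paths (a proxy, a load balancer) make the error more likely without being its root cause.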

aliabid94 commented 3 months ago

Can you try using the gradio version in this PR: https://github.com/gradio-app/gradio/pull/7469 - install via pip install https://gradio-builds.s3.amazonaws.com/e5e7e12ccb0df856111857b2153e7bcad3b478bf/gradio-4.19.1-py3-none-any.whl

oobabooga commented 3 months ago

@aliabid94 I have tried it and it seems to get rid of the "404: Session not found." errors, but I still get the "Connection errored out" errors, this time without any stacktrace in the terminal. It happens inconsistently, about 20% of the time when I launch my app, and it is associated with the interface.load events here.

aliabid94 commented 3 months ago

argh ok looking into this again

arian81 commented 3 months ago

@aliabid94 Is there an update on this ? I keep getting this issue when using the mic to record audio.

aliabid94 commented 3 months ago

@arian81 can you share your set up as well? Trying to create a reproducible setup for this bug

arian81 commented 3 months ago

> @arian81 can you share your set up as well? Trying to create a reproducible setup for this bug

I'm sorry. I can usually provide code snippets that reproduce issues, but this time I haven't pinpointed where the problem is, and I can't share the whole project since it's an internal company project.

aliabid94 commented 3 months ago

Ok, since I'm having a hard time reproducing this, can you guys please install the version of gradio from this PR, to which I've added some console logs: pip install https://gradio-builds.s3.amazonaws.com/3fc7e92f6bf9b69f03ac7ae0acba2b2d7c4c1960/gradio-4.19.1-py3-none-any.whl cc @lykeven @shimizust @arian81 @pseudotensor @ilyashusterman

Try running it, and when you get errors, please post your browser console logs.

Also @oobabooga if you replace the lambda: None with just None in your .load listeners, Gradio will avoid the roundtrip to the backend and this would probably solve your problem on its own.

arian81 commented 3 months ago

@aliabid94 I tried with the version you mentioned but I didn't get any errors logged, however the issue still happens.

arian81 commented 3 months ago

@aliabid94 @abidlabs Hey is there any update on this ? This issue has broken speech to text in my app which is one of the most important features.

abidlabs commented 3 months ago

Hi @arian81 there are different variants of this issue and I'm looking into them. Just to confirm, you're seeing this issue: https://github.com/gradio-app/gradio/issues/7531? If so, can you respond on that issue with more details of the bug that you're seeing (+ repro would be helpful) so that we can debug?

arian81 commented 3 months ago

> Hi @arian81 there are different variants of this issue and I'm looking into them. Just to confirm, you're seeing this issue: #7531? If so, can you respond on that issue with more details of the bug that you're seeing (+ repro would be helpful) so that we can debug?

Well, my issue is with the microphone component: whenever the user presses the record audio button, I get the "errored out" modal. I only get this issue on the deployed version on Google's Cloud Run. This is an internal project, so there's no publicly accessible version I can share. I tried the version of gradio @aliabid94 mentioned, but it didn't produce any new logs. I don't know how helpful it is, but on the client side, the "error 6" branch of this function is being executed.

function open_stream() {
        stream_open = true;
        let params = new URLSearchParams({
          session_hash
        }).toString();
        let url = new URL(`${config.root}/queue/data?${params}`);
        event_stream = EventSource_factory(url);
        event_stream.onmessage = async function(event) {
          let _data = JSON.parse(event.data);
          const event_id = _data.event_id;
          if (!event_id) {
            await Promise.all(
              Object.keys(event_callbacks).map(
                (event_id2) => event_callbacks[event_id2](_data)
              )
            );
          } else if (event_callbacks[event_id]) {
            if (_data.msg === "process_completed") {
              unclosed_events.delete(event_id);
              if (unclosed_events.size === 0) {
                close_stream();
              }
            }
            let fn2 = event_callbacks[event_id];
            window.setTimeout(fn2, 0, _data);
          } else {
            if (!pending_stream_messages[event_id]) {
              pending_stream_messages[event_id] = [];
            }
            pending_stream_messages[event_id].push(_data);
          }
        };
        event_stream.onerror = async function(event) {
          console.log("error 6", event);
          await Promise.all(
            Object.keys(event_callbacks).map(
              (event_id) => event_callbacks[event_id]({
                msg: "unexpected_error",
                message: BROKEN_CONNECTION_MSG
              })
            )
          );
          close_stream();
        };
      }

arian81 commented 3 months ago

@abidlabs I rechecked that and this error logging is specifically added by @aliabid94 in #7469. So he should be able to get some insight into my issue. I'm willing to work directly with him on discord or whatever works for you guys to get this resolved as fast as possible. I'm getting a lot of pressure from the users to get this issue fixed as soon as possible.
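
For readers following the "error 6" discussion, the routing rules in the open_stream function quoted earlier can be modelled in Python roughly as follows. This is a simplified sketch, not the client's code: `StreamDispatcher` and `register` are hypothetical, callbacks run synchronously instead of via `setTimeout`, and the value of `BROKEN_CONNECTION_MSG` is assumed from the "Connection errored out." popups mentioned in this thread.

```python
BROKEN_CONNECTION_MSG = "Connection errored out."  # assumed text; the constant's value is not shown in the thread


class StreamDispatcher:
    """Simplified model of the message routing in open_stream."""

    def __init__(self):
        self.callbacks = {}     # event_id -> callable
        self.pending = {}       # event_id -> messages that arrived before a callback existed
        self.unclosed = set()   # event ids still waiting for "process_completed"
        self.stream_open = True

    def register(self, event_id, callback):
        # Hypothetical helper; the real client tracks these elsewhere.
        self.callbacks[event_id] = callback
        self.unclosed.add(event_id)

    def on_message(self, data):
        event_id = data.get("event_id")
        if not event_id:
            # No event_id: broadcast to every registered callback.
            for callback in list(self.callbacks.values()):
                callback(data)
        elif event_id in self.callbacks:
            if data.get("msg") == "process_completed":
                self.unclosed.discard(event_id)
                if not self.unclosed:
                    self.stream_open = False  # last open event finished: close the stream
            self.callbacks[event_id](data)
        else:
            # Unknown event_id: buffer until a callback is registered.
            self.pending.setdefault(event_id, []).append(data)

    def on_error(self):
        # The "error 6" branch: every callback is told the connection broke.
        for callback in list(self.callbacks.values()):
            callback({"msg": "unexpected_error", "message": BROKEN_CONNECTION_MSG})
        self.stream_open = False


received = []
dispatcher = StreamDispatcher()
dispatcher.register("e1", received.append)
dispatcher.on_message({"event_id": "e2", "msg": "process_starts"})  # no callback for e2 yet: buffered
dispatcher.on_error()  # the stream breaks while e1 is still pending
print(received)
print(dispatcher.pending)
```

The point of the model is that a single stream error fans out to every pending event, so one broken connection can produce several error popups at once.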

abidlabs commented 3 months ago

What do you see in the console logs? Do you see errors similar to https://github.com/gradio-app/gradio/issues/7531

arian81 commented 3 months ago

> What do you see in the console logs? Do you see errors similar to #7531

No, mine is a bit different.

arian81 commented 3 months ago

@abidlabs I checked all the logs of the server and of the load balancer that handles the proxy. There are no errors, and every request is handled with a 200. Something is breaking in Gradio on the client side. Hopefully the fact that it's "error 6" means something to @aliabid94 and can help fix this issue.

pseudotensor commented 3 months ago

FYI, h2oGPT still has the k8 issue with the latest gradio, so we have to revert to gradio 3 on k8.

abidlabs commented 3 months ago

Sorry about these issues folks. Looking into them, but yes as a temporary measure I would recommend downgrading until we've figured them out.

arian81 commented 3 months ago

> Sorry about these issues folks. Looking into them, but yes as a temporary measure I would recommend downgrading until we've figured them out.

Downgrading is not much of an option for me since I've been using a lot of the v4 features. I'm hoping you guys can find the root of this issue soon.

abidlabs commented 3 months ago

@arian81 if I'm not mistaken, the issue you're seeing should not be present on 4.16.0. Can you confirm?

arian81 commented 3 months ago

> @arian81 if I'm not mistaken, the issue you're seeing should not be present on 4.16.0. Can you confirm?

Yes I downgraded to 4.16.0 and the issue is gone.

abidlabs commented 3 months ago

> Yes I downgraded to 4.16.0 and the issue is gone.

Ok yeah, it has to do with some recent changes we made. We're working on fixing them but hopefully this should help alleviate the pressure from users in the meantime!

pseudotensor commented 3 months ago

But just FYI, the k8 issue we see is still there for 4.16.0. So it's not entirely solved by 4.16.0

abidlabs commented 3 months ago

Yes I know k8 is its own beast, need to set up a simple, reproducible environment to figure that out (lmk @pseudotensor if you have any recommendations on the simplest set up -- bonus points if we can somehow integrate a test into our python test suite)

pseudotensor commented 3 months ago

@abidlabs So far we have tried to repro and cannot, but we will try harder.

@achraf-mer -- can you try your minikube setup and see if it's possible to find a repro? Or @EshamAaqib, can you use your dev box setup to figure out why it's different from playground?

samiragmeta commented 3 months ago

I see this fairly consistently with 4.19 and 4.16. Here is what I see:

If going directly to gradio app without reverse-proxy: No problem.

If going directly to gradio app with reverse-proxy: No problem.

If going to gradio app with reverse proxy and an internal load-balancer: 404: Session not found

The timing varies a bit but it fails every time. All other logs look clean.

abidlabs commented 3 months ago

Can you describe this internal load balancer a little more so that we can reproduce the problem?

samiragmeta commented 3 months ago

I am using the Google load balancer in INTERNAL mode. 

nginx is the reverse proxy. It talks to the LB frontend. The LB backend talks to the instances in the subnet. In this case I have a limit of 2 instances and we are using 1. 

BTW, I downgraded to 3.50.2, and the 404 errors go away. However, several times there is no response at all. I am sure this has something to do with HTTP response timeouts but am still debugging.

iRonJ commented 3 months ago

I'm seeing a similar error when I'm using FastAPI to host the app:

  File "/Users/ron/.pyenv/versions/miniconda3-3.11-23.5.2-0/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ron/.pyenv/versions/miniconda3-3.11-23.5.2-0/lib/python3.11/site-packages/gradio/routes.py", line 686, in queue_join
    success, event_id = await blocks._queue.push(body, request, username)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ron/.pyenv/versions/miniconda3-3.11-23.5.2-0/lib/python3.11/site-packages/gradio/queueing.py", line 217, in push
    event_queue = self.event_queue_per_concurrency_id[event.concurrency_id]
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: '5364209504'

Launch code looks like:

gradioBlocksApp.enable_queue=False
gradioBlocksApp.share=False
gradioBlocksApp.show_error=True
gradioBlocksApp.favicon_path=favPath

app = FastAPI()

app.add_middleware(GZipMiddleware, minimum_size=1000)

gradio_app = routes.App.create_app(gradioBlocksApp)
app.mount("/", gradio_app)

The error triggers when I click a button in the Blocks app that triggers a function.

Using Gradio 4.12 (downgrading seems to fix the issue)

edson-arcaea commented 3 months ago

I've been having issues with this as well.

Downgrading to 3.50.2 definitely helps, but you may have to adjust a ton of other libraries depending on how complex your work is. I'm wondering whether someone is working on it, or whether the advice is essentially to refrain from upgrading to 4.17.0 for now?

abidlabs commented 3 months ago

I'm working on it @edson-arcaea but progress has been a little slow since this error has been hard to reproduce. Anything you can share that will help us reproduce the issue will be much appreciated

edsna commented 3 months ago

Hi @abidlabs thank you so much for the prompt response.

It's me again. I had to change machines, but I have something that may help reproduce the issue. Please follow the steps below:

  1. I created a simple app using gr.TabbedInterface, introduced in gradio 4.0 I believe. Here's what the app looks like:
import gradio as gr

# Function that echoes the text input
def echo_text(text):
    return text

# Function that displays the uploaded image
def show_image(image):
    return image

# Create the text interface
text_interface = gr.Interface(fn=echo_text, inputs="text", outputs="text")

# Create the image interface
image_interface = gr.Interface(fn=show_image, inputs="image", outputs="image")

# Combine both interfaces into a tabbed interface
tabbed_interface = gr.TabbedInterface([text_interface, image_interface], ["Text Echo", "Image Display"])

# Launch the app
tabbed_interface.launch(server_name="0.0.0.0", server_port=7860)
  2. I ran this locally in an isolated env with gradio==4.19.2 and gradio_client==0.10.1 using python app.py, and it loaded just fine. I've also attached the gradio==4.19.2ANDgradio_client==0.10.1.txt file used to run this locally.

  3. Next, containerize the app using Docker with a simple image such as this:

    # Use an official Python runtime as a parent image
    FROM python:3.9-slim

    # Set the working directory in the container
    WORKDIR /usr/src/app

    # Copy the current directory contents into the container at /usr/src/app
    COPY . .

    # Install any needed packages specified in requirements.txt
    RUN pip install --no-cache-dir -r requirements.txt

    # Make port 7860 available to the world outside this container
    EXPOSE 7860

    # Define environment variable
    ENV NAME World

    # Run app.py when the container launches
    CMD ["python", "app.py"]

**You can run the Docker image to test as well, and it will work. Then tag and push that image to whichever platform you'd like (I used ECR).**

  4. Next, create simple Deployment, Service, and Ingress manifests to deploy the app in a k8s cluster:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whatever-you'd-like
  namespace: whatever-you'd-like
  labels:
    app: whatever-you'd-like
spec:
  replicas: 1
  selector:
    matchLabels:
      app: whatever-you'd-like
  template:
    metadata:
      labels:
        app: whatever-you'd-like
    spec:
      containers:
      - name: whatever-you'd-like
        image: Please use the image deployed and pushed to some containers registry as explained earlier.
        ports:
        - containerPort: 7860
---
apiVersion: v1
kind: Service
metadata:
  name: whatever-you'd-like
  namespace: whatever-you'd-like
  labels:
    app: whatever-you'd-like
spec:
  selector:
    app: whatever-you'd-like
  ports:
    - protocol: TCP
      port: 80
      targetPort: 7860

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whatever-you'd-like
  namespace: whatever-you'd-like
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: test.exampledomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: whatever-you'd-like
            port:
              number: 80
```

Make sure to configure DNS for the domain you use, e.g. test.exampledomain.com.

  5. After deploying these, the app should be accessible through test.exampledomain.com, and that's when you get the error message. [Screenshot 2024-03-01 at 2 18 32 PM]

  6. The temporary solution at the moment is to downgrade gradio and gradio_client to this:

    gradio==3.50.2
    gradio_client==0.6.1

You can do that directly in the requirements.txt file, build the image again, push it and update the deployment.yml file with the new image URI/... whatever you use.

Everything will work well after that. [Screenshot 2024-03-01 at 2 29 02 PM]
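
After rebuilding the image, it can be worth confirming that the container actually picked up the pinned versions. Below is a minimal sketch using only the standard library. The names `PINNED`, `parse_pins`, `mismatches`, and `installed_versions` are made up for this example and are not part of Gradio:

```python
from importlib import metadata

# Pins taken from the workaround above; adjust if you settle on other versions.
PINNED = {"gradio": "3.50.2", "gradio_client": "0.6.1"}


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def mismatches(expected, actual):
    """Return {package: (expected, actual)} where versions differ or are missing."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, (want, got) in mismatches(PINNED, installed_versions(PINNED)).items():
        print(f"{name}: expected {want}, found {got}")
```

Run inside the container, this prints a line for each package whose installed version differs from the pin, and nothing when they match. `parse_pins` can be pointed at the contents of requirements.txt to check the file itself rather than the environment.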

I have to say that at different stages of the deployment you run into the different errors many people mentioned above. This is already long, so I can't reproduce them all step by step, but I've attached screenshots of some of them below. @abidlabs, I hope this helps you get a picture of what might have changed to cause the problem.

Screenshot 2024-03-01 at 2 37 35 PM
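
Since the app works locally and in plain Docker but fails behind the ingress, one way to narrow things down is to compare HTTP status codes when hitting the app directly and through the proxied host. This is a rough sketch with the standard library only. `probe` is a hypothetical helper, and `/config` is assumed to be the route Gradio serves its app config from (adjust the paths if your version differs):

```python
import urllib.error
import urllib.request


def probe(base_url, paths=("/", "/config"), timeout=10, opener=None):
    """Return {path: HTTP status code, or the error text if no response came back}."""
    # The default opener honours HTTP_PROXY / HTTPS_PROXY from the environment.
    opener = opener or urllib.request.build_opener()
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with opener.open(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            results[path] = err.code  # server answered with 4xx/5xx
        except (urllib.error.URLError, OSError) as err:
            results[path] = str(err)  # no HTTP response at all
    return results


def direct_opener():
    """Opener that ignores any proxy settings in the environment."""
    return urllib.request.build_opener(urllib.request.ProxyHandler({}))
```

For example, `probe("http://localhost:7860", opener=direct_opener())` against a port-forwarded pod can be compared with `probe("http://test.exampledomain.com")` through the ingress. A path that returns 200 directly but 404 through the proxied host would point at the proxy hop rather than the app itself, though this thread does not confirm which routes are affected.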

mgirard772 commented 2 months ago

Subscribing to this issue. Hoping a fix can be issued soon since the newest version without this issue has known security vulnerabilities.

arian81 commented 2 months ago

@abidlabs Are there any updates on fixing this issue on versions after 4.16?