
Queue upstream when loading apps via gr.Interface.load() #1316

Closed apolinario closed 2 years ago

apolinario commented 2 years ago

If app A uses gr.Interface.load() to load an app B that contains your_app.launch(enable_queue=True), the queue is not respected when app B is executed from app A. So if there are 3 app A users and all trigger app B at the same time, app B runs 3x in parallel, regardless of whether enable_queue was set to True on app B.

This has two implications:

  1. Any person can bypass the app B queue by using app A
  2. The host machine of app B may OOM if multiple users are running it from app A, as no queue is in place
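For concreteness, a minimal sketch of the setup described above (the Space name and function are hypothetical): app B enables its own queue, app A loads it with gr.Interface.load(), and requests made through app A bypass that queue.

```python
# app_b.py -- the upstream app; enables its own queue when launched
import gradio as gr

def slow_generate(prompt):
    # stand-in for an expensive model call
    return f"result for: {prompt}"

gr.Interface(fn=slow_generate, inputs="text", outputs="text").launch(enable_queue=True)
```

```python
# app_a.py -- the downstream app; loads app B (hypothetical Space name)
import gradio as gr

demo_a = gr.Interface.load("spaces/some-user/app-b")
demo_a.launch()  # requests made through app A do not wait in app B's queue
```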

This can currently be mitigated by adding enable_queue=True to app A; however, this has two shortcomings:

  1. The app A developer has to have the goodwill to enable queuing there (including when using gr.Interface.load() privately via localhost, which can be an easy way to bypass any queue); if they don't, app B gets its resources drained outside of its control
  2. This makes app A inefficient. Suppose app A not only loads app B but also loads apps C and D with gr.Interface.load(). If enable_queue is set on app A, calls to apps B, C, and D all go into the same queue, even though each could have a very different internal queue (this happens with MindsEye Lite; see the sketch after this list)
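A sketch of shortcoming 2 (Space names hypothetical): turning on enable_queue for app A funnels B, C, and D through one shared queue.

```python
# app_a.py -- enabling the queue on app A lumps all loaded apps into one queue
import gradio as gr

app_b = gr.Interface.load("spaces/some-user/app-b")
app_c = gr.Interface.load("spaces/some-user/app-c")
app_d = gr.Interface.load("spaces/some-user/app-d")

demo = gr.TabbedInterface([app_b, app_c, app_d], ["App B", "App C", "App D"])
demo.launch(enable_queue=True)  # one queue shared by B, C, and D, regardless of their own capacity
```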

Suggested solution:

Queue upstream: when app A calls app B via gr.Interface.load(), the request should join app B's own queue rather than hitting it directly.

If that gets implemented, make sure to also de-register from app B's queue any users who leave/give up on app A; otherwise an infinite queue could arise.

apolinario commented 2 years ago

@abidlabs, this may be important to include when bringing the ability to load 3.x spaces with gr.Interface.load()

abidlabs commented 2 years ago

Thanks for the catch and the heads up @apolinario! Will implement appropriately

omerXfaruq commented 2 years ago

We need to check if the loaded Space is a queued Space and create a websocket connection from the backend.
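Roughly, something like the following (an illustration, not the actual Gradio internals; it assumes the upstream app reports an enable_queue flag in the config it serves at /config):

```python
import requests

def is_queued_space(space_url: str) -> bool:
    """Check whether the upstream app has its queue enabled."""
    config = requests.get(f"{space_url}/config", timeout=10).json()
    return bool(config.get("enable_queue", False))

# Hypothetical URL; if the upstream queue is enabled, the backend should join it
# over a websocket instead of POSTing to /api/predict directly.
if is_queued_space("https://some-user-app-b.hf.space"):
    pass  # open the websocket connection to the upstream queue here
```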

abidlabs commented 2 years ago

Hi @apolinario, we are planning on making queuing on by default everywhere (#2215) -- and it already has been on by default on Spaces for quite some time.

I'm wondering whether that makes this issue redundant, as it is basically the second approach you mentioned (app A will have queuing on by default, whether or not the developer has the goodwill to enable it). The other point about efficiency still stands, but I wonder whether that could be remedied in a different way -- for example, by having separate queues for different events.
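For reference, a minimal sketch of what per-event control looks like today with 3.x Blocks (queuing enabled app-wide, with one cheap event opting out); whether this could be extended into fully separate queues per event is the open question:

```python
import gradio as gr

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    out = gr.Textbox(label="Output")
    heavy_btn = gr.Button("Run heavy model")
    cheap_btn = gr.Button("Count characters")

    heavy_btn.click(lambda p: f"generated: {p}", prompt, out)         # waits in the queue
    cheap_btn.click(lambda p: str(len(p)), prompt, out, queue=False)  # skips the queue

demo.queue(concurrency_count=1)  # app-level queue (on by default on Spaces)
demo.launch()
```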

apolinario commented 2 years ago

Imo it doesn't make this issue redundant. I really think the ideal solution to this issue is passing the queue information downstream and sharing the queue between the main application and the loaded application. I don't know if that is possible, but I feel that otherwise this still makes gr.Interface.load() kind of unusable for high-load / large-queue applications.

The reason why making queuing the default everywhere mitigates but doesn't solve the issue is that the two queues (from app A and app B) are independent. So even with two queues, the users from app A still skip the queue of app B, and it can also OOM app B: since the queues don't communicate, a request can arrive in parallel and consume more resources than the machine supports.

apolinario commented 2 years ago

So for example, suppose I want to load the Stable Diffusion Space into another application, "Stable Diffusion Captioner" (SDC), that first runs Stable Diffusion and then an image captioning model.

If "Stable Diffusion Captioner" doesn't share the queue with the Stable Diffusion Spaces, even if it has a queue on its own turned on by default, the SDC user will still skip everyone in the original Stable Diffusion queue when they execute the job - which is not great. But imo the worst part, if Stable Diffusion is executing a job at that moment (which is likely), it would get the request from SDC in parallel, and then it would OOM and die.

The ideal scenario is that when SDC sends a request to the Stable Diffusion Space, that request joins the end of the original app's queue.
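A rough sketch of the SDC example (Space names are hypothetical, and I'm assuming gr.Series chains the two loaded apps' inputs and outputs cleanly):

```python
import gradio as gr

sd = gr.Interface.load("spaces/stabilityai/stable-diffusion")       # hypothetical reference
captioner = gr.Interface.load("spaces/some-user/image-captioner")   # hypothetical reference

# prompt -> image (from the SD Space) -> caption (from the captioning Space)
sdc = gr.Series(sd, captioner)
sdc.launch(enable_queue=True)  # SDC's own queue; the SD Space's queue is still bypassed
```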

abidlabs commented 2 years ago

Thanks for the detailed explanation @apolinario. That makes complete sense. The reason is that (in the example you provided) the SDC Space would be calling the /api/predict endpoint of SD instead of the /join/queue websocket connection. We'll fix this
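To illustrate the difference (the URL is hypothetical, and the endpoint names reflect my understanding of the 3.x API):

```python
import requests

SPACE = "https://some-user-stable-diffusion.hf.space"  # hypothetical upstream URL

# What the loaded app is effectively doing today: a direct REST call that runs
# immediately on the upstream machine and skips its queue entirely.
resp = requests.post(f"{SPACE}/api/predict", json={"data": ["an astronaut riding a horse"]})
print(resp.json())

# What it should do instead: join the upstream queue over its websocket endpoint
# (wss://.../queue/join) and wait for its turn, like a browser client would.
```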

apolinario commented 2 years ago

Awesome! Great that with the new queue design there is a /join/queue websocket! Does that also pass real-time info about the queue? Because then the app that is loading the demo could show the user their real-time position in the queue - that would be dope!

abidlabs commented 2 years ago

Yes it does, so I think it should be possible in principle for the downstream app to know everything about the upstream queue
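If so, a downstream app could in principle do something like the sketch below: join the upstream queue over the websocket and surface the position updates to its own users. This is only a sketch based on my reading of the 3.x queue protocol; the message names (send_hash, estimation, send_data, process_completed) and fields may differ between versions.

```python
import asyncio
import json
import uuid

import websockets  # third-party: pip install websockets

async def run_with_queue_updates(ws_url, data, fn_index=0):
    """Join an upstream Gradio queue and report position updates while waiting."""
    session = {"session_hash": str(uuid.uuid4()), "fn_index": fn_index}
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("msg") == "send_hash":
                await ws.send(json.dumps(session))
            elif msg.get("msg") == "estimation":
                # Real-time queue info that the downstream app could show its user
                print(f"upstream queue position: {msg.get('rank')} / {msg.get('queue_size')}")
            elif msg.get("msg") == "send_data":
                await ws.send(json.dumps({"data": data, **session}))
            elif msg.get("msg") == "process_completed":
                return msg.get("output")

# Hypothetical upstream URL:
# asyncio.run(run_with_queue_updates("wss://some-user-app-b.hf.space/queue/join", ["a prompt"]))
```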