apolinario closed this issue 2 years ago
@abidlabs, this may be important to include when bringing the ability to load 3.x spaces with gr.Interface.load()
Thanks for the catch and the heads up @apolinario! Will implement appropriately
We need to check whether the loaded Space has its queue enabled and, if so, create a WebSocket connection from the backend.
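A rough sketch of what that check could look like, assuming the loaded Space exposes its configuration as JSON at `/config` with an `enable_queue` flag (both the path and the field name are assumptions that may vary across Gradio versions):

```python
import requests

def space_uses_queue(space_url: str) -> bool:
    # Assumption: the Space serves its config at /config and that config
    # carries an enable_queue flag; field names may differ by version.
    config = requests.get(f"{space_url.rstrip('/')}/config", timeout=10).json()
    return bool(config.get("enable_queue", False))
```

If the flag is set, the loading side would open the queue WebSocket instead of calling the prediction endpoint directly.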
Hi @apolinario, we are planning on making queuing on by default everywhere (#2215) -- and it already has been on by default on Spaces for quite some time.
I'm wondering whether that makes this issue redundant as it basically is the second approach that you've mentioned (App A will have queuing on by default, whether or not the developer has the good will to do this). The other point about efficiency is still there, but I wonder whether that could be remedied in a different way -- for example by having separate queues for different events.
Imo it doesn't make this issue redundant. I really think the ideal solution is passing the queue information downstream and sharing the queue between the main application and the loaded application. I don't know if that is possible, but otherwise I feel this still makes `gr.Interface.load()` kind of unusable for high-load/large-queue applications.
The reason why making queuing the default everywhere mitigates the issue a bit but doesn't solve it is that the two queues (from app A and app B) are independent. So even with two queues, users from app A still skip the queue of app B, and app B can still OOM: since the queues don't communicate, requests can arrive in parallel and consume more resources than the machine supports.
For example, say I want to load the Stable Diffusion Space into another application, "Stable Diffusion Captioner" (SDC), that first runs Stable Diffusion and then an image captioning model.
If SDC doesn't share the queue with the Stable Diffusion Space, then even with its own queue turned on by default, an SDC user will still skip everyone in the original Stable Diffusion queue when their job executes, which is not great. But imo the worst part is that if Stable Diffusion is executing a job at that moment (which is likely), it would receive the SDC request in parallel, OOM, and die.
The ideal scenario is that when SDC sends a request to the Stable Diffusion Space, that request joins the end of the original app's queue.
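A minimal sketch of that SDC scenario. The Space names are purely illustrative, and calling a loaded interface like a plain function is how the upstream apps get triggered here (treat the exact call signatures as assumptions):

```python
import gradio as gr

# Hypothetical Space names, for illustration only.
sd = gr.Interface.load("spaces/some-org/stable-diffusion")
captioner = gr.Interface.load("spaces/some-user/image-captioner")

def generate_and_caption(prompt):
    # Both calls hit the upstream Spaces directly. Without a shared queue,
    # they bypass the upstream queues and can land while those Spaces are
    # already busy with their own users' jobs.
    image = sd(prompt)
    caption = captioner(image)
    return image, caption

demo = gr.Interface(
    fn=generate_and_caption,
    inputs=gr.Textbox(label="Prompt"),
    outputs=[gr.Image(label="Image"), gr.Textbox(label="Caption")],
)
demo.queue()   # SDC's own queue, independent of the upstream Spaces' queues
demo.launch()
```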
Thanks for the detailed explanation @apolinario. That makes complete sense. The reason is that (in the example you provided) the SDC Space would be calling the `/api/predict` endpoint of SD instead of the `/join/queue` websocket connection. We'll fix this.
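For context, this is roughly what the direct call looks like: a plain POST to the upstream app's `/api/predict` that executes immediately and skips the queue. The URL is illustrative and the exact payload schema is an assumption that can vary by version:

```python
import requests

resp = requests.post(
    "https://upstream-space.hf.space/api/predict",   # illustrative URL
    json={"data": ["an astronaut riding a horse"]},  # assumed Gradio-style payload
    timeout=600,
)
print(resp.json())
```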
Awesome! Great that with the new queue design there is a `/join/queue` websocket! Does it also pass real-time info about the queue? Because then the app that is loading the demo could show the user their real-time position in the queue, that would be dope!
Yes it does, so I think it should be possible in principle for the downstream app to know everything about the upstream queue
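A hedged sketch of what consuming that websocket could look like from the downstream app, surfacing the real-time queue position to its own users. The message names (`send_hash`, `estimation`, `send_data`, `process_completed`), the payload fields, and the `/queue/join` path reflect my understanding of the 3.x queue protocol and should be treated as assumptions that may differ by version:

```python
import asyncio
import json
import websockets  # pip install websockets

async def join_upstream_queue(ws_url: str, payload: list):
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("msg") == "send_hash":
                # Identify this request to the upstream queue.
                await ws.send(json.dumps({"session_hash": "downstream-app", "fn_index": 0}))
            elif msg.get("msg") == "estimation":
                # Real-time position info the downstream app could show its users.
                print(f"Upstream queue position: {msg.get('rank')} / {msg.get('queue_size')}")
            elif msg.get("msg") == "send_data":
                await ws.send(json.dumps({"data": payload, "fn_index": 0}))
            elif msg.get("msg") == "process_completed":
                return msg.get("output")

# Example (URL is illustrative):
# asyncio.run(join_upstream_queue("wss://upstream-space.hf.space/queue/join", ["a prompt"]))
```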
If app A uses `gr.Interface.load()` to load an app B that contains a `your_app.launch(enable_queue=True)`, the queue does not get respected when app B is executed from app A. So if there are 3 app A users and all trigger app B at the same time, app B runs 3x in parallel, regardless of whether `enable_queue` was set to `True` on app B. This implies two things:

1. App A users skip app B's queue entirely.
2. App B can receive more parallel requests than its machine supports and OOM.
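A minimal repro sketch of that setup, assuming app B runs as its own Space launched with `enable_queue=True`. The Space name is illustrative, and calling the loaded interface like a function is how app A triggers app B here:

```python
import gradio as gr

# App B, deployed as its own Space, launches with:
#     demo_b.launch(enable_queue=True)

# App A loads it (Space name is illustrative):
app_b = gr.Interface.load("spaces/some-user/app-b")

def run_app_b(text):
    # Each call goes straight to app B's prediction endpoint, so three
    # simultaneous app A users mean three parallel executions on app B's
    # hardware, regardless of app B's enable_queue=True.
    return app_b(text)

demo_a = gr.Interface(fn=run_app_b, inputs="text", outputs="text")
demo_a.launch()
```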
This can currently be mitigated by adding `enable_queue=True` to app A, however this has two shortcomings:

1. It depends on the goodwill of the developer of app A to turn the queue on (and anyone can run `gr.Interface.load()` privately via localhost - which can be an easy way to bypass any queue); if they don't, app B gets resource-drained outside of their control.
2. Efficiency: if `enable_queue` is on app A, now when you call apps B, C, D from it (each loaded with `gr.Interface.load()`), they all go into the same queue, even though each could have very different internal queues (this happens with MindsEye Lite).

Suggested solution: when app A triggers app B, the request should join the end of app B's queue instead of executing immediately - i.e. pass the queue information downstream and share the queue between the main application and the loaded application.
If that gets implemented, make sure to also de-register from app B's queue the users that leave/give up on app A, otherwise an infinite queue could arise.
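Conceptually, that de-registration could amount to the downstream app closing its upstream queue connection as soon as its own user gives up, assuming the upstream queue drops entries whose websocket closes. A rough, purely illustrative sketch:

```python
import asyncio
import websockets  # pip install websockets

async def consume_queue_messages(upstream):
    # Placeholder for the normal queue protocol handling (see sketch above).
    async for _ in upstream:
        pass

async def hold_queue_slot(ws_url: str, user_gave_up: asyncio.Event):
    # Stay connected to the upstream queue only while the downstream user
    # is still waiting; closing the socket lets the upstream de-register
    # the request instead of keeping it queued forever.
    async with websockets.connect(ws_url) as upstream:
        consume = asyncio.create_task(consume_queue_messages(upstream))
        gave_up = asyncio.create_task(user_gave_up.wait())
        done, pending = await asyncio.wait(
            {consume, gave_up}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
    # Leaving the `async with` block closes the upstream socket either way.
```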