gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

Add a Python client to make it easy to connect to hosted Gradio apps & respect the queue #2516

Closed · freddyaboulton closed this 1 year ago

freddyaboulton commented 1 year ago

As we make the queue the primary way of connecting with gradio apps programmatically, we should offer developers a friendly client that lets them connect to queue/join and get the websocket messages.

This will make it easy for developers to treat gradio/spaces as a backend in their apps without worrying about the low-level details of our queue implementation. This will also make it easy for us to change our queue implementation without breaking our users' apps.

Describe the solution you'd like
Add a Python and JavaScript client for connecting to queue/join and getting messages. The View API page should change to show how to use the client.

FYI @pngwn

abidlabs commented 1 year ago

On the Python side, we already have gr.Interface.load(), which respects the queue. However, it only supports Spaces right now. It would be good to add support for arbitrary URLs.
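For context, a rough sketch of how that looks today with a public Space. The Space name below is a placeholder, and whether the loaded app can be called directly like a function depends on the gradio version:

import gradio as gr

# Load a hosted Space; predictions go through that Space's queue.
# "user/space-name" is a placeholder, not a real Space.
demo = gr.Interface.load("spaces/user/space-name")

# In recent gradio versions a loaded app can be used like a function.
result = demo("some input text")
print(result)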

abidlabs commented 1 year ago

Revisiting this -- I think we should develop a separate Python library / client for connecting to hosted Gradio Spaces. The main advantage would be that if someone wants to use a Space as an API, they would not need to have gradio (along with its many dependencies) as a dependency, instead they could have gradio-client, which would be just a wrapper over network calls, as a dependency.

Here are the functions that the v1 Python client should support:

import gradio_client as grc

app = grc.load("space/name")  # by default, loads from a Space
app = grc.load("space/name", access_token="...")  # for private Spaces

app.payload() # provides the expected API format in a programmatic way
print(app)  # prints the expected API format in a human-readable way

job = app.predict(arg1, arg2)  # submits the job to the queue
job.status  # PROCESSING or COMPLETED or ERROR

job.result()  # gets the result of the job if completed, otherwise returns `None`
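As a usage sketch of the proposed interface (the package, load, predict, status, and result names are the hypothetical ones above, not an existing API), a caller might poll until the job finishes:

import time
import gradio_client as grc  # hypothetical package from the proposal above

app = grc.load("space/name")
job = app.predict("some input")  # submits to the queue and returns immediately

# Poll the job until the queue has finished with it.
while job.status not in ("COMPLETED", "ERROR"):
    time.sleep(0.5)

if job.status == "COMPLETED":
    print(job.result())
else:
    print("Job ended with status:", job.status)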

Similarly, we can put together a gradio client in JS for JS apps

cc @aliabd @freddyaboulton @apolinario @osanseviero @yvrjsharma @pngwn for your thoughts

aliabid94 commented 1 year ago

makes sense to me

freddyaboulton commented 1 year ago

Agreed that implementing the client in a separate library is a good idea. I would like it if gradio_client were a dependency of gradio (and we used it in .load) to make sure it doesn't get stale relative to the main codebase.

If we're going to make predict non-blocking, I think job should implement the concurrent.futures Future interface (https://docs.python.org/3/library/concurrent.futures.html#future-objects) to make it easy to run jobs asynchronously with as_completed.
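For illustration, here is a minimal, self-contained sketch of that idea, assuming job is simply a subclass of concurrent.futures.Future that the client resolves in the background (the Job and FakeApp classes are hypothetical stand-ins, not gradio APIs):

import threading
import time
from concurrent.futures import Future, as_completed


class Job(Future):
    """Hypothetical job object: a Future that the client resolves
    once the queue reports the prediction as complete."""


class FakeApp:
    """Stand-in for the proposed gradio_client app object."""

    def predict(self, value):
        job = Job()

        def resolve():
            time.sleep(0.1)  # pretend the queue is doing work
            job.set_result(value.upper())

        threading.Thread(target=resolve, daemon=True).start()
        return job


app = FakeApp()
jobs = [app.predict(x) for x in ["a", "b", "c"]]

# Because Job is a Future, the standard concurrent.futures helpers work as-is.
for job in as_completed(jobs):
    print(job.result())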

I'm assuming payload and predict both take api_name as a parameter to handle apps with more than one event trigger?

freddyaboulton commented 1 year ago

Should we also have an authenticate method on app? The use case is that you know the username/password of a public Space but not the access token of the app author.

pngwn commented 1 year ago

We need a final decision on the whole user/name vs subdomain.hf.space thing because I think the way we encourage users to reference hosted spaces should be consistent, whether they are embedding or using the API. Having a bunch of different ways to refer to the same thing is just confusing.

I really don't think the subdomain is better than the space name.

The API looks fine to me. Do we want a way to notify users of the status/queue info? I guess this is less useful in Python, but it's something we'll want for the JS client.

cc @julien-c @gary149

pngwn commented 1 year ago

Some questions

app = grc.load("space/name")  # by default, loads from a Space
app = grc.load("space/name", access_token="...":)  # for private Spaces

Should this also work with any URL? Not just Spaces but self-hosted apps too? I don't see any reason why it shouldn't. Private URLs are more complex of course, but the basic case could be supported imo.

app.payload() # provides the expected API format in a programmatic way
print(app)  # prints the expected API format in a human-readable way

job = app.predict(arg1, arg2)  # submits the job to the queue
job.status  # PROCESSING or COMPLETED or ERROR

job.result()  # gets the result of the job if completed, otherwise returns `None`

Gradio apps don't always have one predict endpoint so to speak (well, I guess they do, but with a different fn_index, and we will want to abstract that). Should this be something like this instead:

job = app.predict("endpoint_name")

Likewise, on the frontend we ensure you can't send another message to the queue when one is already pending but I don't think we should add that limitation here (or at least we should make it configurable), so how do we:

  1. Keep track of which sent message corresponds to which received message. This applies to status updates too: if you send two messages one after the other, they will have different status information that will be reported separately. Should the client attach an id to each message that is also returned by the server with any status updates + process-completed messages?
  2. How do we notify users which updates correspond to which requests? Would it be something like this:
job = app.predict("endpoint_name")
job2 = app.predict("endpoint_name")

Where the job is bound to the return value of the specific predict() call,

or does the client have some kind of internal state that captures the results and only clears them when they are accessed (or something):

job = app.predict("endpoint_name") # returns an id of 1
job2 = app.predict("endpoint_name") # returns an id of 2

app.job  # or app.get("job"), whatever is Pythonic
# this would contain information about that job's result or status

One thing that is missing from this though is the ability to be notified of the changes. I assume the expectation here is that the user will poll the client until they have a response? I'm not sure that is ideal. Or is this what @freddyaboulton was talking about?

I had a quick look at Futures and they are good for waiting on requests, like Promises in JS, but one thing they are not good at is notifying you of status updates (just like Promises in JS). You can wait for the Promise or Future, or execute some callback when it is complete, but there isn't a good way with those interfaces to be notified of in-between states.

Would it be better to extend those primitives but add the ability to pass in a callback that fires when the status updates? I have no idea what that might look like in Python, but in JS it might be something like:

const job = await app.predict(status_callback)
// this would return a 

Or the callback could be applied at the client creation point (JS again):

const app = client('space/name', progress_callback)

But then you need to map those progress_callbacks to a specific request in the callback function.
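Since the Python equivalent is an open question above, here is a rough, purely hypothetical sketch of a Future subclass that also accepts status callbacks (none of these names are existing gradio APIs):

from concurrent.futures import Future
from typing import Callable, List


class Job(Future):
    """Hypothetical Future that can also report intermediate queue status."""

    def __init__(self) -> None:
        super().__init__()
        self._status_callbacks: List[Callable[[str], None]] = []

    def add_status_callback(self, fn: Callable[[str], None]) -> None:
        self._status_callbacks.append(fn)

    def report_status(self, status: str) -> None:
        # The client would call this whenever a queue message arrives
        # (e.g. queue position, "PROCESSING", "COMPLETED").
        for fn in self._status_callbacks:
            fn(status)


# Usage sketch: the client would drive report_status and set_result internally.
job = Job()
job.add_status_callback(lambda s: print("status:", s))
job.report_status("QUEUED")
job.report_status("PROCESSING")
job.set_result({"data": ["hello"]})
print(job.result())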

I wonder if this would all just be better wrapped up in some kind of reactive construct like an observable or a pub/sub interface:

const app = client(...)

app.subscribe(subscription_callback)
// or
app.subscribe(endpoint_you_care_about, callback_function)

app.predict(endpoint_you_care_about) -> triggers the callback as it updates for any reason
app.predict(endpoint_you_dont_care_about) -> doesn't trigger anything

In this way you only subscribe to updates for a specific endpoint (basically topics in classic pub/sub), so it is scoped. You could pass the same client around to different parts of your app and each part could subscribe to the endpoints it cares about. But now you are back to dealing with possible out-of-order messages that have been sent to the same endpoint and will be returned in who knows what order.
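In Python, that scoped-subscription pattern might look roughly like the following (purely hypothetical, and it leaves the out-of-order problem unsolved):

from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class PubSubClient:
    """Hypothetical client where callbacks are scoped per endpoint (topic)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, endpoint: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[endpoint].append(callback)

    def _notify(self, endpoint: str, message: Any) -> None:
        # The real client would call this for every queue message it receives.
        for callback in self._subscribers[endpoint]:
            callback(message)


app = PubSubClient()
app.subscribe("endpoint_you_care_about", lambda msg: print("update:", msg))
app._notify("endpoint_you_care_about", {"status": "PROCESSING"})   # triggers the callback
app._notify("endpoint_you_dont_care_about", {"status": "QUEUED"})  # doesn't trigger anything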

This is a brain dump, but I'm thinking about this in some detail at the moment as I refactor the frontend API, and it would be good to get some thoughts.

abidlabs commented 1 year ago

Should this also work with any URL? Not just Spaces but self-hosted apps too? I don't see any reason why it shouldn't. Private URLs are more complex of course, but the basic case could be supported imo.

Yeah we could do exactly the same thing as the <embed> where you can do:

grc.load(space="space/name")

or

grc.load(src="any_url")

Gradio apps don't always have one predict endpoint so to speak (well, I guess they do, but with a different fn_index, and we will want to abstract that). Should this be something like this instead:

Yup, @freddyaboulton brought this point up as well. predict() will take an api_name parameter to allow it to connect to specific functions.

One thing that is missing from this though is the ability to be notified of the changes. I assume the expectation here is that the user will poll the client until they have a response? I'm not sure that is ideal. Or is this what @freddyaboulton was talking about?

I really like the idea of adding callbacks. I haven't seen the subscription model very commonly in Python libraries; I think the more common construct is to simply accept a callback function or list of functions as a parameter. Not too opinionated on it, but the complete signature could be something like this:

app.predict(args=[arg1, arg2], api_name="endpoint_name",  callbacks=[functions...])
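A usage sketch under that hypothetical signature (app, predict, and the parameter names are the proposal above, not an existing API) might be:

def on_update(update):
    # Would receive queue/status updates or the final result, depending on the design.
    print("update:", update)


job = app.predict(
    args=["some input", 0.5],
    api_name="endpoint_name",
    callbacks=[on_update],
)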
abidlabs commented 1 year ago

Likewise, on the frontend we ensure you can't send another message to the queue when one is already pending but I don't think we should add that limitation here (or at least we should make it configurable), so how do we:

Keep track of which sent message corresponds to which received message. This applies to status updates too: if you send two messages one after the other, they will have different status information that will be reported separately. Should the client attach an id to each message that is also returned by the server with any status updates + process-completed messages?

One way to implement this (without any changes to the gradio library -- we can't have changes if we want the client to support existing Spaces) is that anytime someone calls app.predict(), a background thread (or async process) opens a websocket connection to the Space and starts communicating with the Space until the prediction is complete.

This thread is associated with the job instance, so when the thread receives the prediction back from the Space, it assigns the results to the job instance. So running job.result() will get you the results for that particular input.
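A very rough sketch of that shape, assuming the third-party websockets library and glossing over the actual message protocol (everything here is illustrative, not the real client implementation):

import asyncio
import json
import threading
from concurrent.futures import Future

import websockets  # third-party dependency, used here only for illustration


class Job(Future):
    """Hypothetical job: resolved by its own background websocket connection."""


def submit(ws_url: str, payload: dict) -> Job:
    job = Job()

    async def run() -> None:
        async with websockets.connect(ws_url) as ws:
            await ws.send(json.dumps(payload))
            async for raw in ws:
                msg = json.loads(raw)
                # The real protocol has several message types; here we only
                # assume some final "completed" message carrying the output.
                if msg.get("msg") == "process_completed":
                    job.set_result(msg.get("output"))
                    break

    # One background thread (and one websocket connection) per prediction,
    # as described above; the Job resolves when that connection finishes.
    threading.Thread(target=lambda: asyncio.run(run()), daemon=True).start()
    return job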

How do we notify users which updates correspond to which requests? Would it be something like this:

job = app.predict("endpoint_name")
job2 = app.predict("endpoint_name")

Exactly

pngwn commented 1 year ago

One way to implement this (without any changes to the gradio library -- we can't have changes if we want the client to support existing Spaces) is that anytime someone calls app.predict(), a background thread (or async process) opens a websocket connection to the Space and starts communicating with the Space until the prediction is complete.

We can't do that in the JS client at least: the overhead of all of the websocket connections, not to mention the startup time of establishing them, would make the client very slow.

abidlabs commented 1 year ago

Hmm, good point. There might be a better way using the existing session_hash as an identifier for this. Will think more about this.

pngwn commented 1 year ago

Updated this issue to cover just the Python client; I have created a new issue for the JS client since that work will be completed separately:

#3310

JuvenileQ commented 1 year ago

I am currently using React to develop the front end. May I ask where to find the documentation for the queue/join websocket interface?
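For anyone else landing here: at the time of this issue the queue websocket flow was not formally documented, so the most reliable reference is the gradio source (the queue and routes modules) for your version. As a heavily hedged approximation of the gradio 3.x flow (message names may differ between versions):

import asyncio
import json

import websockets  # pip install websockets


async def predict_via_queue(root_url: str, fn_index: int, data: list, session_hash: str):
    # Approximate flow only; verify message names against your gradio version.
    ws_url = root_url.replace("http", "ws", 1).rstrip("/") + "/queue/join"
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("msg") == "send_hash":
                await ws.send(json.dumps({"fn_index": fn_index, "session_hash": session_hash}))
            elif msg.get("msg") == "send_data":
                await ws.send(json.dumps({"fn_index": fn_index, "session_hash": session_hash, "data": data}))
            elif msg.get("msg") == "process_completed":
                return msg.get("output")
            # Other messages (e.g. queue estimation / process started) carry
            # status information and can be surfaced as progress updates.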