ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Serve] Provide `async` versions of `serve.get_app_handle()` and `serve.get_deployment_handle()` #44782

Open JoshKarpel opened 4 months ago

JoshKarpel commented 4 months ago

Description

`serve.get_app_handle()` and `serve.get_deployment_handle()`, along with their underlying method `ServeControllerClient.get_handle()`, allow users to dynamically get a handle to a Serve deployment: either the ingress deployment of an application (`get_app_handle()`) or a specific deployment (`get_deployment_handle()`).

These methods make one or two network calls to the Serve Controller to gather information, but they make those calls synchronously (`ray.get(...)`). When they are called from asynchronous code, such as a FastAPI deployment acting as a dynamic ingress to other deployments, each lookup blocks the event loop until the calls return. Providing async variants of these functions would be a useful feature for async callers.
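A minimal, Ray-free sketch of the problem, and of one workaround available today, might look like this. `blocking_get_app_handle` is a hypothetical stand-in that sleeps to mimic the synchronous controller round trip (it is not Serve code), and the `heartbeat` task stands in for other requests sharing the event loop. Offloading the blocking call with the standard library's `asyncio.to_thread` (Python 3.9+) keeps those other coroutines running; calling it directly stalls them.

```python
import asyncio
import time


def blocking_get_app_handle(app_name: str, latency_s: float = 0.2) -> str:
    # Hypothetical stand-in for serve.get_app_handle(): blocks the calling
    # thread, the way a synchronous ray.get(...) on the controller would.
    time.sleep(latency_s)
    return f"handle:{app_name}"


async def heartbeat(ticks: list, stop: asyncio.Event, period_s: float = 0.01) -> None:
    # Stands in for other requests the event loop should keep serving.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(period_s)


async def measure(offload: bool) -> tuple[str, int]:
    """Return the handle and how many heartbeat ticks ran during the lookup."""
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    start = len(ticks)
    if offload:
        # Run the blocking lookup in a worker thread; the loop stays free.
        handle = await asyncio.to_thread(blocking_get_app_handle, "my_app")
    else:
        # Calling it directly stalls every other coroutine on this loop.
        handle = blocking_get_app_handle("my_app")
    during = len(ticks) - start
    stop.set()
    await hb
    return handle, during


if __name__ == "__main__":
    print(asyncio.run(measure(offload=False)))  # ('handle:my_app', 0)
    print(asyncio.run(measure(offload=True)))   # tick count > 0, varies by machine
```

A thread offload only hides the blocking call; it still ties up a worker thread per lookup, which is part of why native async variants are being requested.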

I would be happy to make these changes, though I think I would need some guidance on naming conventions and whatnot :)

Use case

See the discussion at https://ray-distributed.slack.com/archives/CNCKBBRJL/p1713194071772759 for more details about our use case. TL;DR: we create handles dynamically at runtime and noticed that doing so was blocking other requests in our FastAPI app.
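To illustrate the use case, here is a sketch of what a native async variant could let a dynamic-ingress handler do. The name `get_app_handle_async` is purely illustrative (the issue leaves naming open), and `FakeServeController` is an invented stand-in whose method awaits a simulated round trip; this models awaiting a Ray `ObjectRef` (which asyncio code can do) instead of calling `ray.get(...)` on it. It is an assumption-laden sketch, not Serve's API.

```python
import asyncio
import time


class FakeServeController:
    """Hypothetical stand-in for the Serve Controller actor."""

    def __init__(self, latency_s: float = 0.1) -> None:
        self.latency_s = latency_s
        self.calls = 0

    async def get_ingress_deployment_name(self, app_name: str) -> str:
        self.calls += 1
        await asyncio.sleep(self.latency_s)  # simulated network round trip
        return f"{app_name}_ingress"


async def get_app_handle_async(controller: FakeServeController, app_name: str) -> str:
    # Hypothetical async variant: await the controller lookup instead of
    # blocking on it, then build a (fake) handle string.
    deployment = await controller.get_ingress_deployment_name(app_name)
    return f"handle:{app_name}/{deployment}"


async def handle_request(controller: FakeServeController, app_name: str) -> str:
    # Dynamic ingress: the target app is chosen per request at runtime.
    handle = await get_app_handle_async(controller, app_name)
    return f"routed via {handle}"


async def serve_many(n: int) -> tuple[list[str], float, int]:
    """Run n concurrent requests; return results, elapsed seconds, call count."""
    controller = FakeServeController(latency_s=0.1)
    start = time.monotonic()
    results = await asyncio.gather(
        *(handle_request(controller, f"app{i}") for i in range(n))
    )
    return list(results), time.monotonic() - start, controller.calls


if __name__ == "__main__":
    results, elapsed, calls = asyncio.run(serve_many(10))
    print(results[0], f"{elapsed:.2f}s", calls)
```

With ten concurrent requests the awaited lookups overlap, so the total time stays close to a single round trip rather than ten sequential ones.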

zcin commented 4 months ago

@JoshKarpel Could you create a separate issue for the asynchronous versions of these APIs, and separate this issue out into optimizations that can be made to the synchronous versions of the APIs?

JoshKarpel commented 4 months ago

Can do!