Closed mwtian closed 2 years ago
Overview
Python coroutines are Python functions that can be suspended and resumed. They are built on top of Python generators; this includes coroutines declared with the `async`/`await` syntax. With https://github.com/llllllllll/cloudpickle-generators for generator serialization, and Ray ObjectRef for distributed futures, we can build a runtime that executes Python coroutines across Ray nodes, potentially with checkpointing.

In the diagram above, Ray would run the coroutine as follows:

1) The coroutine `f()` first yields at `await load_from_s3()`. Assuming this is an async function that does not use Ray, its output is local, so there is no right semantic for serializing and deserializing the coroutine here. The Ray runtime steps the coroutine on the local `asyncio` event loop.
2) Next, `f()` yields at `await classifier_actor.remote(images)`. Assuming this is a Ray remote method call, the output ObjectRef can be used anywhere in the Ray cluster, so serializing the coroutine, sending the serialized data to a different node, checkpointing, and deserializing the coroutine are all possible.
3) Finally, `f()` returns its result to Ray, and Ray handles the result by returning it to the coroutine caller or persisting it.
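The walkthrough above can be sketched as a coroutine like the following. This is a minimal, runnable sketch using plain `asyncio`; `load_from_s3` and the classifier step are assumed names from the diagram, and the real proposal would make the classify step a Ray remote call (e.g. `await classifier_actor.classify.remote(images)`):

```python
import asyncio

# Pure-asyncio sketch of the coroutine f() from the walkthrough.
# The classify() stub stands in for the Ray actor call; at that
# suspension point the serialized coroutine could be resumed on
# another node in the real runtime.

async def load_from_s3():
    # Step 1: plain async I/O, stepped on the local asyncio event loop.
    await asyncio.sleep(0)
    return ["img1", "img2"]

async def classify(images):
    # Stand-in for `classifier_actor.classify.remote(images)` (step 2).
    await asyncio.sleep(0)
    return [f"label:{img}" for img in images]

async def f():
    images = await load_from_s3()    # step 1: local await
    labels = await classify(images)  # step 2: would be a Ray remote call
    return labels                    # step 3: result handed back to Ray

labels = asyncio.run(f())
print(labels)  # ['label:img1', 'label:img2']
```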
Potential Use Cases

Specifying Workflow in Python
Workflows can be implemented as a Python coroutine, instead of in a special API / DSL. In a hypothetical trip booking workflow, `book_car`, `book_hotel` and `book_flight` are Ray tasks defined with `@ray.remote`. The Ray workflow runtime can turn the `book_trip()` coroutine into a persisted workflow by checkpointing the workflow at each `await`, so the workflow avoids re-running successful tasks and retries failed tasks.

There are more complexities if we want to run tasks in parallel, e.g. with `asyncio.gather()`. We may have to checkpoint at each `.remote()` call. This is being investigated.
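A sketch of the hypothetical `book_trip()` coroutine described above. Plain async functions stand in for the Ray tasks so the sketch runs anywhere; in the proposal `book_car`, `book_hotel` and `book_flight` would be `@ray.remote` tasks awaited via `.remote(...)`:

```python
import asyncio

# Sketch of the trip booking workflow coroutine. Each await is a
# potential checkpoint: if the workflow crashes and is re-run, already
# completed steps would be restored instead of re-executed.

async def book_car(dates):
    return f"car:{dates}"

async def book_hotel(dates):
    return f"hotel:{dates}"

async def book_flight(dates):
    return f"flight:{dates}"

async def book_trip(dates):
    car = await book_car(dates)        # checkpoint after this await
    hotel = await book_hotel(dates)    # checkpoint after this await
    flight = await book_flight(dates)  # checkpoint after this await
    return [car, hotel, flight]

bookings = asyncio.run(book_trip("2022-06-01"))
print(bookings)
```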
Optimizing Request Processing Flow

Suppose there are Ray actors for specific tasks, e.g. `english_speech_to_text`, `search` and `english_text_to_speech`, and we want to combine them into a voice search feature (a request is processed via `english_speech_to_text` -> `search` -> `english_text_to_speech`). Usually we have to make each Ray actor aware of the next Ray actor to continue request processing, which means adding voice-search-specific logic into the Ray actors and breaking encapsulation. Another alternative is to gather the result from each actor in a request handler, which may increase latency and data-transfer cost. Instead, we can describe how the request flows through the actors with a Python coroutine.

The coroutine can be developed and tested locally. It can also be executed with Ray from the request handler. Ray can suspend the coroutine at each remote call (e.g. `english_speech_to_text.run.remote(question_speech)`), then serialize and forward the coroutine to the node that will produce the result (e.g. the node running the `english_speech_to_text` actor). This keeps the actors' code cleaner, and with reasonable optimizations reduces data movement to the same level as forwarding directly between specific actors.
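A sketch of the voice search request-flow coroutine. Plain async functions stand in for the actors so the sketch is runnable; in the proposal each step would be a Ray actor call such as `await english_speech_to_text.run.remote(question_speech)`:

```python
import asyncio

# Sketch of the voice search pipeline as a coroutine. In the proposal,
# Ray could suspend the coroutine at each await, serialize it, and
# resume it on the node producing the intermediate result, instead of
# shipping every intermediate value back to the request handler.

async def english_speech_to_text(speech):
    return f"text({speech})"

async def search(query):
    return f"results({query})"

async def english_text_to_speech(text):
    return f"speech({text})"

async def voice_search(question_speech):
    question_text = await english_speech_to_text(question_speech)
    answer_text = await search(question_text)
    return await english_text_to_speech(answer_text)

answer = asyncio.run(voice_search("q"))
print(answer)  # speech(results(text(q)))
```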
Status

Prototype: https://github.com/ray-project/ray/pull/21783
Known Issues

- Support for running steps concurrently, e.g. with `asyncio.gather()` or `anyio`, is not fully fleshed out.
- Serializing a suspended coroutine also captures its local variables. Using `del` on variables that are not used further in the coroutine can be a workaround. We may be able to automatically avoid serializing variables not used further in the coroutine too.
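The `del` workaround can be sketched as follows (a minimal example, assuming local variables alive at an `await` would be captured when the suspended coroutine is serialized):

```python
import asyncio

# Drop large locals that are no longer needed before the next await,
# so they would not be captured if the suspended coroutine were
# serialized at that suspension point.

async def process():
    raw = list(range(1_000_000))  # large intermediate data
    total = sum(raw)
    del raw                       # no longer needed; removes it from the
                                  # coroutine frame before suspension
    await asyncio.sleep(0)        # potential serialization point
    return total

result = asyncio.run(process())
print(result)  # 499999500000
```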
Next Steps

First we want to gather feedback from the community.
Please let us know what you think!
cc @ericl @richardliaw @iycheng @simon-mo