Closed jjyao closed 2 years ago
cc @pabloem
The expected number of outputs from ray_execute_bundle can be computed from:
@iasoon mentions we may be able to optimize away the 3 section since it may be duplicating data
FYI, I've also been referring to Stephanie's Ownership paper for a more comprehensive discussion of the Object Ownership model: https://www.usenix.org/system/files/nsdi21-wang.pdf.
Currently all the data (PCollections) is managed by PcollectionBufferManager actor and
ray.put()
into object store inray_execute_bundle
worker process. This approach has several implications:ray.put()
, in our case, the worker processes that executeray_execute_bundle
. This is not ideal for fault tolerance since the data is fate share with the owner. In our cases, the data fate share with all the worker processes. Instead, we want the driver program to be the owner of all the data. This means instead of doingray.put()
we should return data as part ofray_execute_bundle
return values so the driver program becomes the owner. More on the ownership design: https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/preview#heading=h.vjc9egi2q5aaray_execute_bundle
gets the input data references from PcollectionBufferManager and callray.get()
to fetch the actual data. This way ray scheduler cannot do locality aware scheduling ofray_execute_bundle
remote function since by examining the function arguments, it doesn't know what data it needs. Instead, we should pass data ObjRefs directly as arguments toray_execute_bundle
.At high level, it should look like: