ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
32.19k stars 5.48k forks

[data] Unify PhysicalOperator implementations #37630

Open raulchen opened 1 year ago

raulchen commented 1 year ago

Today, many common features are implemented separately in each PhysicalOperator subclass. As a result, some features are missing from certain operators, and others behave inconsistently across operators.

A non-comprehensive list of such features includes:

We should refactor these implementations into a more unified framework.

raulchen commented 12 months ago

The high-level idea is to abstract the above functionality related to op execution into a new class, OpExecutor. This class (along with its helper classes) will include most of the code that is currently in PhysicalOperator and OpState. OpState will be removed. PhysicalOperator will only need to care about how to handle an input RefBundle. The logic for handling an input RefBundle falls into 2 types:

Other notes:
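
To make the proposed split concrete, here is a minimal sketch of how the responsibilities could be divided. It is only an illustration, not the planned implementation: `RefBundle` below is a toy stand-in for the real class, and `handle_input`, `ToyMapOperator`, the queue fields, and the `num_inputs_processed` counter are hypothetical names chosen for this example. The only parts taken from the proposal are that OpExecutor owns the shared execution logic and that PhysicalOperator only decides how to handle an input RefBundle.

```python
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class RefBundle:
    """Toy stand-in for Ray Data's RefBundle (assumption: just wraps rows)."""
    rows: List[int]


class PhysicalOperator(ABC):
    """Under the proposal, an operator only handles one input bundle."""

    @abstractmethod
    def handle_input(self, bundle: RefBundle) -> List[RefBundle]:
        """Hypothetical hook: turn one input bundle into output bundles."""


class ToyMapOperator(PhysicalOperator):
    """Hypothetical operator that applies a function to every row."""

    def __init__(self, fn: Callable[[int], int]):
        self._fn = fn

    def handle_input(self, bundle: RefBundle) -> List[RefBundle]:
        return [RefBundle([self._fn(row) for row in bundle.rows])]


class OpExecutor:
    """Hypothetical executor owning the state shared by all operators.

    The queues and the counter are illustrative examples of shared
    bookkeeping; the actual feature set is not specified here.
    """

    def __init__(self, op: PhysicalOperator):
        self._op = op
        self._input_queue: Deque[RefBundle] = deque()
        self._output_queue: Deque[RefBundle] = deque()
        self.num_inputs_processed = 0

    def add_input(self, bundle: RefBundle) -> None:
        self._input_queue.append(bundle)

    def process_next(self) -> bool:
        """Process one queued input; return False if nothing was queued."""
        if not self._input_queue:
            return False
        bundle = self._input_queue.popleft()
        self._output_queue.extend(self._op.handle_input(bundle))
        self.num_inputs_processed += 1
        return True

    def has_output(self) -> bool:
        return bool(self._output_queue)

    def get_next_output(self) -> RefBundle:
        return self._output_queue.popleft()


executor = OpExecutor(ToyMapOperator(lambda x: x * 2))
executor.add_input(RefBundle([1, 2, 3]))
executor.process_next()
print(executor.get_next_output().rows)
```

In this sketch, adding a new operator means writing only a `handle_input` method, while queueing and counting live in one place and therefore behave the same for every operator.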