rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Enhance configurability of estimators #11

Closed: luckyasser closed this issue 7 years ago

luckyasser commented 7 years ago

From @sekruse on July 7, 2016 12:11

Not all Operators currently support proper configuration of estimators for UDFs. Moreover, the data distribution is unknown to Rheem, although the user might be able to provide useful hints here and there. This pertains to two metrics:

  1. Cardinality estimation
    • influenced by UDFs (at least): FilterOperator, FlatMapOperator
    • influenced by data distribution (at least): DistinctOperator, FilterOperator, FlatMapOperator (via broadcasts), MaterializedGroupByOperator, IntersectOperator, JoinOperator, ReduceByOperator
  2. Load estimation
    • basically, every ExecutionOperator with a UDF might want to account for the load of that UDF, in particular for heavyweight UDFs
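
To make the two metrics concrete, here is a minimal sketch of what user-supplied hints could look like. It is written in Python for brevity and is not Rheem's API: `CardinalityEstimate`, `filter_cardinality`, `udf_load`, and the `selectivity` and `cycles_per_record` parameters are all hypothetical names chosen for illustration. The assumption is that a cardinality estimate is an interval with a confidence, that a filter hint is a selectivity range, and that a UDF load hint is a per-record cost.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CardinalityEstimate:
    """Hypothetical interval estimate: lower <= true cardinality <= upper."""
    lower: int
    upper: int
    confidence: float  # assumed probability that the interval is correct


def filter_cardinality(
    input_est: CardinalityEstimate,
    selectivity: Optional[Tuple[float, float]] = None,
) -> CardinalityEstimate:
    """Estimate the output cardinality of a filter-like operator.

    Without a hint, the only safe statement is that the output lies
    between 0 and the input's upper bound. With a user-supplied
    selectivity range (lo, hi), the interval can be narrowed.
    """
    if selectivity is None:
        return CardinalityEstimate(0, input_est.upper, input_est.confidence)
    lo, hi = selectivity
    if not 0.0 <= lo <= hi <= 1.0:
        raise ValueError("selectivity must satisfy 0 <= lo <= hi <= 1")
    return CardinalityEstimate(
        math.floor(input_est.lower * lo),
        math.ceil(input_est.upper * hi),
        input_est.confidence,
    )


def udf_load(
    input_est: CardinalityEstimate,
    cycles_per_record: int,
    overhead_cycles: int = 0,
) -> Tuple[int, int]:
    """Estimate a (lower, upper) CPU load for an operator running a UDF.

    The load is modeled as a fixed overhead plus a user-supplied
    per-record cost multiplied by the input cardinality.
    """
    return (
        overhead_cycles + input_est.lower * cycles_per_record,
        overhead_cycles + input_est.upper * cycles_per_record,
    )


if __name__ == "__main__":
    source = CardinalityEstimate(lower=1000, upper=2000, confidence=0.9)
    print(filter_cardinality(source))               # no hint: wide interval
    print(filter_cardinality(source, (0.25, 0.5)))  # with a selectivity hint
    print(udf_load(source, cycles_per_record=500))  # heavyweight UDF
```

The point of the sketch is only that a hint narrows the estimate: without a selectivity, the filter's output interval starts at 0, while a heavyweight UDF's per-record cost scales the operator's load with the input cardinality.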

These observations yield the following tasks:

Copied from original issue: daqcri/rheem#1