Not all Operators currently allow the estimators for their UDFs to be configured properly. Moreover, Rheem does not know the data distribution, while the user might be able to provide useful hints about it. This pertains to two metrics:
- Cardinality estimation
  - influenced by UDFs (at least): FilterOperator, FlatMapOperator
  - influenced by data distribution (at least): DistinctOperator, FilterOperator, FlatMapOperator (via broadcasts), MaterializedGroupByOperator, IntersectOperator, JoinOperator, ReduceByOperator
- Load estimation
  - basically every ExecutionOperator with a UDF might want to take the load of its UDF into account, in particular for heavyweight UDFs
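To illustrate the cardinality side, here is a minimal sketch of how a user-supplied selectivity hint could turn an input cardinality estimate into an output estimate for a FilterOperator or FlatMapOperator. The types `Estimate` and `SelectivityHint` and the method `applyHint` are hypothetical and not part of Rheem; the sketch assumes that estimates are intervals with a confidence value.

```java
// Hypothetical sketch: none of these types are Rheem's actual API.
public class CardinalityHintSketch {

    /** Assumed interval estimate [lower, upper] of a cardinality, plus a confidence in [0, 1]. */
    record Estimate(long lower, long upper, double confidence) {
        Estimate {
            if (lower < 0 || upper < lower) throw new IllegalArgumentException("invalid bounds");
        }
    }

    /** Assumed user hint: output records per input record, as an interval with a confidence. */
    record SelectivityHint(double lower, double upper, double confidence) { }

    /** Scales the input interval by the hinted selectivity interval. */
    static Estimate applyHint(Estimate input, SelectivityHint hint) {
        long lo = (long) Math.floor(input.lower() * hint.lower());
        long hi = (long) Math.ceil(input.upper() * hint.upper());
        // The combined estimate is only as trustworthy as both the input estimate and the hint.
        return new Estimate(lo, hi, input.confidence() * hint.confidence());
    }

    public static void main(String[] args) {
        Estimate input = new Estimate(1000, 1200, 0.9);
        // Filter UDF: keeps between 25% and 50% of the records.
        Estimate filtered = applyHint(input, new SelectivityHint(0.25, 0.5, 0.8));
        // FlatMap UDF: emits 2 to 3 records per input record.
        Estimate flatMapped = applyHint(input, new SelectivityHint(2.0, 3.0, 0.7));
        System.out.println(filtered);
        System.out.println(flatMapped);
    }
}
```

For a filter the hint lies within [0, 1], whereas for a flat map it describes the average fan-out and may exceed 1, which is why a single "selectivity" notion can serve both operators.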
These observations yield the following tasks:
- [x] allow configuring the selectivity of UDFs for FilterOperators and FlatMapOperators
- [x] allow overriding the CardinalityEstimators for Operator instances
- [x] allow configuring LoadProfileEstimators for UDFs
- [x] take into account UDF LoadProfileEstimators in the respective ExecutionOperators' LoadProfileEstimators
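The last task can be illustrated with a minimal sketch in which an operator's load estimate is composed from its own load and that of its UDF. The `LoadEstimator` interface and the `linear` and `withUdf` helpers are hypothetical, not Rheem's LoadProfileEstimator API; the sketch assumes a per-record UDF (as in a map, filter, or flat map) and measures load in abstract CPU units.

```java
// Hypothetical sketch: names are illustrative, not Rheem's API.
public class UdfLoadSketch {

    /** Assumed estimator of CPU load (abstract units) as a function of the input cardinality. */
    interface LoadEstimator {
        long estimate(long inputCardinality);
    }

    /** Linear estimator: a fixed overhead plus a cost per input record. */
    static LoadEstimator linear(long overhead, long costPerRecord) {
        return n -> overhead + costPerRecord * n;
    }

    /** Operator load plus UDF load, assuming the UDF is invoked once per input record. */
    static LoadEstimator withUdf(LoadEstimator operatorLoad, LoadEstimator udfLoad) {
        return n -> operatorLoad.estimate(n) + udfLoad.estimate(n);
    }

    public static void main(String[] args) {
        LoadEstimator operator = linear(500, 10);   // the operator's own per-record overhead
        LoadEstimator lightUdf = linear(0, 5);      // e.g., a simple field projection
        LoadEstimator heavyUdf = linear(0, 10_000); // e.g., an expensive per-record computation
        long n = 1_000;
        System.out.println(withUdf(operator, lightUdf).estimate(n)); // 500 + 10000 + 5000
        System.out.println(withUdf(operator, heavyUdf).estimate(n)); // 500 + 10000 + 10000000
    }
}
```

With the light UDF the operator's own cost dominates, whereas with the heavy UDF the UDF term is three orders of magnitude larger, which is exactly the case in which ignoring UDF load would mislead the optimizer.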
From @sekruse on July 7, 2016 12:11
Copied from original issue: daqcri/rheem#1