Some `ExecutionOperator`s, such as the `JavaJoinOperator` and the `GraphChiPageRankOperator`, do not do their work in one contiguous execution but instead perform multiple partial executions. The idea is to reflect this in the cost model:
- [x] Allow multiple partial cost functions (`LoadProfileEstimator`s) per operator, so that each of the partial executions can be described individually.
- [x] Give more detailed feedback on the individual parts being executed. Together with the above point, this should allow the `GeneticOptimizerApp` to learn cost functions more efficiently. After all, the "garbage in, garbage out" principle holds here, so we must model both the captured execution data and the execution model as accurately as possible.
- [x] Initialize the type of learned cost function from patterns. This should make the `GeneticOptimizerApp` easier to set up.
- [x] With the above changes, it should be rather easy to capture the execution of UDFs and expose them to the `GeneticOptimizerApp`. That way, it becomes possible to learn cost functions for heavy-weight UDFs.
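To illustrate the first two items, here is a minimal sketch of an operator cost model split into named partial executions, each with its own cost function and its own measured feedback. It is not Rheem's `LoadProfileEstimator` API: `PartialCostModel`, `PartialExecution`, `addPhase`, and the phase names `"build"` and `"probe"` (a hash join's phases, assumed here purely for illustration) are hypothetical, and the linear cost functions are made-up placeholders.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch, NOT Rheem's LoadProfileEstimator API:
// one cost function per named partial execution ("phase") of an operator.
public class PartialCostModel {

    // One observed partial execution (hypothetical record; names are illustrative).
    record PartialExecution(String phase, double inputCardinality, double measuredMillis) {}

    private final Map<String, DoubleUnaryOperator> phaseEstimators = new LinkedHashMap<>();

    // Registers the cost function (input cardinality -> estimated millis) for one phase.
    PartialCostModel addPhase(String phase, DoubleUnaryOperator estimator) {
        phaseEstimators.put(phase, estimator);
        return this;
    }

    // Estimated cost of a single partial execution.
    double estimate(String phase, double inputCardinality) {
        DoubleUnaryOperator f = phaseEstimators.get(phase);
        if (f == null) {
            throw new IllegalArgumentException("Unknown phase: " + phase);
        }
        return f.applyAsDouble(inputCardinality);
    }

    // Total estimate: sum of the estimates of all observed partial executions.
    double estimateTotal(List<PartialExecution> executions) {
        double total = 0;
        for (PartialExecution e : executions) {
            total += estimate(e.phase(), e.inputCardinality());
        }
        return total;
    }

    // Per-phase feedback: summed absolute error per phase, so that a learner can
    // attribute a mis-estimate to the partial execution that caused it.
    Map<String, Double> absoluteErrorByPhase(List<PartialExecution> executions) {
        Map<String, Double> errors = new LinkedHashMap<>();
        for (PartialExecution e : executions) {
            double err = Math.abs(estimate(e.phase(), e.inputCardinality()) - e.measuredMillis());
            errors.merge(e.phase(), err, Double::sum);
        }
        return errors;
    }

    public static void main(String[] args) {
        PartialCostModel join = new PartialCostModel()
                .addPhase("build", n -> n / 500.0)       // placeholder cost function
                .addPhase("probe", n -> n / 1000.0 + 5); // placeholder cost function
        List<PartialExecution> observed = List.of(
                new PartialExecution("build", 10_000, 25),
                new PartialExecution("probe", 50_000, 55));
        System.out.println(join.estimateTotal(observed));        // 75.0
        System.out.println(join.absoluteErrorByPhase(observed)); // {build=5.0, probe=0.0}
    }
}
```

With only a single, whole-operator measurement of 80 ms, a learner would see a 5 ms error without knowing which phase to blame; the per-phase feedback pins it on `"build"`.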
From @sekruse on November 15, 2016 15:56
Copied from original issue: daqcri/rheem#35