So far, we have tried to automatically infer cost functions for `ExecutionOperator`s by measuring them in artificial, isolated profiling environments. The results were okay but still leave room for improvement.
A second approach to the same problem is to profile jobs as they are executed. This requires no extra executions and no assumptions about data distributions etc. The collected metadata, however, do not directly yield cost functions: because most frameworks execute lazily, whole blocks of `ExecutionOperator`s are executed at once, so a measurement covers a whole block rather than a single operator.
To infer appropriate cost functions, we can model this as a learning problem: each cost function has several parameters, and we try to find a parameter assignment that minimizes some loss between the resulting time estimates for the measured execution blocks and the actually measured execution times.
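To make the learning problem a bit more concrete, here is a minimal sketch in Python. It is not taken from the codebase. It assumes that every operator type has a linear cost function `time = a * cardinality + b` and that the loss is the squared error, so the parameters can be found with ordinary least squares (solved here via the normal equations, with a tiny ridge term added only to avoid division by zero on degenerate input). All names (`OperatorRun`, `BlockMeasurement`, `fit_cost_parameters`, `estimate_block`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OperatorRun:
    operator: str       # hypothetical operator type name, e.g. "map"
    cardinality: float  # input cardinality observed for this run


@dataclass
class BlockMeasurement:
    runs: List[OperatorRun]  # operators that were executed together
    millis: float            # measured time of the whole block


def _features(block: BlockMeasurement, operators: List[str]) -> List[float]:
    # Assumed linear model: the block time is the sum over its runs of
    # a_op * cardinality + b_op. Per operator we therefore need the summed
    # cardinality (multiplies a_op) and the number of runs (multiplies b_op).
    x = [0.0] * (2 * len(operators))
    for run in block.runs:
        i = operators.index(run.operator)
        x[2 * i] += run.cardinality
        x[2 * i + 1] += 1.0
    return x


def fit_cost_parameters(blocks: List[BlockMeasurement], operators: List[str],
                        ridge: float = 1e-9) -> Dict[str, Tuple[float, float]]:
    """Least-squares fit of (a, b) per operator from block-level timings."""
    n = 2 * len(operators)
    xs = [_features(b, operators) for b in blocks]
    # Normal equations: (X^T X + ridge * I) w = X^T y
    a = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(x[i] * b.millis for x, b in zip(xs, blocks)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (rhs[i] - tail) / a[i][i]
    return {op: (w[2 * k], w[2 * k + 1]) for k, op in enumerate(operators)}


def estimate_block(block: BlockMeasurement,
                   params: Dict[str, Tuple[float, float]]) -> float:
    # Time estimate for a block under the fitted per-operator parameters.
    return sum(params[r.operator][0] * r.cardinality + params[r.operator][1]
               for r in block.runs)
```

Calling `fit_cost_parameters(blocks, ["map", "filter"])` on a list of measured blocks would return one `(a, b)` pair per operator type. The parameters are only identifiable if the blocks vary enough in composition and cardinalities. Real cost functions and loss functions may well be non-linear, in which case a different optimizer would be needed.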
From @sekruse on September 6, 2016 18:19
Copied from original issue: daqcri/rheem#24