rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Create a utility to infer cost functions for operators from collected execution data #24

Closed luckyasser closed 7 years ago

luckyasser commented 7 years ago

From @sekruse on September 6, 2016 18:19

So far, we have tried to infer cost functions for ExecutionOperators automatically by measuring them in artificial, isolated profiling environments. The results were okay but still leave room for improvement.

A second approach to the same problem is to profile jobs as they are executed. This requires no extra executions and no assumptions about data distributions etc. However, the collected metadata do not directly reveal cost functions: because most frameworks execute lazily, whole blocks of ExecutionOperators are executed at once, so only the total time of a block is measured.

To infer appropriate cost functions, we can model the whole issue as a learning problem: each cost function has several parameters, and we try to find a parameter assignment that minimizes some loss between the resulting time estimates for the measured execution blocks and the actually measured execution times.
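As a rough illustration of the learning problem described above, here is a minimal Python sketch. It is not Rheem code and rests on several assumptions the issue does not state: every operator's cost function has a single parameter (milliseconds per input record), the estimate for a block is the sum of its operators' estimates, and the loss is the squared error, minimized with ordinary least squares. The names `Block`, `estimate`, `fit` and `solve`, as well as the sample measurements, are hypothetical.

```python
from collections import namedtuple

# Hypothetical record for one measured execution block:
# operators is a list of (operator_name, input_cardinality) pairs,
# millis is the measured wall-clock time of the whole block.
Block = namedtuple("Block", ["operators", "millis"])


def estimate(params, block):
    """Estimated block time: sum of per-operator linear costs (assumed model)."""
    return sum(params[name] * card for name, card in block.operators)


def solve(a, b):
    """Solve the square linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("blocks do not separate the operators' costs")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(blocks, names):
    """Least-squares fit of one per-record cost parameter per operator."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    # One row per block: total cardinality processed by each operator.
    rows = []
    for block in blocks:
        row = [0.0] * n
        for name, card in block.operators:
            row[index[name]] += card
        rows.append(row)
    # Normal equations (A^T A) x = A^T y minimize the squared error.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * blk.millis for r, blk in zip(rows, blocks)) for i in range(n)]
    return dict(zip(names, solve(ata, aty)))


# Made-up measurements: only block totals are known, never single operators.
blocks = [
    Block([("map", 1000), ("filter", 1000)], 3.0),
    Block([("map", 2000), ("reduce", 500)], 6.5),
    Block([("filter", 4000), ("reduce", 1000)], 9.0),
    Block([("map", 1000), ("filter", 500), ("reduce", 200)], 3.5),
]
fitted = fit(blocks, ["map", "filter", "reduce"])
```

The fit only works if the measured blocks combine operators in sufficiently different ways; if two operators always appear together with the same cardinalities, their individual costs cannot be told apart and `solve` raises an error. Real cost functions would likely need more parameters per operator (for example a fixed overhead) and possibly a different loss or optimizer.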

Copied from original issue: daqcri/rheem#24