rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Add sampling to the APIs #22

Closed: luckyasser closed this issue 7 years ago

luckyasser commented 7 years ago

From @sekruse on August 29, 2016 11:1

We have the SampleOperator in the basic plugin, but it is not reflected in the API, so it should be added there. Also, it would be nice if specifying the sampling method were optional.
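To illustrate the request, here is a minimal sketch of what an API-level `sample` with an optional sampling method could look like. All names (`SamplingMethod`, `SampleApiSketch`, the `seed` parameter) are hypothetical and are not taken from Rheem's code base; plain Java lists stand in for Rheem's data abstractions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical names throughout; this is NOT Rheem's actual API.
enum SamplingMethod { ANY, SHUFFLE, RESERVOIR }

final class SampleApiSketch {
    private SampleApiSketch() {}

    /** Overload without a method: the caller leaves the choice to the system (ANY). */
    static <T> List<T> sample(List<T> data, int sampleSize, long seed) {
        return sample(data, sampleSize, SamplingMethod.ANY, seed);
    }

    /** Overload with an explicit method for callers that care about it. */
    static <T> List<T> sample(List<T> data, int sampleSize, SamplingMethod method, long seed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be non-negative");
        }
        Random random = new Random(seed);
        switch (method) {
            case RESERVOIR:
                return reservoir(data, sampleSize, random);
            case SHUFFLE:
            case ANY: // assumption: ANY simply falls back to the shuffle-based sampler
            default:
                return shuffleFirst(data, sampleSize, random);
        }
    }

    /** Shuffles a copy of the input and keeps the first k elements. */
    private static <T> List<T> shuffleFirst(List<T> data, int k, Random random) {
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, random);
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    /** Single-pass reservoir sampling; does not need the dataset size up front. */
    private static <T> List<T> reservoir(List<T> data, int k, Random random) {
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        for (T item : data) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);
            } else {
                int j = random.nextInt(seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }
}
```

With this shape, `SampleApiSketch.sample(data, 3, 42L)` works without naming a method, while `SampleApiSketch.sample(data, 3, SamplingMethod.RESERVOIR, 42L)` remains available when the method matters.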

Copied from original issue: daqcri/rheem#20

luckyasser commented 7 years ago

From @sekruse on August 30, 2016 8:51

When SampleOperators lack the dataset size, they materialize the dataset in order to count it. While this is logically sound, it distorts the statistics collection, because Rheem is not aware of the extra work. It would therefore be better if the SampleOperators exposed this behavior so that Rheem can keep better track of what is happening.
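A rough sketch of what "exposing that behavior" might mean is given here: the operator declares up front whether it will need a counting pass and records when one actually ran. The class and method names (`SizeAwareSampleSketch`, `requiresCountingPass`, `extraPasses`) are hypothetical and only illustrate the idea; they do not describe how Rheem's SampleOperators are implemented.

```java
import java.util.OptionalLong;

// Hypothetical sketch; not Rheem's actual operator interface.
final class SizeAwareSampleSketch {
    private final OptionalLong knownSize;
    private int extraPasses = 0;

    SizeAwareSampleSketch(OptionalLong knownSize) {
        this.knownSize = knownSize;
    }

    /** Declared before execution, so a planner or profiler could account for the extra pass. */
    boolean requiresCountingPass() {
        return !knownSize.isPresent();
    }

    /** Number of additional passes over the input that were actually executed. */
    int extraPasses() {
        return extraPasses;
    }

    /** Returns the dataset size, counting (and recording the pass) only when it is unknown. */
    long resolveSize(Iterable<?> data) {
        if (knownSize.isPresent()) {
            return knownSize.getAsLong();
        }
        long count = 0;
        for (Object ignored : data) {
            count++;
        }
        extraPasses++; // make the hidden work visible to whoever collects statistics
        return count;
    }
}
```

A statistics collector could then check `requiresCountingPass()` before execution and `extraPasses()` afterwards, instead of attributing the counting work to the sampling itself.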

luckyasser commented 7 years ago

From @sekruse on August 30, 2016 13:42

The handling of eager execution and channel evaluation will be postponed.