rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io

Clarify and enforce sorting semantics #43

Open sekruse opened 7 years ago

sekruse commented 7 years ago

Although Rheem has a sorting operator, we might need to specify more precisely what Rheem's sorting semantics should be, and we might need to enforce them more rigorously.

  1. What is the scope of a sort order? That is, should we guarantee that the order is maintained across, say, a map operation? Or even a reduceByKey call?
  2. Should the sort order be an interesting property that the optimizer leverages? For instance, an aggregation over data quanta that are already sorted by the aggregation key could exploit that order.
  3. We need to enforce that the sort order is maintained according to whatever we specify in (1).
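
To make the questions above concrete, here is a minimal sketch in plain Python rather than Rheem's API. All names in it (`is_sorted`, `aggregate_sorted`, `aggregate_hashed`) are hypothetical helpers invented for illustration. It shows that a map may or may not preserve an existing sort order, depending on the function applied, and that an aggregation over key-sorted input can stream over runs of equal keys instead of building a hash table.

```python
from itertools import groupby
from operator import itemgetter


def is_sorted(items, key=lambda x: x):
    """Return True if `items` is in ascending order with respect to `key`."""
    keys = [key(x) for x in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def aggregate_sorted(pairs):
    """Sum values per key, ASSUMING `pairs` is already sorted by key.

    Only one running group is held at a time; no hash table is needed.
    """
    return [(k, sum(v for _, v in grp))
            for k, grp in groupby(pairs, key=itemgetter(0))]


def aggregate_hashed(pairs):
    """Sum values per key without any assumption on the input order."""
    totals = {}
    for k, v in pairs:
        totals[k] = totals.get(k, 0) + v
    return totals


# Question (1): does a map keep the order? It depends on the function.
numbers = [1, 2, 3]
scaled = [x * 10 for x in numbers]   # monotone function: order survives
negated = [-x for x in numbers]      # order-reversing function: order is lost

# Question (2): sorted input allows the cheaper streaming aggregation.
data = sorted([("b", 2), ("a", 1), ("b", 3), ("a", 4)], key=itemgetter(0))
streamed = aggregate_sorted(data)
hashed = aggregate_hashed(data)
```

Under these assumptions, both aggregations yield the same per-key sums, but only `aggregate_sorted` relies on the sort order, so it is only correct if upstream operators are guaranteed to maintain that order, which is the enforcement asked for in (3).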