While Rheem has a sorting operator, we might need to be more specific about what Rheem's sorting semantics should actually be, and more rigorous in enforcing them.
1. What is the scope of a sort, i.e., should we guarantee that the sort order is maintained by, say, a map operation? Or even a reduceByKey call?
2. Should sort order be a feature that the optimizer leverages as an interesting property? For instance, an aggregation over data quanta that are already sorted by the aggregation key can exploit that order.
3. We need to enforce the maintenance of the sort order according to what we specify in (1).
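
To make the questions above more concrete, here is a minimal sketch in plain Python. It does not use Rheem's API; the helper names `reduce_by_key_sorted` and `is_sorted` are hypothetical and exist only for illustration. It shows (a) how an aggregation might exploit input that is already sorted by the aggregation key, by streaming over one group at a time instead of building a hash table, and (b) why a map cannot blindly be assumed to preserve the sort order.

```python
from itertools import groupby
from operator import itemgetter


def reduce_by_key_sorted(pairs, combine):
    """Aggregate (key, value) pairs that are ALREADY sorted by key.

    Assumption: equal keys are adjacent. Only one running group is
    held at a time, and the output is again sorted by key.
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = (value for _, value in group)
        acc = next(values)  # every group has at least one element
        for value in values:
            acc = combine(acc, value)
        yield key, acc


def is_sorted(items):
    """Return True if items are in non-decreasing order."""
    items = list(items)
    return all(a <= b for a, b in zip(items, items[1:]))


# (a) Aggregation over input sorted by the aggregation key.
data = sorted([("b", 2), ("a", 1), ("b", 3), ("a", 4)], key=itemgetter(0))
totals = list(reduce_by_key_sorted(data, lambda x, y: x + y))

# (b) A map may or may not preserve the order of its input.
numbers = [1, 2, 3]
doubled = [x * 2 for x in numbers]  # monotone function: order kept
negated = [-x for x in numbers]     # order reversed: no longer sorted
```

In this sketch, `totals` comes out sorted by key, so a downstream operator could in principle rely on that order as well. The `negated` case illustrates why the answer to (1) matters: whether a map maintains the sort order depends on the function applied, which the system generally cannot infer on its own.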