substrait-io / substrait

A cross-platform way to express data transformation, relational algebra, standardized record expression, and plans.
https://substrait.io
Apache License 2.0

Question / request for pre-grouping keys in Aggregate operation #501

Open whutjs opened 1 year ago

whutjs commented 1 year ago

Hi, we are using Substrait & Velox to build an OLAP execution engine. Besides specifying the grouping keys of an aggregation, we can also specify pre-grouping keys in Velox. Pre-grouping keys are a subset of the grouping keys on which the input is clustered, i.e. identical sets of values for these keys always appear next to each other. We'd like to be able to specify the pre-grouping keys in Substrait, but we haven't found a way to do that. Does Substrait support specifying the pre-grouping keys of an aggregation? If not, will it be supported?
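
For concreteness, here is a rough Python sketch of why clustered input matters; the names and structure below are made up for illustration and are not Velox or Substrait APIs. Because identical key values are adjacent, the engine can finalize and emit each group as soon as the key changes instead of keeping a hash table over all distinct keys.

```python
from itertools import groupby

# Rough illustration only: names and structure are made up, not Velox or
# Substrait APIs. "rows" is assumed to already be clustered (pre-grouped)
# on the key column.
def streaming_sum(rows, key_index, value_index):
    """Aggregate input that is already clustered on the key column.

    Because identical key values are adjacent, each group can be finalized
    and emitted as soon as the key changes; no hash table over all distinct
    keys is needed.
    """
    for key, group in groupby(rows, key=lambda row: row[key_index]):
        yield key, sum(row[value_index] for row in group)

rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)]  # clustered on column 0
print(list(streaming_sum(rows, key_index=0, value_index=1)))
# [('A', 3), ('B', 7), ('C', 5)]
```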

westonpace commented 1 year ago

Does Substrait support specifying the pre-grouping keys of an aggregation?

No.

If not, will it be supported?

Yes, this is something we want to be able to express.

Substrait occasionally talks about distribution (mostly here), which is related to the feature you are describing. Each operator has rules about how it handles distribution. For example, a project operation should not modify a distribution. However, this information is mostly useful for logical planners to decide if they can use a distribution-aware aggregation, so I don't think that feature alone gives you an answer.

We have a similar feature in Acero (called segment keys). There are a few minor corner-case behaviors that it would be good to align on.

A. Should we worry about jittery boundaries?

Often these pre-grouping keys are based on some kind of time-series value (e.g. given data ordered on time, generate aggregates for each month of data). Those segment boundaries can be a bit noisy (e.g. right at the end of the month you might get [MARCH, MARCH, APRIL, MARCH, APRIL, APRIL, APRIL]) for various reasons (unsync'd clocks, non-atomic updates, etc.).

You could imagine solving this with some kind of threshold argument. However, I'd prefer that this NOT be something handled by this feature. If jitter needs to be smoothed out, then it should be done by some other kind of operator.

B. How should imperfect data be handled?

The behavior of pre-grouping keys is straightforward if the data is perfect (e.g. [A, A, A, B, B, B, C, C, C]). However, what should happen if the input data is not perfect (e.g. [A, A, A, B, B, B, A, A, A])? Should this be undefined behavior? Or should we declare that this is possible and that the output is allowed to have two rows with the same keys?

I'd prefer that this be well-defined behavior and that multiple rows are the correct output. This is partly because it's how we handle it currently, and partly because I think there could be a use for such a feature someday.

However, I do not feel extremely strongly about this. If we define this as allowed input, then this node becomes semantically different from the aggregate node (and is no longer just an optimization), so I think there is also a good argument for not allowing this kind of imperfect input.
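
To make option B concrete, here is a minimal, illustrative Python sketch (not an implementation proposal) of what "the output is allowed to have two rows with the same keys" would look like when the input is imperfect:

```python
from itertools import groupby

def segmented_count(keys):
    """Count rows per contiguous run of a pre-grouping key value.

    If the input is not perfectly clustered, a key that reappears later
    starts a new run and therefore produces a second output row for the
    same key.
    """
    return [(key, sum(1 for _ in run)) for key, run in groupby(keys)]

print(segmented_count(["A", "A", "A", "B", "B", "B", "C", "C", "C"]))
# [('A', 3), ('B', 3), ('C', 3)]   <- perfect input: one row per key
print(segmented_count(["A", "A", "A", "B", "B", "B", "A", "A", "A"]))
# [('A', 3), ('B', 3), ('A', 3)]   <- imperfect input: 'A' appears twice
```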

C. Should this be an extension for aggregate or a separate rel?

I don't have a strong opinion here. If this is purely an optimization, then I think making it an extension of the existing aggregate rel would be useful. On the other hand, if we allow imperfect data, then this isn't strictly a pure optimization.
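
Purely to visualize the two shapes being discussed, here is a hypothetical Python sketch; the class and field names are invented for illustration and are not Substrait's actual protobuf definitions.

```python
from dataclasses import dataclass, field
from typing import List

# Invented, simplified stand-ins -- NOT Substrait's actual protobuf messages.

@dataclass
class AggregateRelSketch:
    """Option 1: keep the existing aggregate rel and attach the pre-grouping
    information as an extension / optimization hint (field name is hypothetical)."""
    grouping_keys: List[str]
    measures: List[str]
    pre_grouped_keys: List[str] = field(default_factory=list)  # hypothetical hint

@dataclass
class SegmentedAggregateRelSketch:
    """Option 2: a separate rel whose semantics differ from aggregate
    (e.g. imperfect input legally produces repeated key rows)."""
    segment_keys: List[str]
    grouping_keys: List[str]
    measures: List[str]
```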

Are there any other nuances you can think of?

whutjs commented 1 year ago

@westonpace Thanks for your reply! Actually, when I raised this issue, I hadn't given it too much thought, because Velox does its own analysis of the pre-grouping key data and I just have to specify which keys are pre-grouped. So if I could choose, I would implement this feature simply as an extension for aggregate, for example as an extended property of the aggregate rel. But I agree that if we want to implement this feature in Substrait, we have to consider as many scenarios as possible.