substrait-io / substrait

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
https://substrait.io
Apache License 2.0

Add a window join operation with as-of support #3

Open jacques-n opened 2 years ago

jacques-n commented 2 years ago

I was talking with @cpcloud about supporting as-of joins. During that discussion, it generally felt as if as-of shouldn't be implemented or conceived of as a traditional join operator, because the match condition cannot be expressed as a scalar condition. Instead, it requires something closer to a window function: a partitioned analysis that, given an input row, returns a set of matching rows.

As we discussed this, Phillip pointed out that as-of is one example of this pattern but not the only one (and there are several variations of as-of). As such, we think creating a new "window join" operation with a new kind of "window join function" or similar probably makes the most sense, with the as-of variations being the first window join functions to define. For reference, here is the window join operation in KDB: https://code.kx.com/q/ref/wj/
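
To illustrate why as-of does not reduce to a scalar predicate on a pair of rows, here is a minimal Python sketch of one common as-of variant: "latest right row at or before the left timestamp, within the same partition". The function name `asof_join`, the column names `sym` and `ts`, and the sample data are all hypothetical and chosen only for illustration; nothing here is a proposed Substrait definition.

```python
from bisect import bisect_right

def asof_join(left, right, key="sym", on="ts"):
    """Pair each left row with the most recent right row that has the
    same key and right[on] <= left[on]; None if no such row exists."""
    # Partition the right side by key, keeping each partition sorted by time.
    partitions = {}
    for row in sorted(right, key=lambda r: r[on]):
        partitions.setdefault(row[key], []).append(row)
    # Precompute the sorted timestamps of each partition for binary search.
    times = {k: [r[on] for r in rows] for k, rows in partitions.items()}

    out = []
    for l in left:
        rows = partitions.get(l[key], [])
        # Index of the first right timestamp strictly greater than l[on].
        i = bisect_right(times.get(l[key], []), l[on])
        out.append((l, rows[i - 1] if i > 0 else None))
    return out

# Hypothetical sample data.
trades = [{"sym": "A", "ts": 10}, {"sym": "A", "ts": 3}, {"sym": "B", "ts": 7}]
quotes = [
    {"sym": "A", "ts": 4, "bid": 1.0},
    {"sym": "A", "ts": 9, "bid": 1.5},
    {"sym": "B", "ts": 8, "bid": 2.0},
]
result = asof_join(trades, quotes)
```

Whether a given quote matches a trade depends on the other quotes in the same partition (it must be the latest one not after the trade), which is why the match looks more like a partitioned, ordered analysis than a per-pair condition.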

wesm commented 2 years ago

To add extra quick context for readers who do not dig into the KDB documentation, a window-join is sort of a combination of a range-join and an aggregation:

Each row in the "t" table corresponds to a variable window of data in the "q" table (determined by an expression, e.g. t.timestamp >= q.timestamp - interval '5 minutes'), and an aggregation function is applied to the data in the window, e.g. one might compute the min, max, mean, or median value from the "q" table.
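
The description above can be sketched in a few lines of Python as a range join followed by an aggregation. This is a naive illustration only: `window_join`, the predicate `trailing_5min`, the `ts` and `price` columns, and the sample rows are hypothetical, and the trailing five-minute window (timestamps in seconds) is an assumed choice of window expression, not the semantics of KDB's `wj` or of any Substrait operator.

```python
from statistics import mean

def window_join(t, q, in_window, agg, value="price"):
    """For each row of t, aggregate the `value` column over the rows of q
    that fall inside that row's window, as decided by in_window(t_row, q_row)."""
    out = []
    for t_row in t:
        # Range-join step: collect the q values inside this row's window.
        window = [q_row[value] for q_row in q if in_window(t_row, q_row)]
        # Aggregation step: reduce the window to a single value (None if empty).
        out.append({**t_row, "agg": agg(window) if window else None})
    return out

def trailing_5min(t_row, q_row):
    # Assumed window: q rows from 300 seconds before t_row up to t_row itself.
    return t_row["ts"] - 300 <= q_row["ts"] <= t_row["ts"]

# Hypothetical sample data, timestamps in seconds.
t = [{"ts": 600}, {"ts": 1200}, {"ts": 5000}]
q = [
    {"ts": 350, "price": 10.0},
    {"ts": 500, "price": 12.0},
    {"ts": 1000, "price": 20.0},
]

max_rows = window_join(t, q, trailing_5min, max)
mean_rows = window_join(t, q, trailing_5min, mean)
```

Passing the window predicate and the aggregate as separate arguments mirrors the idea of a generic "window join" operation parameterized by a "window join function"; a real engine would use sorted inputs rather than this quadratic scan.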