lautarortega opened this issue 1 year ago
+1, great idea.
I guess we can rewrite this issue request as a "sliding window" clause.
Yes, I was looking for something like this PySpark function.
If you want to address the current row and only the first/last row, you can use .first().over() / .last().over() in an expression:
df = pl.DataFrame({
"id": [1, 2, 3, 1, 2, 3, 1, 2, 3],
"value": ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
})
df.with_columns(all = pl.col("value") + pl.first("value").over("id") + pl.last("value").over("id"))
# shape: (9, 3)
# ┌─────┬───────┬─────┐
# │ id ┆ value ┆ all │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ str ┆ str │
# ╞═════╪═══════╪═════╡
# │ 1 ┆ a ┆ aag │
# │ 2 ┆ b ┆ bbh │
# │ 3 ┆ c ┆ cci │
# │ 1 ┆ d ┆ dag │
# │ 2 ┆ e ┆ ebh │
# │ 3 ┆ f ┆ fci │
# │ 1 ┆ g ┆ gag │
# │ 2 ┆ h ┆ hbh │
# │ 3 ┆ i ┆ ici │
# └─────┴───────┴─────┘
If you have a specific example, perhaps a better suggestion could be made.
For now, I was thinking of doing this instead:
df = df.with_columns(
    pl.col(self.target_col)
    .sort_by(self.order_col)
    .median()
    .over(index_plus_order)
    .alias(self.output_col_median)
)
sql = pl.SQLContext()
sql.register('df', df)
df = sql.execute(f"""
    SELECT *,
        LAST({self.output_col_median}) OVER (PARTITION BY {index} ORDER BY {self.order_col} ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Test
    FROM df""").collect()
I don't want to compare the current row with only the first (or last) row of the group, but rather with everything in between.
Basically, I want to rewrite this PySpark function to Polars:
# take median of category + timestamp
w = Window.partitionBy(F.col(index_plus_order)).orderBy(F.col(self.order_col))
treated_df = df.withColumn(
    self.output_col_median,
    F.expr(f"PERCENTILE({self.target_col}, 0.5)").over(w),
)
# forward fill target col
w2 = (
    Window.partitionBy(F.col(index))
    .orderBy(F.col(self.order_col))
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)
filled_column = F.last(F.col(self.output_col_median), ignorenulls=True).over(w2)
treated_df = treated_df.withColumn(self.output_col_median, filled_column)
Are you asking to apply functions to an expanding window, in addition to the rolling window operations in Polars?
To clarify:
Expanding: f(x[0..t]) for all t
Rolling: f(x[(t-w)..t]) for all t and fixed w
And you're asking for this to be done per group as well as over a whole frame/series?
Yes, I was looking for something like this PySpark function.
In short, I want this rowsBetween functionality.
Basically, using pl.Expr.over I want to be able to get a different result for each element of the group.
For example getting the accumulated sales for each product.
I want to group by product, have each slice sorted by month, and apply a sum operation for the current row and all the preceding ones.
So Jan will have the sales for Jan, Feb will have Jan + Feb, and Mar will have Jan + Feb + Mar.
It is a function that SQL has, and PySpark as well. I think it could be a useful enhancement.
For example getting the accumulated sales for each product.
That sounds like a .cumsum():
df = pl.DataFrame(dict(
product = ["A", "B", "A", "C", "C", "B", "A"],
date = ["2022/12/01", "2022/10/02", "2022/01/11", "2022/02/01", "2022/03/02", "2022/04/22", "2022/06/12"],
sales = [12, 11, 13, 14, 20, 11, 7]
))
(df.with_row_count()
.sort(pl.col("date").str.to_datetime().dt.month())
.with_columns(
cumsum = pl.col("sales").cumsum().over("product")
)
)
# shape: (7, 5)
# ┌────────┬─────────┬────────────┬───────┬────────┐
# │ row_nr ┆ product ┆ date ┆ sales ┆ cumsum │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ u32 ┆ str ┆ str ┆ i64 ┆ i64 │
# ╞════════╪═════════╪════════════╪═══════╪════════╡
# │ 2 ┆ A ┆ 2022/01/11 ┆ 13 ┆ 13 │
# │ 3 ┆ C ┆ 2022/02/01 ┆ 14 ┆ 14 │
# │ 4 ┆ C ┆ 2022/03/02 ┆ 20 ┆ 34 │ # 14 + 20 (row_nr: 3, 4)
# │ 5 ┆ B ┆ 2022/04/22 ┆ 11 ┆ 11 │
# │ 6 ┆ A ┆ 2022/06/12 ┆ 7 ┆ 20 │ # 13 + 7 (row_nr: 2, 6)
# │ 1 ┆ B ┆ 2022/10/02 ┆ 11 ┆ 22 │ # 11 + 11 (row_nr: 5, 1)
# │ 0 ┆ A ┆ 2022/12/01 ┆ 12 ┆ 32 │ # 13 + 7 + 12 (row_nr: 2, 6, 0)
# └────────┴─────────┴────────────┴───────┴────────┘
Maybe not the best example then. However, this is not specific to .cumsum().
I also think this would be a useful addition to Polars
The SQL and Spark syntax for 'preceding rows' and 'following rows' is very flexible and allows expressing complex calculations in an easy way.
It's probably the only thing I miss from PySpark.
How does this relate to groupby_rolling?
hi @ritchie46 ,
I believe the groupby_rolling does not allow to 'look ahead', like the 'preceding / following' syntax does.
For example, here is a Stack Overflow question for reproducing the 'preceding / following' syntax in Polars: https://stackoverflow.com/questions/74513494/polars-take-a-window-of-n-rows-surrounding-a-row-fulfilling-a-condition
So this operation is possible in Polars using the solutions from SO but it's more verbose and potentially less clear than a 'preceding / following' syntax. This syntax could also be seen as a generalization of cumulative functions (in addition to a generalization of groupby_rolling). The cumulative functions could be expressed as 'preceding = unbounded'.
So overall adding it could reduce the number of Polars functions to learn when first starting with Polars.
A Polars syntax could be something like over('group', preceding = '2i', following='unbounded')
The possible inputs for preceding and following can be the same as groupby_rolling with one addition for unbounded.
What do you think?
There seem to be two questions going on here. With the first that asks for an expanding/cumulative window, there's a min_periods argument you can set to 1, but then you also need to know the number of rows in the data/per group, which I guess is an extra step. For the other question, there's the center option if you want a symmetric window. Otherwise it's bookkeeping to align labels and results. Out of curiosity, what's the use case for a fixed number of preceding observations and unlimited following?
hi @magarick ,
I agree that it's possible to reproduce most transformations in Polars today (using expressions as building blocks is indeed quite powerful). Even the 'look ahead' transformation is possible (see Stack Overflow answer above), even though it might be complicated to do if different groups have different lengths (for example: when 'following unbounded' represents 3 rows for one group and 10 for another).
I think the opportunity is for an easy way to express those transformations. With the added bonus that people with a SQL and/or a Spark background are used to this type of syntax, so this makes it easier to switch to Polars.
I teach Polars to mid-sized and big organizations, and how to do 'preceding / following' is a question that often comes up. There is oftentimes a way to reproduce calculations with Polars syntax, however this would simplify things (in my opinion).
To wrap it up: I am not arguing that Polars should do something because other tools do it. Polars has the advantage of not taking all syntax from other tools and keeping a clean API. However, in this case, it might be worth in my opinion.
For the example of preceding = 2i and following = unbounded, I did not take it from a specific use case.
Curious to hear @ritchie46's thoughts.
I believe the groupby_rolling does not allow to 'look ahead'
There is some support with offset=, @lucazanna:
df = pl.DataFrame(dict(
id = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
idx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
val = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"]
))
(df.groupby_rolling(index_column="idx", by="id", period="5i", offset="-3i")
.agg(window = pl.col("val"))
.with_columns( # just to display it
pl.col("window").list.to_struct(n_field_strategy="max_width")
)
)
# shape: (11, 3)
# ┌─────┬─────┬─────────────────────────┐
# │ id ┆ idx ┆ window │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ struct[5] │
# ╞═════╪═════╪═════════════════════════╡
# │ 1 ┆ 1 ┆ {"a","b","c",null,null} │
# │ 1 ┆ 2 ┆ {"a","b","c","d",null} │
# │ 1 ┆ 3 ┆ {"a","b","c","d","e"} │
# │ 1 ┆ 4 ┆ {"b","c","d","e","f"} │
# │ 1 ┆ 5 ┆ {"c","d","e","f",null} │
# │ 1 ┆ 6 ┆ {"d","e","f",null,null} │
# │ 2 ┆ 7 ┆ {"g","h","i",null,null} │
# │ 2 ┆ 8 ┆ {"g","h","i","j",null} │
# │ 2 ┆ 9 ┆ {"g","h","i","j","k"} │
# │ 2 ┆ 10 ┆ {"h","i","j","k",null} │
# │ 2 ┆ 11 ┆ {"i","j","k",null,null} │
# └─────┴─────┴─────────────────────────┘
You don't get leading nulls though, so the "positions" are not "correct", and I'm not sure if there is a proper way to do unbounded.
You can use period=f"{df.height}i" for following = unbounded on a non-lazy dataframe.
I imagine polars-sql will eventually have to support this RANGE BETWEEN FOLLOWING/PRECEDING UNBOUNDED syntax?
So it seems like a good idea if a polars idiom existed for it.
I ran into this problem while trying to get the count of records that occurred before the current record. It is trivial to get this result using PySpark, but I am having a hard time getting it using Polars. I have created this StackOverflow question, but I am also copying its contents here.
I tried using groupby_rolling (correct me if I am wrong), but it seems to not support aggregates over a feature. It also requires a lot of steps: first creating an index column with integer values, then grouping by an "i" period. I think the PySpark API is a great reference for how it could work. Also, after executing the groupby I would need to join the data back to the original dataframe in the context of feature engineering, and if there are multiple index columns or if the index column is a string, I can't group with groupby_rolling.
EDIT: There is an answer on StackOverflow that works for my case; maybe it solves some other people's problems here too. I still believe the solution isn't as intuitive as PySpark's Window, but it works for now.
Honestly, this is the biggest drawback holding me back from switching much of my current code from PySpark to Polars. Correct me if I'm wrong, but right now there's no support for specifying window frames in window functions, e.g. something like ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. The PySpark syntax for this is pretty neat (also the possibility of using a Window object to pass to over()). Could this be a beneficial feature for Polars?
@montanarograziano
The particular example you've given is sort of possible - I believe it is what Pandas calls an "expanding window".
Since these calculations are a special case of rolling statistics, it can be expressed in terms of .rolling(), but it's a bit awkward:
- requires hardcoding the period size to the .height of the frame
- fiddling with offset=
(I'm also not sure how "efficient" this ends up being.)
pl.Config(fmt_table_cell_list_len=-1)
df = pl.DataFrame(dict(letter=list("abcdefgh"))).with_row_index()
# period = df.height
# offset = -(df.height + 1)
(df.rolling(index_column=pl.col("index").cast(int), period="8i", offset="-9i")
.agg(
window = pl.col("letter")
)
)
# shape: (8, 2)
# ┌───────┬─────────────────────────────────────┐
# │ index ┆ window │
# │ --- ┆ --- │
# │ i64 ┆ list[str] │
# ╞═══════╪═════════════════════════════════════╡
# │ 0 ┆ [] │
# │ 1 ┆ ["a"] │
# │ 2 ┆ ["a", "b"] │
# │ 3 ┆ ["a", "b", "c"] │
# │ 4 ┆ ["a", "b", "c", "d"] │
# │ 5 ┆ ["a", "b", "c", "d", "e"] │
# │ 6 ┆ ["a", "b", "c", "d", "e", "f"] │
# │ 7 ┆ ["a", "b", "c", "d", "e", "f", "g"] │
# └───────┴─────────────────────────────────────┘
(There was an issue for adding dedicated "expanding" functions but it has gone stale: #4799)
@cmdlineluser I've tried experimenting with offset and period and yeah, for some small window sizes it seems to work. However, trying to reproduce an "unbounded preceding" window causes my machine to crash with OOM, even on a big cluster. I believe it's just not optimized for it, as it seems unable to compute it in lazy mode (that's just one hypothesis).
Any update on this?
I've been using Polars heavily for the past year and it has become my tool of choice. However, the lack of a Polars equivalent of Spark's rowsBetween and rangeBetween really starts to prevent me from fully transitioning. I want to avoid falling back to DuckDB for only this functionality. So my question is: given the current API, how would you translate the following syntax to Polars?
from datetime import date
import duckdb
import polars as pl
input_df = pl.DataFrame(
{
"element_id": [1, 1, 1, 2, 2, 2, 2],
"observation_dt": [
date(2024, 5, 25),
date(2024, 6, 1),
date(2024, 6, 7),
date(2024, 5, 7),
date(2024, 5, 12),
date(2024, 5, 14),
date(2024, 5, 20),
],
"feature": [0, 2, 1, 3, 6, 1, 0],
}
)
query = """
SELECT element_id,
observation_dt,
sum(feature) OVER (PARTITION BY element_id ORDER BY observation_dt RANGE BETWEEN INTERVAL 14 DAYS PRECEDING AND INTERVAL 1 DAYS PRECEDING) as transformed_feature
FROM input_df
"""
output_df = duckdb.sql(query).pl()
Where input_df:
shape: (7, 3)
┌────────────┬────────────────┬─────────┐
│ element_id ┆ observation_dt ┆ feature │
│ --- ┆ --- ┆ --- │
│ i64 ┆ date ┆ i64 │
╞════════════╪════════════════╪═════════╡
│ 1 ┆ 2024-05-25 ┆ 0 │
│ 1 ┆ 2024-06-01 ┆ 2 │
│ 1 ┆ 2024-06-07 ┆ 1 │
│ 2 ┆ 2024-05-07 ┆ 3 │
│ 2 ┆ 2024-05-12 ┆ 6 │
│ 2 ┆ 2024-05-14 ┆ 1 │
│ 2 ┆ 2024-05-20 ┆ 0 │
└────────────┴────────────────┴─────────┘
and output_df:
shape: (7, 3)
┌────────────┬────────────────┬─────────────────────┐
│ element_id ┆ observation_dt ┆ transformed_feature │
│ --- ┆ --- ┆ --- │
│ i64 ┆ date ┆ f64 │
╞════════════╪════════════════╪═════════════════════╡
│ 2 ┆ 2024-05-07 ┆ null │
│ 2 ┆ 2024-05-12 ┆ 3.0 │
│ 2 ┆ 2024-05-14 ┆ 9.0 │
│ 2 ┆ 2024-05-20 ┆ 10.0 │
│ 1 ┆ 2024-05-25 ┆ null │
│ 1 ┆ 2024-06-01 ┆ 0.0 │
│ 1 ┆ 2024-06-07 ┆ 2.0 │
└────────────┴────────────────┴─────────────────────┘
I know that one workaround is to use rolling in a smart way and join the output back to the original dataframe. But, for some reason, it goes OOM for my dataset. So I was hoping there is some smart alternative available...
This problem still exists in polars 1.0.0.
I believe that the PySpark solution using windows and unbounded preceding is the way to go for the Polars API.
I hope a PySpark-like solution using rowsBetween will be available soon. I am a bit stuck on this simple problem.
The .rolling() workaround suggested earlier, with the period hardcoded to the frame's .height, cannot work when each "group" has a different number of elements.
A possible solution comes from using the SQL Context from Polars 1.0. However, I also believe this result is really strange:
df = pl.DataFrame({
'NAME': ['a', 'a', 'a', 'b', 'b'],
'INDEX': [0, 1, 2, 0, 2],
'VAL': [1,1,1,1,1]
})
sql = pl.SQLContext()
sql.register('df', df)
sql.execute(
f"""
SELECT
NAME,
INDEX,
SUM(VAL) OVER (PARTITION BY NAME ORDER BY INDEX ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUMSUM
FROM
df
"""
).collect()
I get the following
┌──────┬───────┬────────┐
│ NAME ┆ INDEX ┆ CUMSUM │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞══════╪═══════╪════════╡
│ a ┆ 0 ┆ 3 │
│ a ┆ 1 ┆ 3 │
│ a ┆ 2 ┆ 3 │
│ b ┆ 0 ┆ 2 │
│ b ┆ 2 ┆ 2 │
└──────┴───────┴────────┘
instead of the cumulative sum over VAL.
I'd expect the following result, given that the window function is performed over the rows between unbounded preceding and the current row:
┌──────┬───────┬────────┐
│ NAME ┆ INDEX ┆ CUMSUM │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞══════╪═══════╪════════╡
│ a ┆ 0 ┆ 1 │
│ a ┆ 1 ┆ 2 │
│ a ┆ 2 ┆ 3 │
│ b ┆ 0 ┆ 1 │
│ b ┆ 2 ┆ 2 │
└──────┴───────┴────────┘
@CarloNicolini I can't find anything in polars-sql to suggest ROWS BETWEEN is supported yet.
It seems it's just silently ignored at the moment.
Yeah, I tried with polars.sql too and the results are wrong. Probably this is intended, as there is no specific Polars API available yet for rowsBetween.
Problem description
I would like to have a parameter for pl.Expr.over to determine the window edges, like taking only the first and current row, or the current row and the last. So, not only the whole group, but also only parts of the group.
Maybe there is such a feature, but I could not find it.