[Enh]: cumulative features

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

Cumulative features, together with forward fill and some other checks/hacks, would most likely be enough to enable the equivalent of pandas expanding operations. This is a requirement to complete https://github.com/plotly/plotly.py/issues/4834.

Please describe the purpose of the new feature or describe the problem to solve.

List of cumulative expressions supported by polars:

[x] cum_count
[x] cum_min
[x] cum_max
[x] cum_prod
[x] cum_sum

With these, we would enable the following additional univariate expanding operations: mean, var, std, skew, kurt.
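To make the claim above concrete, here is a minimal sketch of how expanding mean, var and std fall out of cumulative sums and counts. It is plain Python, with `itertools.accumulate` standing in for `cum_sum` and `cum_count`; the function name `expanding_stats` is hypothetical and this is not Narwhals code. It assumes no nulls and uses ddof=1, and the sum-of-squares formula can be numerically unstable for large values. Skew and kurt would presumably follow the same pattern with cumulative sums of third and fourth powers.

```python
from itertools import accumulate
from math import sqrt


def expanding_stats(values):
    """Expanding mean, sample variance and std from cumulative sums/counts only."""
    values = list(values)
    cum_count = list(accumulate(1 for _ in values))       # stand-in for cum_count
    cum_sum = list(accumulate(values))                    # stand-in for cum_sum
    cum_sum_sq = list(accumulate(v * v for v in values))  # cum_sum of squared values

    means, variances, stds = [], [], []
    for n, s, ss in zip(cum_count, cum_sum, cum_sum_sq):
        mean = s / n
        # Sample variance (ddof=1) is undefined for a single observation.
        var = (ss - n * mean * mean) / (n - 1) if n > 1 else None
        means.append(mean)
        variances.append(var)
        # Clamp at zero to guard against tiny negative values from rounding.
        stds.append(sqrt(max(var, 0.0)) if var is not None else None)
    return means, variances, stds


means, variances, stds = expanding_stats([1.0, 2.0, 3.0, 4.0])
```

For the input above, the expanding means are 1.0, 1.5, 2.0, 2.5, and the first variance is undefined because only one observation is available.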

What's left out: median, quantile and rank. I don't think we would be able to implement those 🥲 (see the entire pandas expanding window function list).

Group by context

Edit: Additionally, we should support these expressions in the group-by context. This is partially possible:

for pandas, we can use native methods for cum_<min|max|sum|prod>, but only with reverse=False (the default argument). We still need to check how DataFrameGroupBy.cumcount behaves with nulls.
for pyarrow, the only way we can enable these is via __iter__, and I cannot say how much slower that is than native methods.
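As a rough illustration of what an iteration-based fallback would have to compute, here is a plain-Python sketch of a per-group cumulative sum that keeps the original row order and also handles the reverse case. The function `grouped_cum_sum` and its signature are hypothetical, not an existing Narwhals, pandas or pyarrow API, and nulls are not handled.

```python
from collections import defaultdict


def grouped_cum_sum(keys, values, reverse=False):
    """Cumulative sum of `values` within each group of `keys`, in original row order."""
    n = len(values)
    # With reverse=True, accumulate from the last row towards the first.
    order = range(n - 1, -1, -1) if reverse else range(n)
    running = defaultdict(int)  # running total per group key
    out = [None] * n
    for i in order:
        running[keys[i]] += values[i]
        out[i] = running[keys[i]]
    return out


forward = grouped_cum_sum([1, 1, 2], [8, 2, 6])
backward = grouped_cum_sum([1, 1, 2], [8, 2, 6], reverse=True)
```

With keys `[1, 1, 2]` and values `[8, 2, 6]`, the forward result is `[8, 10, 6]` and the reversed one is `[10, 2, 6]`. A Python-level loop like this is exactly the kind of row-by-row work that is expected to be slower than a native kernel.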

For the moment I would keep these out of the PRs introducing the methods in the first place. Thanks @AlessandroMiola for pointing that out in #1384.

I am closing this issue as completed for now, although these expressions won't be available in the group_by context. I think it would be a bit too hard to support them right now, although it would definitely be nice to have for the over use case.

Even in pandas, although DataFrameGroupBy has cumsum and other cumulative operations, the behaviour is a bit unexpected: the group keys are not kept in the output. Example from the docs:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = pd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df
          a   b   c
fox       1   8   2
gorilla   1   2   5
lion      2   6   9

>>> df.groupby("a").cumsum()
          b   c
fox       8   2
gorilla  10   7
lion      6   9

As you can see, the output has no column "a", not even in the index.
