ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

feat: Support for window functions in Polars backend #10513

Open edschofield opened 1 day ago


Is your feature request related to a problem?

Consider this question: "Find the penguin with the longest bill per species".

The following code works with the duckdb and sqlite backends:

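import ibis
from ibis import _

penguins = ibis.examples.penguins.fetch()  # uses the default backend (DuckDB)

result = penguins.mutate(
    rank=_.bill_length_mm.rank().over(
        group_by='species', order_by=-1 * _.bill_length_mm
    )
).filter(
    _.rank == 0  # ibis rank() is zero-based, so 0 marks the longest bill
).execute()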

But with the polars backend, the above code fails with the following exception:

OperationNotDefinedError: No translation rule for <class 'ibis.expr.operations.window.WindowFunction'>
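To pin down what a Polars translation would have to reproduce, here is a plain-Python sketch of the semantics the ibis query relies on. It assumes, as in ibis, that `rank()` is a zero-based min-rank, so tied rows share a rank. The rows and the helper `zero_based_min_rank_desc` are hypothetical illustrations, not ibis internals, and the quadratic loop is for clarity only.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the penguins table (not real data).
rows = [
    {"species": "Adelie", "bill_length_mm": 39.1},
    {"species": "Adelie", "bill_length_mm": 41.1},
    {"species": "Gentoo", "bill_length_mm": 47.5},
    {"species": "Gentoo", "bill_length_mm": 50.0},
    {"species": "Gentoo", "bill_length_mm": 50.0},
]


def zero_based_min_rank_desc(rows, group_key, order_key):
    """Return copies of rows with a zero-based min-rank computed per group.

    Ordering is descending on order_key, so the largest value gets rank 0
    and tied values share the same rank. Assumes order_key has no nulls.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    ranked = []
    for members in groups.values():
        for row in members:
            # Rank = how many rows in the same group have a strictly larger value.
            rank = sum(1 for other in members if other[order_key] > row[order_key])
            ranked.append({**row, "rank": rank})
    return ranked


# Equivalent of .filter(_.rank == 0): keep the top row(s) per species.
longest = [
    r
    for r in zero_based_min_rank_desc(rows, "species", "bill_length_mm")
    if r["rank"] == 0
]
for r in longest:
    print(r)
```

Because tied rows share rank 0, both Gentoo rows with the longest bill survive the filter; a translation that returns exactly one row per group would differ on ties.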

What is the motivation behind your request?

This argmax / top-k-per-group pattern is a common need in data analysis and a common SQL idiom. Other ways of implementing it, such as joining back against a per-group aggregate, would be less efficient.

Describe the solution you'd like

Ideally, the ibis expression above would be translated into Polars code along these lines:

import polars as pl

# Here `penguins` is a Polars DataFrame rather than an ibis table
penguins.group_by(
    'species'
).agg(
    pl.all().top_k_by(by='bill_length_mm', k=1).first()  # top 1 row by bill_length_mm within each group
)

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

Polars
