pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Some functions do not accept arbitrary expressions. Should it be part of an API? #18695

Open AlexeyDmitriev opened 1 month ago

AlexeyDmitriev commented 1 month ago

Description

So, polars notably allows more kinds of expressions in select than are available in SQL (see the example in the tutorial).

However, there are a lot of places (for example with_columns, join, filter, group_by) that expect expressions that give exactly 1 value for each row of some df; otherwise the operation doesn't make much sense. https://github.com/pola-rs/polars/issues/9603 is an example where using the wrong kind of expression causes unexpected behaviour.

Should this special kind of expression be a separate type in the API? (Every OneToOneExpression (better name needed) would be an Expr, but not vice versa.)
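To make the proposal concrete, here is a minimal sketch of the subtype idea in plain Python. It does not use Polars: `Expr`, `OneToOneExpr`, `col`, `select` and `filter_rows` are toy stand-ins invented for this illustration, and `head` is used only as an example of an operation that changes the number of rows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy stand-in for any expression; it may change the number of rows."""

    description: str

    def head(self, n: int) -> Expr:
        # Keeping only the first n rows breaks the one-value-per-row
        # property, so the result is typed as a plain Expr.
        return Expr(f"{self.description}.head({n})")


@dataclass(frozen=True)
class OneToOneExpr(Expr):
    """Hypothetical subtype: yields exactly one value per input row."""

    def __gt__(self, other: object) -> OneToOneExpr:
        # A row-by-row comparison keeps the one-to-one property.
        return OneToOneExpr(f"({self.description} > {other!r})")


def col(name: str) -> OneToOneExpr:
    return OneToOneExpr(f"col({name!r})")


def select(*exprs: Expr) -> list[str]:
    """Accepts any expression."""
    return [e.description for e in exprs]


def filter_rows(predicate: OneToOneExpr) -> str:
    """Accepts only one-to-one expressions; a type checker enforces this."""
    return predicate.description


select(col("a").head(2))             # fine: select takes any Expr
filter_rows(col("a") > 1)            # fine: the comparison is one-to-one
# filter_rows(col("a").head(2))      # a static type checker would reject this
```

In this sketch the mistake in the commented-out line is caught before the program runs, at the cost of every method having to declare whether it preserves the one-to-one property.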

ritchie46 commented 1 month ago

We are in the process of catching those bugs earlier and making our IR more strict and correct.

The context examples you gave should check that the eventual expressions are elementwise. We will raise on those invalid expressions. Though I don't think we can handle it in the static type system (or want to).
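As a rough illustration of what such a runtime check could look like, here is a toy sketch in plain Python. It is not how Polars implements it: the `Node` tree, the operation names in `ELEMENTWISE_OPS` and `check_filter_predicate` are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical operation names that keep one value per row (sketch only).
ELEMENTWISE_OPS = {"col", "add", "gt"}


@dataclass(frozen=True)
class Node:
    """Toy expression-tree node: an operation applied to input nodes."""

    op: str
    inputs: tuple[Node, ...] = ()


def is_elementwise(node: Node) -> bool:
    """True if this node and all of its inputs are elementwise."""
    return node.op in ELEMENTWISE_OPS and all(
        is_elementwise(child) for child in node.inputs
    )


def check_filter_predicate(predicate: Node) -> None:
    """Raise when the predicate is not elementwise."""
    if not is_elementwise(predicate):
        raise ValueError("filter predicate must be an elementwise expression")


# col("a") > col("b"): elementwise all the way down.
ok = Node("gt", (Node("col"), Node("col")))
# col("a") > col("b").head(...): "head" changes the number of rows.
bad = Node("gt", (Node("col"), Node("head", (Node("col"),))))

check_filter_predicate(ok)       # passes silently
# check_filter_predicate(bad)    # would raise ValueError
```

Compared with a static type, this check only fires when the query is built or run, but it needs no changes to the signatures of expression methods.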

AlexeyDmitriev commented 1 month ago

Thanks for your feedback.

My observation was that this is a common abstraction, i.e. it can be used in several places (including in users' code). I'd say I'd prefer a static type solution if it were feasible.

You are indeed in a better position to say whether it's easier to check that at runtime or at type-checking time. Feel free to close the issue if the final decision is to leave things as is.