douglas-raillard-arm opened this issue 3 months ago (status: Open)
I believe this is covered by the following docs on selector set-ops: https://docs.pola.rs/api/python/stable/reference/selectors.html#set-operations
However, it might not be a bad idea to simply raise an error here: allowing selectors to interoperate via operators only with other selectors would prevent any such ambiguity. 🤔 @stinodego, how do you feel about making this a touch stricter?
@alexander-beedie maybe I'm blind, but can you quote the specific part? This doc explains what happens with cs.by_name(), but the main issue here is the different behavior of pl.col() and cs.by_name() with respect to | and what happens when the two are mixed together.
I can understand how each is the correct behavior within its own category, and how both behaviors are desirable, but the end result feels like an unfortunate API conflict:
Taking inspiration from elsewhere: in Haskell, these would simply be two different operators, since they fundamentally have two different meanings, and there would be no problem. Cases where multiple implementations are reasonable depending on the use case are handled by not implementing the class on the base types at all and instead providing zero-cost wrappers that "decide" which way to go (e.g. the Sum and Product wrappers for Monoid). That's not really possible here, since it would make Expr cumbersome to use.
Alternatively, not allowing selectors and expressions to mix would fix that (while still allowing explicit mixing via a selector's .to_expr()). Then everything is commutative again with no surprises, everything is still possible, and forbidden combinations simply raise rather than doing something unexpected.
From a documentation point of view, it might be worth stressing that operator overloading on selectors differs from that on expressions, and showing an example that points this out.
This was already discussed in https://github.com/pola-rs/polars/issues/13757
This is a core issue with how selectors are currently set up as an Expr subclass.
We have to revisit this, but doing so would be a breaking change. I have to admit that I am a bit exhausted with API design now after the release of 1.0, so I'll come back to this one a bit later.
Description
Not sure how to classify this, but since it has likely been discussed elsewhere, let's treat it as a documentation improvement.
pl.col() can be mixed with the polars.selectors API, which leads to results that are unexpected at first, rather than e.g. a straight-up exception. Note that the behavior is consistent with pl.col('a') | pl.col('b').
When discovering the selectors API, I initially tried to combine pl.col() with other selectors, since I was used to using df.select(pl.col('foobar')) to select a column. This can lead to surprising behavior when combining with selectors.
Link
No response