Closed kszucs closed 1 month ago
There are some ambiguities around the first arguments which can be a list of expressions:
table.select(["a", ["b"]])
# constructs
r0 := UnboundTable: table
a int8
b int16
c int32
d int64
e float32
f float64
g string
h boolean
i timestamp
j date
k time
Project[r0]
a: r0.a
('b',): ['b']
a
is interepreted as a column
whereas ["b"]
interpreted as an array literal
.
Apparently we also have a backend test case using a mapping as a single positional argument.
Why can't we keep
t.select([t.foo], t.bar)
?
It doesn't seem meaningfully different from allowing
t.select([t.foo], bar=t.bar)
If we allow:
t.select([t.foo], t.bar)
then should we allow
t.select([t.foo], [t.bar])
as well?
I think so. It's really things like t.select([[...]], [[...]])
that should be disallowed.
My preference would be to either provide args as variadics or as a single list-like argument. Even then we have confusion when to treat list-like inputs as literals vs value list.
We talked today during triage and decided that at minimum we want to remove the support for arbitrarily nested expressions (as Phillip noted above).
I actually agree with the initial proposal at the top (only accept lists of column refs if select is a 1-arg function, otherwise treat it as variadic). Coincidentally this matches how polars select
works, so there's some ecosystem-consistency there. The trick with doing this as a breaking change though is that previously working code may instead silently be treated as an array literal.
With this PR, what happens here? Does this error, or does it silently coerce these to array literals?
t.select(["a", "b"], ["c", "d"]) # is this two array literals or an error?
t.select("a", "b", ["c", "d"]) # is ["c", "d"] an array literal, or does this error?
Either way, I think we definitely want to forbid the case that phillip brought up before 9.0. I'm not opposed to the full changes here either, I think they make things more consistent, but we can always do that later IMO in 10.0.
t.select(["a", "b"], ["c", "d"]) # is this two array literals or an error?
yes, two array literals, not erroring
t.select("a", "b", ["c", "d"]) # is ["c", "d"] an array literal, or does this error?
yes, an array literal, not erroring
I would prefer deciding what should be the ideal long term behaviour then extend that to be backward compatible and either raise a deprecation warning or provide a config option to keep the previous behaviour.
Let's keep the full changeset here, and make sure to have both a BREAKING CHANGE
bit as well as documentation on the behavior in the relevant APIs (select
and mutate
).
Another thing I'm not sure about is whether we should do the dereferencing here or not? Like should t.bind(totally_different_table.col)
raise?
Seems like it might keep bind
more flexible and easier to maintain if we dereference elsewhere.
Actually it could help code consolidation. I can try it after the dereferencer improvement gets merged:
We should probably kill destructure()
now that proper alternatives (lift
) are available.
I'm going to rip out destructure
in a separate PR.
I'm going to rip out destructure in a separate PR.
Updated the test case to expect an input error rather than an integrity error. We should update unwrap_aliases()
to raise ibis input error instead of integrity error too.
One confusing case:
In [4]: table = ibis.table({"a": "int", "b": "string"})
In [5]: table.select(["a", ["b"]])
Out[5]:
r0 := UnboundTable: unbound_table_1
a int64
b string
Project[r0]
a: r0.a
('b',): ['b']
@NickCrews Can you review this again? This is a remaining blocker for the 9.0 release which we'd like to get out early next week.
I just took a long look.
Most things look great, a few nits, but the one substantial thing is the t.bind([1,"a"])
behavior. I have an inline comment above. Can someone fill me in on the context there a little more? I don't think this is a blocker for me.
@NickCrews Thanks for doing a deep dive on tests. There were indeed a few cases that weren't tested and also the lambda _: 2
case wasn't yet implemented. Appreciate the feedback!
Cool this looks great then!
@NickCrews Can you approve the PR?
Sorry I was on the mobile app where the approval status isn't even shown so i didn't know there was anything to do
Previously we allowed arbitrary nesting of input expressions for various API methods like
table.join
,table.group_by
, etc. While this was backward compatible with8.0.0
it also exposes additional ambiguity which can be confusing for users and difficult to reason about.This change restricts the nesting of input expressions to a "single level" of nesting with the exception of the first positional argument which can be a list of expressions, but only if there are no more positional arguments. Examples:
Aims to resolve https://github.com/ibis-project/ibis/issues/7819 and https://github.com/ibis-project/ibis/issues/8304, depends on https://github.com/ibis-project/ibis/pull/8884