Open NickCrews opened 11 months ago
+1 to consolidating cases
and substitute
.
removing ibis.case() and Value.case()
I don't think we should remove these. They are extremely familiar for users coming from SQL. We can likely consolidate all the various APIs under the Case
ops though.
Also, a long-running grossness of .case() is that it returns a CaseBuilder object
The builder pattern is probably going to become more used in the codebase, especially where there's state that needs to be tracked. For example, I'm working on a way to better support chaining joins that uses this pattern. We're also thinking about how this might help clean up the GroupedTable
object.
Agree that we should document any of these that are user-facing.
I actually kinda like the idea of replacing the case builders with cases
operations instead.
Say we had a top-level ibis.cases
function to replace the ibis.case()
builder with the following implementation:
def cases(*branches: tuple[ir.BooleanValue, ir.Value], default=None):
cases, results = zip(*branches)
return ops.SearchedCase(cases, results, default).to_expr()
Compare the following examples:
t = ibis.table({"sym": "str", "left": "int", "right": "int"})
# Existing `ibis.case`
expr1 = t.mutate(
result=(
ibis.case()
.when(_.sym == "+", _.left + _.right)
.when(_.sym == "-", _.left - _.right)
.when(_.sym == "*", _.left * _.right)
.when(_.sym == "/", _.left / _.right)
.else_(0)
.end()
)
)
# New `ibis.cases`
expr2 = t.mutate(
result=ibis.cases(
(_.sym == "+", _.left + _.right),
(_.sym == "-", _.left - _.right),
(_.sym == "*", _.left * _.right),
(_.sym == "/", _.left / _.right),
default=0,
)
)
I personally like the ibis.cases
example and implementation better:
.when
call. It's just a function that returns an op.to_expr()
, same as most of ibis..end()
method the user needs to remember to call. Currently if you forget to call .end()
you get a pretty confusing error message..cases()
example easier to read. There's less visual noise without all the .when(
and .else_(
calls going on.IMO I think we should deprecate the builders in favor of ibis.cases
and Value.cases
.
cases
is probably similar enough to case
that coming from SQL wouldn't be a huge lift +1
What do you both think about my proposal of supporting both boolean conditions and equality values?
Passing single values already works with Value.cases:
t.x.cases([
(t.y, "equal to y"),
(t.z, "equal to z"),
])
vs the more precise/verbose
t.x.cases([
(_ == t.y, "equal to y"),
(_ == t.z, "equal to z"),
])
Should this feature get propagated, or should we deprecate and drop it?
Similarly, what about passing in lambdas for the predicates?
Similarly, what about supporting Mapping
in addition to the exisiting Iterable[tuple]
?
Should this feature get propagated, or should we deprecate and drop it?
I think that makes sense for Value.cases
(where there's a concrete thing to compare against) but not for ibis.cases
which is intended to be used inside a larger expression like Table.mutate
. I definitely don't think any api should special case on boolean expressions, where if it isn't a bool already it's compared against a thing - that sounds like madness.
Similarly, what about passing in lambdas for the predicates?
Seems fine to me, although given how good deferred expressions have gotten I'm tempted to avoid introducing lambda support into new apis. :shrug: could go either way here.
Similarly, what about supporting Mapping in addition to the exisiting Iterable[tuple]?
Also seems fine. Only argument I'd have to avoid it is it's a bit weird to type/document/support variadic *branches
as well as mapping/iterable-of-tuples. I think if we supported a mapping we should maybe ditch the variadic case? Something like:
def cases(branches: Mapping[Any, Any] | Iterable[tuple[Any, Any]], *, default: Any = None):
...
Again, no strong thoughts :shrug:.
I think that makes sense for Value.cases (where there's a concrete thing to compare against) but not for ibis.cases
Ok thanks for pointing that out, I felt there was something assymmetrical there but I couldn't put my finger on it. Annoying can't-have-it-all here: I would really like both flavors to have the exact same API, but at the same time I also want it to be a drop-in replacement for .substitute()
, which implies that we need to support the non-boolean keys...Not sure how to resolve this.
that sounds like madness
I had this vague intuition there was some madness here, but I haven't been able to come up with the actual case. do you have an example that you want to avoid?
lambdas
I think this can get added as a followup without breaking anything, so maybe we leave it out for now.
I think if we supported a mapping we should maybe ditch the variadic case?
Oh I didn't think of a variadic *branches
API! I like that better. I would vote for just that, and not supporting mappings. I also just remembered this other issue I filed, where Mappings come with the problem that keys need to be hashable, so you can't use plenty of Ibis types such as Deferreds as keys.
will .case
continue to work for a while after .cases
is implemented? from user perspective I would like to have a bit of time to migrate the code to .cases
yes, any work done here will have a deprecation period.
Is your feature request related to a problem?
All 4 functions are basically doing the same thing. It feels to me like we should be able to merge them all into once function with the same API. It is confusing to users (including me!) which they should use, and is extra burden keeping the docstrings etc all correct.
Also, a long-running grossness of .case() is that it returns a CaseBuilder object, which is undocumented. I don't think we want to add this CaseBuilder object to the public API, I think it would be totally doable and much simpler if we could keep it to a single function call.
Describe the solution you'd like
I propose:
(not sure about
default
vselse_
, and ifmapping
is the best name)where the keys can be
"foo"
or5
,value.upper()
value
and evaluated eg_.upper() == _
value
and return one of the above eglambda val: val.upper() == "FOO"
orlambda val: val.upper()
The values can be
_.lower()
value
and return one of the above eglambda val: val.lower()
I'm trying to think if there is some ambiguity if we accept both Values and BooleanValues similar to the ambiguity with selection vs filtering in
t[_.x.isnull()]
, but I can't think of it.I want to support the usecase of chained comparing one column to another:
(t.x == t.y).ifelse("same", (t.x > t.y).ifelse("greater", "less"))
I think this would be done with
t.x.case({lambda x: x == t.y: "same", lambda x: x > t.y: "greater"}, default="less")
but this feels a little gross. Makes me think either a top level or a table base API would be better:
t.cases({_.x == _.y: "same", _.x > _.y: "greater"}, default="less")
ibis.cases(t, {_.x == _.y: "same", _.x > _.y: "greater"}, default="less")
I would want to make this
What version of ibis are you running?
main
What backend(s) are you using, if any?
all
Code of Conduct