ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

feat: expose `Table._ensure_expr()` #8304

Closed NickCrews closed 3 months ago

NickCrews commented 8 months ago

Is your feature request related to a problem?

I am writing a framework for record linkage on top of Ibis. As part of that, I have an API that takes as arguments

  1. A Table
  2. something that references a column within that table

So the basic examples for point 2 are

  1. a string eg "my_col"
  2. a deferred eg _.my_col.upper()[:3]
  3. a lambda eg lambda table: table.my_col.upper().cast(int)

It would be nice if there were a universal API on Tables that allowed me to convert all of these to a Column. I can't use __getitem__, because that can return a Table:

import ibis
from ibis import _

t = ibis.memtable({"island": [1, 2, 3, 4, 5]})

print(type(t["island"]))  # Column
print(type(t[_.island]))  # Table
print(type(t[lambda t: t.island]))  # Table

Currently I am using Table._ensure_expr, but that feels icky since it is private.

Describe the solution you'd like

maybe Table.column(Any) -> Column?

We should consider whether the new method would shadow a column named "column" in the Table, but I hope that people aren't naming their columns "column"...

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response


cpcloud commented 8 months ago

Thanks for the feature request!

Like many other things, the implementation of the binding process is changing quite a bit in the-epic-split.

Table._ensure_expr is removed in that branch, and replaced with a function called bind that universally handles inputs, and converts them into an iterable of expressions.

bind is more generic than handling just a single column though. It not only handles strings, deferreds and lambdas, but also selectors, mappings and iterables of all those things.

That bind API looks like this:

exprs = bind(table, "island")
exprs = bind(table, _.island)
exprs = bind(table, lambda t: t.island)
exprs = bind(table, s.matches("island"))
exprs = bind(table, table.island)
exprs = bind(table, ["island"])
exprs = bind(table, {"eye-land": "island"})
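To make the "universally handles inputs" idea concrete, here is a toy, pure-Python sketch of that kind of recursive dispatch. It is not the Ibis implementation: `Col`, `toy_bind`, and the dict-based table are hypothetical stand-ins invented for illustration, and selectors and deferreds are left out.

```python
from collections.abc import Iterator, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a bound column expression."""
    name: str
    values: tuple


def toy_bind(table: dict, value) -> Iterator[Col]:
    """Normalize one column-like input into zero or more Col objects."""
    if isinstance(value, Col):
        # Already bound: pass it through unchanged.
        yield value
    elif isinstance(value, str):
        # A string is a column name (checked before the iterable branch,
        # since strings are iterable too).
        yield Col(value, table[value])
    elif isinstance(value, Mapping):
        # {new_name: reference} binds the reference, then renames it.
        for new_name, ref in value.items():
            for col in toy_bind(table, ref):
                yield Col(new_name, col.values)
    elif callable(value):
        # A lambda gets the table and returns something bindable.
        yield from toy_bind(table, value(table))
    else:
        # Anything else is treated as an iterable of bindable things.
        for item in value:
            yield from toy_bind(table, item)


table = {"island": ("Biscoe", "Dream"), "year": (2007, 2008)}
cols = list(
    toy_bind(
        table,
        ["island", {"eye-land": "island"}, lambda t: Col("year", t["year"])],
    )
)
print([c.name for c in cols])
```

Every input shape funnels into the same flat stream of columns, which is why a caller who wants exactly one column still has to check the length of the result.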

@kszucs Thoughts on making this API public after we merge the-epic-split?

kszucs commented 8 months ago

I think we can expose bind, though not sure whether this should be exposed as a function or a method. Preference?

NickCrews commented 8 months ago

That looks perfect. Can we make it so it is a list of Columns, not a mere Iterable of Columns? That will be more usable for people, and there shouldn't be a performance downside.

I vote method, so there is symmetry with

cpcloud commented 3 months ago

We've got bind for this now, since 9.0.0.