Open ianmcook opened 4 months ago
We did explore exposing a shorthand previously (see https://github.com/ibis-project/ibis/issues/8574), but decided to just document a workaround until there was a request from the community. The documented solution (see https://ibis-project.org/tutorials/ibis-for-sql-users.html#top-k-operations) is quite similar to what you shared on the StackOverflow link. Agree that it's much more verbose than pandas.
This looks like something we should be able to implement a convenience wrapper for, though.
I could work on this one, if we want to have it. @cpcloud @jcrist
@jitingxu1 Can you describe the approach you're thinking about a bit?
I just went through the history but haven't had the chance to fully think it through yet. I've got a lot on my plate this week, so I might need to step back from this for now. If it's still available later, I can take a look then. @cpcloud
Is your feature request related to a problem?
As described here, filtering a table to return the row(s) with largest value(s) in each group feels harder in Ibis than in pandas. I wonder if Ibis could add some syntactic sugar to make this easier.
Describe the solution you'd like
dplyr has a function `top_n()` that makes this simpler syntactically:
I wonder if we could add something like that in Ibis? Ibis already has a `topk` function, but it's a vector function, not a table function. Maybe Ibis could add a `topk` table function that translates into an operation like this?
What version of ibis are you running?
9.1.0
What backend(s) are you using, if any?
DuckDB