Open NickCrews opened 1 month ago
Hey Nick!
This is a super interesting problem, but I think it's not feasible for us to correlate an ibis expression with a particular duration (or some proxy of that like instructions, or even database plan node) because the thing that the database executes often looks very different from the thing it was asked to execute.
It's one of the trade-offs associated with using SQL and/or a database: you generally get better performance at the cost of debugability.
That said, what do you think about a backend.explain(expr)
method? We had this a while ago but it was removed due to lack of implementation across our backends.
This would at least allow you to get some insight into what the database thinks is happening.
what do you think about a backend.explain(expr) method?
I think that sounds nice, even if it returns teh raw string. However, to avoid needing to do a private method on a non-explicit-backend expression
e = ibis.read_csv().<sdfgdsfg>
e._find_backend().explain(e)
I would love to either (1 is my favorite)
ibis.explain(expr)
in addition to backend.explain(expr)
expr.explain()
(though we don't want to fill up that namespace TOO much)expr.explain()
in addition to backend.explain(expr)
Possible things to support (duckdb for inspiration)
Is your feature request related to a problem?
This is a bit broad, IDK if this should be a discussion.
I want to be able to profile queries. Specifically, which parts of the query are slow. This is tricky, because in ibis computation happens in two stages:
table.distinct(on="my_id")
call is responsible for 90% of the query runtime. Using timeit or other python profiling is useless here.I can currently do A/B testing, by just timing the whole run, but this is annoying to have to think about what to change, and then writing the change.
What is the motivation behind your request?
I am motived both as an end user, but also as a library author.
Describe the solution you'd like
Ideally every Operation could get associated with a run time, so for instance if you plot an expression graph you could see the runtime next to each node.
That seems very difficult though. Still very usable would be if we just got back the raw, non-machine readable output from eg
EXPLAIN ANALYZE <QUERY>
. This solution I don't think would even need to be implemented in ibis, but it would be ergonomic if it was a simpleexpr.profile()
oribis.profile(expr)
.I'm not sure about backend support. I don't know, but I imagine most of the SQL backends support some sort of profiling. SQlite profiling looks very different from duckdb profiling.
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
Code of Conduct