Open jcrist opened 1 week ago
I think dropping those other operations makes it harder to understand sequences of operations conceptually, so I'd prefer not to drop them.
That's how Ibis used to represent all selections and it was very difficult to understand the process by which expressions moved into SQL (or whatever else).
Splitting things up into separate operations isn't without trade-offs, but it helps a lot with isolating the various parts of the compilation pipeline and reasoning about what is and isn't allowed (structurally, typing-wise, and optimization-wise).
+1 to using `Select` in the polars backend if that leads to some simplification.
> Splitting things up into separate operations isn't without trade-offs, but it helps a lot with isolating the various parts of the compilation pipeline and reasoning about what is and isn't allowed (structurally, typing-wise, and optimization-wise).
Doesn't the rewrite system help a bit with alleviating this? Besides the fusion code (`project_to_select`, `merge_select_select`, ...) I didn't really see much analysis/rewrite code that dispatched on `Project`/`Filter`/`Sort`/`Distinct`. If we're always fusing for SQL generation, using `Select` immediately would avoid a set of rewrites later. If you can point to some code that uses the split types for analysis, that would be a helpful reference.
I believe all the DerefMap code depends on the various operations being split up.
Ah, it does. Missed that file, thanks.
Our other SQL backends convert `Project`/`Filter`/`Sort`/`Distinct` into a single `Select` operation. This fusion both produces simpler SQL and makes these operations (with some exceptions) commutative. In #9923 I added a test for this commutativity, which is currently failing for the `polars` backend since we don't rewrite these queries to `Select` nodes.
I think the easiest fix would be to use the same rewrites as the SQL backend to generate `SQL` nodes. With the deprecation (and future removal) of the `dask`/`pandas` backends, another option would be to drop `Project`/`Filter`/`Sort`/`Distinct` entirely internally and only make use of the more general `Select` op.
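To make the commutativity point concrete, here is a minimal toy sketch of the fusion idea. It is not Ibis's actual implementation: the node classes and the `fuse` helper below are hypothetical stand-ins, and the sketch deliberately ignores the exceptions where ordering does matter (for example, a filter on a column that a projection has dropped).

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


# Hypothetical toy IR; these are NOT the real Ibis operation classes.
@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Filter:
    parent: object
    predicate: str


@dataclass(frozen=True)
class Sort:
    parent: object
    key: str


@dataclass(frozen=True)
class Distinct:
    parent: object


@dataclass(frozen=True)
class Project:
    parent: object
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Select:
    # One fused node holding everything the split nodes expressed.
    table: Table
    columns: Optional[Tuple[str, ...]] = None
    predicates: frozenset = frozenset()
    sort_keys: Tuple[str, ...] = ()
    distinct: bool = False


def fuse(node) -> Select:
    """Collapse a chain of split operations into a single Select (toy version)."""
    if isinstance(node, Table):
        return Select(table=node)
    sel = fuse(node.parent)
    if isinstance(node, Filter):
        # Predicates are stored as a set, so filter order is irrelevant.
        return replace(sel, predicates=sel.predicates | {node.predicate})
    if isinstance(node, Sort):
        return replace(sel, sort_keys=sel.sort_keys + (node.key,))
    if isinstance(node, Distinct):
        return replace(sel, distinct=True)
    if isinstance(node, Project):
        return replace(sel, columns=node.columns)
    raise TypeError(f"unsupported node: {type(node).__name__}")


t = Table("t")
filter_then_sort = fuse(Sort(Filter(t, "x > 1"), "y"))
sort_then_filter = fuse(Filter(Sort(t, "y"), "x > 1"))
```

In this sketch both orderings fuse to the same `Select` value, which is the property the test in #9923 checks for. A backend that executes `Filter` and `Sort` as separate steps has no such normal form to compare.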