Open noklam opened 1 week ago
Thank you for raising this issue!
But some reason the documentation to extend SQL with expression is quite hidden. I have browsed the documentation a couple of times before but didn't know this feature exist.
I do think it would be worthwhile for us to explain this behavior in greater detail.
I was able to repro this behavior you're seeing. It seems that, at least in the case of the DuckDB backend, the alias
method wraps the expression itself in a CTE.
In [1]: import pandas as pd
In [2]: import ibis
In [3]: con = ibis.duckdb.connect()
In [4]: con.create_table("raw_a", pd.DataFrame({"id": [1, 2, 3]}))
Out[4]:
DatabaseTable: memory.main.raw_a
id int64
In [5]: con.create_table("raw_b", pd.DataFrame({"id": [2, 3, 4]}))
Out[5]:
DatabaseTable: memory.main.raw_b
id int64
In [6]: a = con.sql("SELECT * FROM raw_a").alias("a")
In [7]: b = con.sql("SELECT * FROM raw_b").alias("b")
In [8]: a.sql("SELECT * FROM a").compile()
Out[8]: 'WITH "a" AS (SELECT * FROM raw_a) SELECT * FROM a'
In [9]: b.sql("SELECT * FROM b").compile()
Out[9]: 'WITH "b" AS (SELECT * FROM raw_b) SELECT * FROM b'
Extending the expression alias with .sql
would be possible to refer to persisted objects, but I don't think this is what you want to achieve, is it?
In [10]: b.sql("select a.*, b.* from raw_a as a, b where a.id=b.id")
Out[10]:
r0 := SQLQueryResult
query:
SELECT * FROM raw_b
schema:
id int64
r1 := View: b
id int64
SQLStringView[r1]
query:
select a.*, b.* from raw_a as a, b where a.id=b.id
schema:
id int64
@IndexSeek Thanks for the responss. b.sql("select a.*, b.* from raw_a as a, b where a.id=b.id")
, here you can only refer to b
directly but not a
, you kind of redo the alias here (raw_a as a), but imagine this is actually a full CTE, or multiple one. The it essentially become putting the entire queries in one and that's not what I want.
I actually expect it to be wrapped as a CTE, this kind of provide the flexibility to breakdown a complex query into small one. If this is done in SQL you have to keep one big query, but in ibis
things are lazy, so I can either run only a single CTE (for debugging), but also compose multiple CTE to do complex query. (Think dbt ref{{ table}}
, which is more like string templating)
You're welcome! I discovered that it's possible to chain multiple .sql
methods and continue to alias within the same expression, allowing you to reference those aliases as they are created as CTEs. However, I’m still not sure if this will provide what you’re looking for.
In [14]: con.sql("SELECT * FROM raw_a").alias("a").sql("SELECT * FROM raw_b").alias("b").sql(
...: "SELECT a.id, b.id AS id_2 FROM a, b WHERE a.id=b.id"
...: )
Out[14]:
┏━━━━━━━┳━━━━━━━┓
┃ id ┃ id_2 ┃
┡━━━━━━━╇━━━━━━━┩
│ int64 │ int64 │
├───────┼───────┤
│ 2 │ 2 │
│ 3 │ 3 │
└───────┴───────┘
In [15]: con.sql("SELECT * FROM raw_a").alias("a").sql("SELECT * FROM raw_b").alias("b").sql(
...: "SELECT a.id, b.id AS id_2 FROM a, b WHERE a.id=b.id"
...: ).compile()
Out[15]: 'WITH "a" AS (SELECT * FROM raw_a), "b" AS (SELECT * FROM raw_b) SELECT a.id, b.id AS id_2 FROM a, b WHERE a.id = b.id'
This might allow for running the individual expressions as part of .sql
independently for debugging and would allow for appending them if the results are as expected, but I'm hoping there might be another way to achieve this.
@IndexSeek This is getting closer and I can somewhat do what I want.
Though I think it's weird when I build table b
I need to use a.sql
where they are completely separate. This will become a problem when I have one CTE but want to shared across different complex queries.
I do not know ibis
internal enough, but it seems that the information is only stored in the ibis.expr.types.relations.Table
. If there is a way to merge two ibis.expr.types.relations.Table
into one then we may be able to get rid of this constraint.
Is your feature request related to a problem?
I am trying to use the SQL expression for multiple CTE but the
Table.sql
only allows for a single table.What is the motivation behind your request?
I found https://ibis-project.org/how-to/extending/sql very useful as an intermediate step to migrate existing SQL scripts to ibis. Instead of going from SQL -> ibis immediately, it is sometimes useful to breakdown a complex SQL into multiple ibis expressions (using the
.sql
). This requires minimal effort but already make the SQL scripts much more debuggable.Consider this example
To break it down with
ibis
, I start with copying the CTE as followAll follow option will fail as it doesn't understand the table alias
This make it possible to debug, or inspect data quickly inside a notebook. Then I want to join them together but I cannot find the correct API.
Backend.sql
only recognize tables that are exists already (raw_a, raw_b).while
a.sql()
will only recognize a but not b, andb.sql()
only recognize b but not a. Chaining ibis expression with SQL is powerful but this seems to limited to a single table, which make joins impossible.I know this is doable if I go for full ibis dataframe code, but this is not what I want here.
Describe the solution you'd like
But some reason the documentation to extend SQL with expression is quite hidden. I have browsed the documentation a couple of times before but didn't know this feature exist.
What version of ibis are you running?
ibis-framework 9.5.0
What backend(s) are you using, if any?
No response
Code of Conduct