ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
5.28k stars, 595 forks

bug: sql parsing cannot handle TPC-H #9979

Open EpsilonPrime opened 2 months ago

EpsilonPrime commented 2 months ago

What happened?

ibis.expr.sql.parse_sql raises an exception for any reasonable SQL query, even one as simple as SELECT COUNT(*) FROM customer (see the reproduction script below).

What version of ibis are you using?

9.3.0

What backend(s) are you using, if any?

None

Relevant log output

import ibis
import pyarrow.parquet as pq
from ibis.expr.sql import parse_sql
from ibis_substrait.compiler.core import SubstraitCompiler

def main():
    # Simplest possible query against the TPC-H customer table.
    sql = "SELECT COUNT(*) FROM customer"
    # Catalog passed to parse_sql: table name -> Ibis schema.
    table_schemas: dict[str, ibis.Schema] = {}

    # Read the Parquet file only to obtain its schema.
    r = pq.read_table(
        '/$HOME/projects/data/tpch/parquet/customer.parquet')
    table_schemas['customer'] = ibis.Schema.from_pyarrow(r.schema)

    # Convert the SQL string to an Ibis expression, then compile it to Substrait.
    expr = parse_sql(sql, catalog=table_schemas)
    compiler = SubstraitCompiler()
    return compiler.compile(expr)

if __name__ == "__main__":
    main()

ibis-framework            9.3.0
ibis-substrait            4.0.1
pyarrow                   17.0.0
pyarrow-hotfix            0.6

gforsyth commented 2 months ago

Hey @EpsilonPrime -- there should be some small improvements in the just-released 9.4.0. I'm going to close this out as a duplicate of #9529.

EpsilonPrime commented 2 months ago

While the 9.4.0 release handles 6 more TPC-H queries than the 9.3.0 release, it is far from complete. Converting SQL to an Ibis expression (not Ibis code) is still not possible in the 9.x series of Ibis.
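
A minimal sketch of how per-query coverage like the count above could be measured across releases. The helper names `survey_queries` and `summarize` are hypothetical and not part of Ibis; the parser is passed in as a plain callable, so the sketch itself depends only on the standard library and makes no claim about which queries any Ibis release handles.

```python
from typing import Callable, Mapping, Optional


def survey_queries(
    queries: Mapping[str, str],
    parse: Callable[[str], object],
) -> dict[str, Optional[str]]:
    """Try to parse each query; map its name to None on success or an error summary."""
    results: dict[str, Optional[str]] = {}
    for name, sql in queries.items():
        try:
            parse(sql)
        except Exception as exc:  # broad on purpose: any failure counts as "not handled"
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


def summarize(results: Mapping[str, Optional[str]]) -> str:
    """Build a one-line summary of how many queries parsed and which ones failed."""
    failing = [name for name, err in results.items() if err is not None]
    parsed = len(results) - len(failing)
    return f"{parsed}/{len(results)} queries parsed; failing: {', '.join(failing) or 'none'}"
```

With Ibis installed, `parse` could be something like `lambda q: parse_sql(q, catalog=table_schemas)`, reusing the names from the reproduction script; the TPC-H query texts would have to be supplied separately.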

jcrist commented 1 month ago

Since this is well scoped to handling only TPC-H queries, and is just focused on the parse_sql functionality (not the rest of the decompiler), I'm going to reopen this.