ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

With duckdb, rlike works but ibis.show_sql raises an error; with pandas, neither works. #5807

Closed: ededovic closed this issue 1 year ago

ededovic commented 1 year ago
import ibis
from ibis.interactive import *
t = ex.penguins.fetch()
res = t.filter(_.island.rlike('Biscoe|Dream'))

Calling ibis.show_sql(res) raises the following error:

---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 ibis.show_sql(res)

File /miniconda3/envs/ml310/lib/python3.10/site-packages/ibis/expr/sql.py:324, in show_sql(expr, dialect, file)
    288 @public
    289 def show_sql(
    290     expr: ir.Expr,
    291     dialect: str | None = None,
    292     file: IO[str] | None = None,
    293 ) -> None:
    294     """Pretty-print the compiled SQL string of an expression.
    295 
    296     If a dialect cannot be inferred and one was not passed, duckdb
   (...)
    322     FROM t AS t0
    323     """
--> 324     print(to_sql(expr, dialect=dialect), file=file)

File /miniconda3/envs/ml310/lib/python3.10/site-packages/ibis/expr/sql.py:388, in to_sql(expr, dialect, **kwargs)
    385         read = write = getattr(backend, "_sqlglot_dialect", dialect)
    387 sql = backend._to_sql(expr, **kwargs)
--> 388 (pretty,) = sg.transpile(sql, read=read, write=write, pretty=True)
    389 return SQLString(pretty)

File /miniconda3/envs/ml310/lib/python3.10/site-packages/sqlglot/__init__.py:184, in transpile(sql, read, write, identity, error_level, **opts)
    165 """
    166 Parses the given SQL string in accordance with the source dialect and returns a list of SQL strings transformed
    167 to conform to the target dialect. Each string in the returned list represents a single transformed SQL statement.
   (...)
    179     The list of transpiled SQL statements.
    180 """
    181 write = write or read if identity else write
    182 return [
    183     Dialect.get_or_raise(write)().generate(expression, **opts)
--> 184     for expression in parse(sql, read, error_level=error_level)
    185 ]

File /miniconda3/envs/ml310/lib/python3.10/site-packages/sqlglot/__init__.py:72, in parse(sql, read, **opts)
     60 """
     61 Parses the given SQL string into a collection of syntax trees, one per parsed SQL statement.
     62 
   (...)
     69     The resulting syntax tree collection.
     70 """
     71 dialect = Dialect.get_or_raise(read)()
---> 72 return dialect.parse(sql, **opts)

File /miniconda3/envs/ml310/lib/python3.10/site-packages/sqlglot/dialects/dialect.py:163, in Dialect.parse(self, sql, **opts)
    162 def parse(self, sql: str, **opts) -> t.List[t.Optional[exp.Expression]]:
--> 163     return self.parser(**opts).parse(self.tokenize(sql), sql)

File /miniconda3/envs/ml310/lib/python3.10/site-packages/sqlglot/parser.py:785, in Parser.parse(self, raw_tokens, sql)
    771 def parse(
    772     self, raw_tokens: t.List[Token], sql: t.Optional[str] = None
    773 ) -> t.List[t.Optional[exp.Expression]]:
    774     """
    775     Parses a list of tokens and returns a list of syntax trees, one tree
    776     per parsed SQL statement.
   (...)
    783         The list of syntax trees.
    784     """
--> 785     return self._parse(
    786         parse_method=self.__class__._parse_statement, raw_tokens=raw_tokens, sql=sql
    787     )

File /miniconda3/envs/ml310/lib/python3.10/site-packages/sqlglot/parser.py:851, in Parser._parse(self, parse_method, raw_tokens, sql)
    848     expressions.append(parse_method(self))
    850     if self._index < len(self._tokens):
--> 851         self.raise_error("Invalid expression / Unexpected token")
    853     self.check_errors()
    855 return expressions

File /miniconda3/envs/ml310/lib/python3.10/site-packages/sqlglot/parser.py:894, in Parser.raise_error(self, message, token)
    882 error = ParseError.new(
    883     f"{message}. Line {token.line}, Col: {token.col}.\n"
    884     f"  {start_context}\033[4m{highlight}\033[0m{end_context}",
   (...)
    890     end_context=end_context,
    891 )
    893 if self.error_level == ErrorLevel.IMMEDIATE:
--> 894     raise error
    896 self.errors.append(error)

ParseError: Invalid expression / Unexpected token. Line 3, Col: 17.
   t0.flipper_length_mm, t0.body_mass_g, t0.sex, t0.year 
FROM ibis_read_csv_0 AS t0 
WHERE t0.island ~ 'Biscoe|Dream'

cpcloud commented 1 year ago

Welcome @ededovic 👋🏻! Thanks for opening an issue.

The ParseError is coming from sqlglot. I'll report that upstream; bugs are often fixed there on the same day they are reported.

As for the pandas backend, do you have some failing example code?
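
One way to confirm that the ParseError comes from sqlglot rather than from ibis is to hand sqlglot a statement using the same `~` operator directly. The snippet below is a minimal sketch, not the maintainers' reproduction: the SQL string is a shortened version of the query in the traceback (the column list is an assumption), and `parses_with_sqlglot` is a hypothetical helper name. Whether the statement parses depends on the installed sqlglot version, so no particular result is claimed.

```python
try:
    import sqlglot
    from sqlglot.errors import ParseError
except ImportError:
    # sqlglot is not installed, so the check below is skipped.
    sqlglot = None

# Shortened form of the failing query from the traceback (assumed column list).
SQL = "SELECT t0.island FROM ibis_read_csv_0 AS t0 WHERE t0.island ~ 'Biscoe|Dream'"


def parses_with_sqlglot(sql, dialect="duckdb"):
    """Return True if sqlglot parses `sql`, False on ParseError,
    and None if sqlglot is unavailable."""
    if sqlglot is None:
        return None
    try:
        sqlglot.parse_one(sql, read=dialect)
    except ParseError:
        return False
    return True


print(parses_with_sqlglot(SQL))
```

If this prints False while the query itself still executes on the duckdb backend, the problem is confined to the pretty-printing step in ibis.show_sql, which matches the traceback above.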