tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.7k stars 701 forks source link

Failure to parse valid Postgres query #4133

Closed danxmoran closed 1 month ago

danxmoran commented 1 month ago

Fully reproducible code snippet

As a script:

#!/usr/bin/env python
import sqlglot

query = """
SELECT DISTINCT job.id FROM job LEFT OUTER JOIN LATERAL (SELECT 1 FROM ((VALUES (1)) AS dummy_table_ignore_me_1 LEFT OUTER JOIN (application AS app_for_count_1 INNER JOIN interview_stage AS interview_stage_1 ON app_for_count_1.current_interview_stage_id = interview_stage_1.id AND interview_stage_1.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608' AND interview_stage_1.stage_type <> 'PreInterviewScreen') ON job.id = app_for_count_1.job_id AND app_for_count_1.status = 'active' AND
app_for_count_1.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608') WHERE job.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608' GROUP BY job.id HAVING count(app_for_count_1.id) >= 0 AND count(app_for_count_1.id) < 1) AS unoptimizable_query_1 ON (job.id = job.id ) LEFT OUTER JOIN LATERAL (SELECT 1 FROM ((VALUES (1)) AS dummy_table_ignore_me_2 LEFT OUTER JOIN (application AS app_for_count_2 INNER JOIN interview_stage AS interview_stage_2 ON
app_for_count_2.current_interview_stage_id = interview_stage_2.id AND interview_stage_2.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608' AND interview_stage_2.stage_type <> 'PreInterviewScreen') ON job.id = app_for_count_2.job_id AND app_for_count_2.status = 'active' AND app_for_count_2.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608') WHERE job.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608' GROUP BY job.id HAVING count(app_for_count_2.id) >= 1) AS
unoptimizable_query_2 ON (job.id = job.id ) LEFT OUTER JOIN LATERAL (SELECT 1 FROM ((VALUES (1)) AS dummy_table_ignore_me_3 LEFT OUTER JOIN (application AS app_for_count_3 INNER JOIN interview_stage AS interview_stage_3 ON app_for_count_3.current_interview_stage_id = interview_stage_3.id AND interview_stage_3.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608' AND interview_stage_3.stage_type <> 'PreInterviewScreen') ON job.id = app_for_count_3.job_id AND
app_for_count_3.status = 'active' AND app_for_count_3.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608') WHERE job.organization_id = '40e6215d-b5c6-4896-987c-f30f3678f608' GROUP BY job.id HAVING count(app_for_count_3.id) <= 1) AS unoptimizable_query_3 ON (job.id = job.id ) WHERE (( unoptimizable_query_1.* IS DISTINCT FROM NULL  ) OR ( unoptimizable_query_2.* IS DISTINCT FROM NULL  )) OR ( unoptimizable_query_3.* IS DISTINCT FROM NULL  );
"""

sqlglot.parse_one(query, read="postgres")

(The weird SQL is from a dashboard generator)

Running the above raises an exception:

Traceback (most recent call last):
  File "/Users/dan/app/backend/./repro.py", line 12, in <module>
    sqlglot.parse_one(query, read="postgres")
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/__init__.py", line 138, in parse_one
    result = dialect.parse(sql, **opts)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/dialects/dialect.py", line 919, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 1388, in parse
    return self._parse(
           ^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 1457, in _parse
    expressions.append(parse_method(self))
                       ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 1695, in _parse_statement
    expression = self._parse_set_operations(expression) if expression else self._parse_select()
                                                                           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 2971, in _parse_select
    this = self._parse_query_modifiers(this)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3124, in _parse_query_modifiers
    for join in self._parse_joins():
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3380, in _parse_join
    kwargs: t.Dict[str, t.Any] = {"this": self._parse_table(parse_bracket=parse_bracket)}
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3605, in _parse_table
    lateral = self._parse_lateral()
              ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3299, in _parse_lateral
    this = self._parse_select(table=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 2981, in _parse_select
    self._parse_table()
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3617, in _parse_table
    subquery = self._parse_select(table=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 2966, in _parse_select
    from_ = self._parse_from()
            ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3191, in _parse_from
    exp.From, comments=self._prev_comments, this=self._parse_table(joins=joins)
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 3617, in _parse_table
    subquery = self._parse_select(table=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 2987, in _parse_select
    self._match_r_paren()
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 7149, in _match_r_paren
    self.raise_error("Expecting )")
  File "/opt/homebrew/lib/python3.11/site-packages/sqlglot/parser.py", line 1501, in raise_error
    raise error
sqlglot.errors.ParseError: Expecting ). Line 2, Col: 116.
  NCT job.id FROM job LEFT OUTER JOIN LATERAL (SELECT 1 FROM ((VALUES (1)) AS dummy_table_ignore_me_1 LEFT OUTER JOIN (application AS app_for_count_1 INNER JOIN interview_stage AS interview_stage_1 ON app_f

I'm able to run the query locally using psql against a postgres:15.5 container, so I believe it is valid.

danxmoran commented 1 month ago

I get the same error from a simplified query:

SELECT 1 FROM (
  (VALUES (1)) AS vals (id)
  LEFT OUTER JOIN tbl
  ON vals.id = tbl.id
)

If I delete the outer-most parentheses, parsing succeeds