tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.51k stars 671 forks source link

How to validate SQL query with unbalanced quotes #1930

Closed vmathurpaytm closed 1 year ago

vmathurpaytm commented 1 year ago

We want to parse the mysql query in python project and detect wrong query based on unbalanced quotes or wrong quotes placement or column filtered values having wrong quotes

Example:

"select from city where name='x and type=y;" -> Wrong query "select from city where name='x and type=y';" -> Wrong query

We tried using sqlparse and sqlglot and sqlvalidator

from sqlglot import exp, parse_one
try:
     sqlglot.transpile("select * from city where name='x and type=y';")
except sqlglot.errors.ParseError as e:
     print(e.errors)
georgesittas commented 1 year ago

Regarding the first query, SQLGlot raises a RuntimeError:

>>> import sqlglot
>>> sqlglot.parse_one("select * from city where name='x and type=y")
Traceback (most recent call last):
  File "sqlglot/tokens.py", line 798, in tokenize
    self._scan()
  File "sqlglot/tokens.py", line 821, in _scan
    self._scan_keywords()
  File "sqlglot/tokens.py", line 957, in _scan_keywords
    if self._scan_string(word):
  File "sqlglot/tokens.py", line 1094, in _scan_string
    text = self._extract_string(end)
  File "sqlglot/tokens.py", line 1151, in _extract_string
    raise RuntimeError(f"Missing {delimiter} from {self._line}:{self._start}")
RuntimeError: Missing ' from 1:30

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sqlglot/__init__.py", line 122, in parse_one
    result = dialect.parse(sql, **opts)
  File "sqlglot/dialects/dialect.py", line 278, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
  File "sqlglot/dialects/dialect.py", line 292, in tokenize
    return self.tokenizer.tokenize(sql)
  File "sqlglot/tokens.py", line 803, in tokenize
    raise ValueError(f"Error tokenizing '{context}'") from e
ValueError: Error tokenizing 'select * from city where name='x and type='

Perhaps this needs to be renamed into TokenizeError, cc: @tobymao?

Your second query is a perfectly valid one, which is why SQLGlot and (I assume) the other parsers treat it as such. If for some reason you want to detect the pattern you mention and flag the query as invalid, you need to implement custom code in your application to do it.


Also, please note that SQLGlot is not supposed to be a validator, so besides these tokenizer errors and some simple parser errors, it might still parse invalid queries successfully.