tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
5.57k stars 549 forks source link

sqlglot cannot recognize/parse column name which starts with a number #3336

Closed xinglin-zhao closed 1 week ago

xinglin-zhao commented 1 week ago

Fully reproducible code snippet The below is one example, expression_tree.select("3b") is parsed as 3 AS b, which is not intended. Please help fix or add checks such as not allowing column names starting with a number. For our use case, names like 3b are valid. image

This is the sqlglot version we are using Successfully installed sqlglot-23.11.2

tobymao commented 1 week ago

can you show documentation that allows 3b as a column name without quotes?

tobymao commented 1 week ago

you need to specify the dialect select("3b", dialect="databricks").

this works as expected

xinglin-zhao commented 1 week ago

Thanks Toby. Two questions:

  1. why select("3b") doesn't work? so the default dialect doesn't like starting numbers?
  2. if we need to add dialect="databricks" on every statement, it seems a bit verbose, any suggestions?
tobymao commented 1 week ago

right, the default dialect treats 3b like 3 as b. i suggest writing the default dialect of using the builder

xinglin-zhao commented 1 week ago

right, the default dialect treats 3b like 3 as b. i suggest writing the default dialect of using the builder

do you have any example of setting default dialect of using the builder. often our style starts with an empty Select expression, but setting dialect at Select doesn't work as default.

from sqlglot import exp, parse_one, expressions

query = expressions.Select(dialect='databricks')
query = query.select("3b")
query = query.select("3b", dialect='databricks')
query.sql(dialect="databricks")

result: 'SELECT 3 AS b, 3b'

tobymao commented 1 week ago

no that isn’t supported

xinglin-zhao commented 1 week ago

i see. if you have some good examples for this type of issue, please let me know. I can circulate them and make sure we are using sqlglot in a preferred way.

xinglin-zhao commented 1 week ago

the fix is something like, basically builds an expression. i think one learning is we should always use expression if possible.

column = expressions.Identifier(this="3b")
query = query.select(column)