tobymao / sqlglot

Python SQL Parser and Transpiler
https://sqlglot.com/
MIT License
6.5k stars 671 forks source link

I have a problem parsing Doris's table creation ddl, Is 'doris' a legal dialect of sqlglot? #3040

Closed madeirak closed 7 months ago

madeirak commented 7 months ago

When I try to parse a doris table creation ddl , I meet this contains unsupported syntax. Falling back to parsing as a 'Command'., the result is not a number of parsed ast nodes, but a single node "Command"

Typical doris create table ddl is like below

CREATE TABLE [IF NOT EXISTS] [database.]table
(
    column_definition_list,
    [index_definition_list]
)
[engine_type]
[keys_type]
[table_comment]
[partition_info]
distribution_desc
[rollup_list]
[properties]
[extra_properties]
madeirak commented 7 months ago

the version of sqlglot is 22.0.1

madeirak commented 7 months ago

This is an example:

CREATE TABLE `xxx` (
            `id` varchar(255) NULL COMMENT 'id',
            `dt` date NULL COMMENT 'e.g. 2022-01-10',
            INDEX id_index (`duid`) USING BITMAP COMMENT 'id的bitmap索引',
            ) ENGINE=OLAP
            UNIQUE KEY(`id`)
            COMMENT 'this is comment'
            PARTITION BY RANGE(`dt`)
            (PARTITION p201101 VALUES [('2011-01-01'), ('2011-02-01')),
            PARTITION p201102 VALUES [('2011-02-01'), ('2011-03-01'))
            DISTRIBUTED BY HASH(`id`) BUCKETS 8
            PROPERTIES (
            "xxx" = "yyy"
            );
madeirak commented 7 months ago

I noticed things in the feature below that only support ‘SELECT’ for doris? https://github.com/tobymao/sqlglot/pull/2006

georgesittas commented 7 months ago

SQLGlot can parse CREATE statements, but some post-schema properties may be unsupported. For example, it currently handles a subset of your DDL just fine (notice how there's no "unsupported" syntax warning):

>>> sqlglot.transpile("""
... CREATE TABLE `xxx` (
...     `id` varchar(255) NULL COMMENT 'id',
...     `dt` date NULL COMMENT 'e.g. 2022-01-10',
...     INDEX id_index (`duid`) USING BITMAP COMMENT 'id的bitmap索引',
... )
... ENGINE=OLAP
... COMMENT 'this is comment'
... PARTITION BY RANGE(`dt`)
... """, "doris")
["CREATE TABLE `xxx` (`id` VARCHAR(255) NULL COMMENT 'id', `dt` DATE NULL COMMENT 'e.g. 2022-01-10', INDEX id_index (`duid`) USING BITMAP COMMENT 'id的bitmap索引') ENGINE=OLAP COMMENT='this is comment' WITH (PARTITIONED_BY=RANGE(`dt`))"]

When there's a missing property, SQLGlot falls back to producing a Command that contains the DDL as a raw string, i.e. chooses not to parse it altogether instead of failing.

Adding support for all of the missing properties is out of scope for now, but we'll be happy to accept a well-crafted PR. If you're interested in contributing, please reach out in Slack to make sure we're aligned on the approach.