xnuinside / simple-ddl-parser

Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.
MIT License
179 stars 40 forks source link

parser skips tables containing indexes with > 2 included columns #180

Closed mrmonkington closed 3 months ago

mrmonkington commented 1 year ago

Describe the bug

simple-ddl-parser==0.29.0

The parser seems to ignore any table with a multicolumn index where the number of included cols > 2.

To Reproduce

Sample program test.py:

import sys
from simple_ddl_parser import DDLParser  # type: ignore
from pprint import pprint

parse_results = DDLParser(sys.stdin.read()).run()
pprint(parse_results)

Input file input.sql:

CREATE TABLE a1 (
  b int,
  c int,
  d int,
  KEY e(b,c)
);
CREATE TABLE a2 (
  b int,
  c int,
  d int,
  KEY e(b,c,d)
);
cat input.sql | python test.py

Outputs:

[{'alter': {},
  'checks': [],
  'columns': [{'check': None,
               'default': None,
               'name': 'b',
               'nullable': True,
               'references': None,
               'size': None,
               'type': 'int',
               'unique': False},
              {'check': None,
               'default': None,
               'name': 'c',
               'nullable': True,
               'references': None,
               'size': None,
               'type': 'int',
               'unique': False},
              {'check': None,
               'default': None,
               'name': 'd',
               'nullable': True,
               'references': None,
               'size': None,
               'type': 'int',
               'unique': False},
              {'check': None,
               'default': None,
               'name': 'KEY',
               'nullable': True,
               'references': None,
               'size': None,
               'type': 'e',
               'unique': False}],
  'constraints': {'checks': None, 'references': None, 'uniques': None},
  'index': [],
  'partitioned_by': [],
  'primary_key': [],
  'schema': None,
  'table_name': 'a1',
  'tablespace': None}]

Note that a2 is not present.

Expected behavior

Two entries in the tables list, e.g.:

[{...
  'table_name': 'a1',
  ...},
{...
  'table_name': 'a2',
  ...},
]

Environment:

xnuinside commented 1 year ago

@mrmonkington hi, thanks for reporting the issue, I think problem in statement KEY, because it is usually PRIMARY KEY (https://www.w3schools.com/sql/sql_primarykey.ASP ). Can you share a doc for DB where is KEY used? Without PRIMARY

in output you can see, that KEY was recognized as a column, not a valid statement {'check': None, 'default': None, 'name': 'KEY',

mrmonkington commented 1 year ago

Hi @xnuinside

Yes I just noticed myself that the parser was interpreting it a regular column of type e - apologies!

However MySQL allows KEY as a synonym of INDEX and for some reason all of the schemas I'm parsing with your library use this form :(

https://dev.mysql.com/doc/refman/8.0/en/create-table.html#:~:text=KEY%20%7C%20INDEX,other%20database%20systems.

create_definition: {
    col_name column_definition
  | {INDEX | KEY} [index_name] [index_type] (key_part,...)
      [index_option] ...
  | {FULLTEXT | SPATIAL} [INDEX | KEY] [index_name] (key_part,...)
      [index_option] ...
  | [CONSTRAINT [symbol]] PRIMARY KEY
      [index_type] (key_part,...)
      [index_option] ...
  | [CONSTRAINT [symbol]] UNIQUE [INDEX | KEY]
      [index_name] [index_type] (key_part,...)
      [index_option] ...
  | [CONSTRAINT [symbol]] FOREIGN KEY
      [index_name] (col_name,...)
      reference_definition
  | check_constraint_definition
}

Hope that helps!

xnuinside commented 1 year ago

got it, I will add support for this statement

mrmonkington commented 1 year ago

@xnuinside I have noticed that neither KEY nor INDEX is supported in TABLE definitions (i.e. no inline indexes as supported by mySQL and I believe recent-ish msSQL) so I guess this is more work than simply adding a token synonym to the lexer.

I would really like to help implement this, but I am unsure what the output should look like for the purposes of writing some tests.

Please can you confirm for me that it would be correct if an inline index of form:

CREATE TABLE t1 (
  val INT,
  INDEX idx1(val);

...would output the same parsed form as:

CREATE TABLE t1 (
  val INT,
);
CREATE INDEX idx1 ON t1(val);

i.e.

[{'alter': {},
  'checks': [],
  'columns': [{'check': None,
               'default': None,
               'name': 'val',
               'nullable': True,
               'references': None,
               'size': None,
               'type': 'INT',
               'unique': False}],
  'index': [{'columns': ['val'],
             'detailed_columns': [{'name': 'val',
                                   'nulls': 'LAST',
                                   'order': 'ASC'}],
             'index_name': 'idx1',
             'unique': False}],
  'partitioned_by': [],
  'primary_key': [],
  'schema': None,
  'table_name': 't1',
  'tablespace': None}]

?

Thank you!

cfhowes commented 7 months ago

@mrmonkington Can you check if this now works after https://github.com/xnuinside/simple-ddl-parser/pull/219 was merged (0.31.3 and later)? I was using MySQL as well and patched up some key parsing. not sure if i fully got your cases though.

xnuinside commented 3 months ago

everything works (all samples from issue) on 1.3.1 version, tested, I will close the issues, if needed something else - feel free to open