xnuinside / simple-ddl-parser

Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.
MIT License
172 stars 39 forks source link

The parser fails to parse columns with space in them #238

Open soujyosen opened 4 months ago

soujyosen commented 4 months ago

DDL: CREATE EXTERNAL TABLE IF NOT EXISTS competitor_analytics.wtp_weekly_unique_hirer_by_billing_paid_budget_sheet_target ( week ending string, dbhk string,
jsmy string,
jssg string,
jsid string,
jsph string,
dbth string,
total string ) COMMENT 'Active CSV target benchmarking data uploads for Weekly Trading Pack. Updated on a yearly and need-based frequency. Sourced from Microsoft Shared Folder and JobSeeker Dashboard.' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION 's3://demo/path' TBLPROPERTIES ("skip.header.line.count" = "1");

Parser output:
[{'table_name': 'wtp_weekly_unique_hirer_by_billing_paid_budget_sheet_target', 'schema': 'competitor_analytics', 'primary_key': [], 'columns': [{'name': '`week', 'type': 'ending` string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'dbhk', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'jsmy', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'jssg', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'jsid', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'jsph', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'dbth', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}, {'name': 'total', 'type': 'string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}], 'alter': {}, 'checks': [], 'index': [], 'partitioned_by': [], 'tablespace': None, 'if_not_exists': True, 'comment': "'Active CSV target benchmarking data uploads for Weekly Trading Pack. Updated on a yearly and need-based frequency. Sourced from Microsoft Shared Folder and JobSeeker Dashboard.'", 'stored_as': None, 'row_format': {'serde': True, 'java_class': "'org.apache.hadoop.hive.serde2.OpenCSVSerde'"}, 'location': "'s3://seek-dataplatform-dropin-raw-prod/external/Competitor-Analytics/z_competitor_analytics/wtp_weekly_unique_hirer_by_billing_paid_budget_sheet_target/'", 'fields_terminated_by': None, 'lines_terminated_by': None, 'map_keys_terminated_by': None, 'collection_items_terminated_by': None, 'property_key': None}]

The first column with a space has issues in parsing {'name': 'week', 'type': 'ending string', 'size': None, 'references': None, 'unique': False, 'nullable': True, 'default': None, 'check': None}

xnuinside commented 2 months ago

@soujyosen hi, thanks for opening the issue, can you provide link to the documentation - what DB is supports creating column names without quotes and with spaces in names?