Closed shinichi-takii closed 4 years ago
Im getting a different error while parsing the about SQL query of RedShift. Seems its considering the Encoding part as data type.
* TABLE *
schema = sample_schema : name = sample_table : is_temp = False
* BigQuery Fields *
Traceback (most recent call last):
File "c:/Users/rbhuv/Desktop/code/testing.py", line 35, in <module>
print(table.to_bigquery_fields())
File "C:\Users\rbhuv\AppData\Roaming\Python\Python38\site-packages\ddlparse\ddlparse.py", line 427, in to_bigquery_fields
return self._columns.to_bigquery_fields(name_case)
File "C:\Users\rbhuv\AppData\Roaming\Python\Python38\site-packages\ddlparse\ddlparse.py", line 365, in to_bigquery_fields
bq_fields.append(col.to_bigquery_field(name_case))
File "C:\Users\rbhuv\AppData\Roaming\Python\Python38\site-packages\ddlparse\ddlparse.py", line 292, in to_bigquery_field
type = self.bigquery_legacy_data_type
File "C:\Users\rbhuv\AppData\Roaming\Python\Python38\site-packages\ddlparse\ddlparse.py", line 256, in bigquery_legacy_data_type
return self.bigquery_data_type
File "C:\Users\rbhuv\AppData\Roaming\Python\Python38\site-packages\ddlparse\ddlparse.py", line 250, in bigquery_data_type
raise ValueError("Unknown data type : '{}'".format(self._data_type))
ValueError: Unknown data type : 'LZO'
from ddlparse import DdlParse
sample_ddl = """
CREATE TABLE IF NOT EXISTS "sample_schema"."sample_table"
(
"div_cd" VARCHAR(2) NOT NULL
,"div_name" VARCHAR(30) NOT NULL
,"org_cd" VARCHAR(8) NOT NULL
,"org_name" VARCHAR(60) NOT NULL
,"team_cd" VARCHAR(2) NOT NULL
,"team_name" VARCHAR(120) NOT NULL
,"personal_cd" VARCHAR(7) NOT NULL
,"personal_name" VARCHAR(300) NOT NULL
,"username" VARCHAR(6) NOT NULL
,"staff_flg" CHAR(1) DEFAULT '0'::bpchar ENCODE lzo
,"leader_flg" CHAR(1) DEFAULT '0'::bpchar ENCODE lzo
)
DISTSTYLE EVEN
;
"""
# parse pattern (1)
table = DdlParse().parse(sample_ddl)
# parse pattern (2)
parser = DdlParse()
parser.ddl = sample_ddl
table = parser.parse()
print("* TABLE *")
print("schema = {} : name = {} : is_temp = {}".format(table.schema, table.name, table.is_temp))
print("* BigQuery Fields *")
print(table.to_bigquery_fields())
print("* BigQuery Fields - column name to lower case / upper case *")
print(table.to_bigquery_fields(DdlParse.NAME_CASE.lower))
print(table.to_bigquery_fields(DdlParse.NAME_CASE.upper))
print("* COLUMN *")
for col in table.columns.values():
print("name = {} : data_type = {} : length = {} : precision(=length) = {} : scale = {} : constraint = {} : not_null = {} : PK = {} : unique = {} : BQ {}".format(
col.name,
col.data_type,
col.length,
col.precision,
col.scale,
col.constraint,
col.not_null,
col.primary_key,
col.unique,
col.to_bigquery_field()
))
print("* Get Column object (case insensitive) *")
print(table.columns["total"])
@BhuviTheDataGuy Thank you for reporting.
I fixed this issue.
Please upgrade the package to v1.6.1
.
$ pip --no-cache-dir install --upgrade ddlparse
Thanks
Awesome, it worked.
But the total class is giving an error now.
Traceback (most recent call last):
File "c:/Users/rbhuv/Desktop/code/testing.py", line 57, in <module>
print(table.columns["total"])
File "C:\Users\rbhuv\AppData\Roaming\Python\Python38\site-packages\ddlparse\ddlparse.py", line 337, in __getitem__
return super().__getitem__(key.lower())
KeyError: 'total'
Hello, @BhuviTheDataGuy
This statement calls the total
column.
But, your DDL does not have a total
column.
That is the cause of the error.
For example, this DDL should work.
CREATE TABLE IF NOT EXISTS "sample_schema"."sample_table"
(
"div_cd" VARCHAR(2) NOT NULL
,"div_name" VARCHAR(30) NOT NULL
,"org_cd" VARCHAR(8) NOT NULL
,"org_name" VARCHAR(60) NOT NULL
,"team_cd" VARCHAR(2) NOT NULL
,"team_name" VARCHAR(120) NOT NULL
,"personal_cd" VARCHAR(7) NOT NULL
,"personal_name" VARCHAR(300) NOT NULL
,"username" VARCHAR(6) NOT NULL
,"staff_flg" CHAR(1) DEFAULT '0'::bpchar ENCODE lzo
,"leader_flg" CHAR(1) DEFAULT '0'::bpchar ENCODE lzo
,"total" INTEGER
)
DISTSTYLE EVEN
;
Please check it.
Thank you.
The fix provided is not sufficient as I have CREATE TABLE
with column created_at timestamp without time zone DEFAULT ('now'::text)::timestamp without time zone ENCODE az64
and the parser splits the ::timestamp without time zone
as two new columns one without
with type time
and another zone
with type ENCODE
Example
DDL (Redshift)
ddlparse output BigQuery JSON schema
Not parse
leader_flg
field.