Closed isichei closed 5 years ago
FIRST RUN =>
from etl_manager.meta import _get_spec, read_database_folder
print(_get_spec('base'))
OUTPUT =>
{'Name': '', 'Description': '', 'Owner': 'owner', 'Retention': 0, 'StorageDescriptor': {'Columns': [], 'Location': '', 'InputFormat': '', 'OutputFormat': '', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': '', 'Parameters': {}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'EXTERNAL_TABLE', 'Parameters': {}}
THEN RUN =>
db = read_database_folder('example/meta_data/db1/')
glue_def_dump = db.table('pay').glue_table_definition()
print(_get_spec('base'))
OUTPUT =>
{'Name': '', 'Description': '', 'Owner': 'owner', 'Retention': 0, 'StorageDescriptor': {'Columns': [{'Name': 'employee_id', 'Comment': 'an ID for each employee', 'Type': 'int'}, {'Name': 'annual_salary', 'Comment': 'Annual salary', 'Type': 'float'}], 'Location': 's3://my-bucket/database/database1/pay/', 'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat', 'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat', 'Compressed': False, 'NumberOfBuckets': -1, 'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe', 'Parameters': {'field.delim': ','}}, 'BucketColumns': [], 'SortColumns': [], 'Parameters': {'classification': 'csv', 'delimiter': ',', 'skip.header.line.count': '1'}, 'StoredAsSubDirectories': False}, 'PartitionKeys': [], 'TableType': 'EXTERNAL_TABLE', 'Parameters': {'classification': 'csv', 'delimiter': ',', 'skip.header.line.count': '1'}}
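base_spec gets overwritten after the dict merge is applied: once glue_table_definition() has run, _get_spec('base') returns the values of the 'pay' table instead of the empty base spec. The error is caused by the dictionary not being copied properly from _template, so the merge modifies the shared template in place.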