moj-analytical-services / etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework

base spec is getting overwritten #80

Closed isichei closed 5 years ago

isichei commented 5 years ago

FIRST RUN =>

from etl_manager.meta import _get_spec, read_database_folder
print(_get_spec('base'))

OUTPUT =>

{'Name': '',
 'Description': '',
 'Owner': 'owner',
 'Retention': 0,
 'StorageDescriptor': {'Columns': [],
  'Location': '',
  'InputFormat': '',
  'OutputFormat': '',
  'Compressed': False,
  'NumberOfBuckets': -1,
  'SerdeInfo': {'SerializationLibrary': '', 'Parameters': {}},
  'BucketColumns': [],
  'SortColumns': [],
  'Parameters': {},
  'StoredAsSubDirectories': False},
 'PartitionKeys': [],
 'TableType': 'EXTERNAL_TABLE',
 'Parameters': {}}

THEN RUN =>

db = read_database_folder('example/meta_data/db1/')
glue_def_dump = db.table('pay').glue_table_definition()
print(_get_spec('base'))

OUTPUT =>

{'Name': '',
 'Description': '',
 'Owner': 'owner',
 'Retention': 0,
 'StorageDescriptor': {'Columns': [{'Name': 'employee_id',
    'Comment': 'an ID for each employee',
    'Type': 'int'},
   {'Name': 'annual_salary', 'Comment': 'Annual salary', 'Type': 'float'}],
  'Location': 's3://my-bucket/database/database1/pay/',
  'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat',
  'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',
  'Compressed': False,
  'NumberOfBuckets': -1,
  'SerdeInfo': {'SerializationLibrary': 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe',
   'Parameters': {'field.delim': ','}},
  'BucketColumns': [],
  'SortColumns': [],
  'Parameters': {'classification': 'csv',
   'delimiter': ',',
   'skip.header.line.count': '1'},
  'StoredAsSubDirectories': False},
 'PartitionKeys': [],
 'TableType': 'EXTERNAL_TABLE',
 'Parameters': {'classification': 'csv',
  'delimiter': ',',
  'skip.header.line.count': '1'}}

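The base spec gets overwritten after the dict merge is applied: the second call to _get_spec('base') returns the values for the pay table instead of the empty template. The error is caused by the dictionary from _template not being copied properly here, so the merge appears to mutate the shared template rather than an independent copy.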
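
For illustration, here is a minimal sketch of this class of bug and the usual fix, assuming the template is a nested dict held at module level. The names _TEMPLATE, get_spec_shared, get_spec_copied and merge_in_place are hypothetical stand-ins and are not etl_manager's actual code:

```python
import copy

# Hypothetical stand-in for the package's template store (an assumption,
# not etl_manager's real data structure).
_TEMPLATE = {
    "base": {
        "Name": "",
        "StorageDescriptor": {"Columns": [], "Location": ""},
        "Parameters": {},
    }
}


def get_spec_shared(name):
    """Buggy variant: hands back the template object itself."""
    return _TEMPLATE[name]


def get_spec_copied(name):
    """Safer variant: returns an independent deep copy of the template."""
    return copy.deepcopy(_TEMPLATE[name])


def merge_in_place(target, updates):
    """Recursively merge `updates` into `target`, mutating `target`."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_in_place(target[key], value)
        else:
            target[key] = value
    return target


table_overrides = {
    "Name": "pay",
    "StorageDescriptor": {"Location": "s3://my-bucket/database/database1/pay/"},
}

# Merging into a deep copy leaves the template untouched.
safe = merge_in_place(get_spec_copied("base"), table_overrides)
template_after_safe = copy.deepcopy(_TEMPLATE["base"])

# Merging into the shared object pollutes the template for later callers.
merge_in_place(get_spec_shared("base"), table_overrides)
template_after_shared = copy.deepcopy(_TEMPLATE["base"])
```

Note that a shallow copy (dict.copy() or dict(...)) would not be enough in this sketch, because nested dicts such as StorageDescriptor would still be shared with the template; copy.deepcopy avoids that.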