moj-analytical-services / etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework.

Fix issue 119 #121

Closed. RobinL closed this 4 years ago.

RobinL commented 4 years ago

Closes #119

RobinL commented 4 years ago

Actually, the problem described in #119 goes much deeper.

The mock test doesn't catch it: the database and table are created in Athena, but the Athena database can't actually be queried. You can verify this by creating a file called, e.g., delete.py in the etl_manager folder containing:

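import os
from etl_manager.meta import read_database_folder

# Load the database metadata from the test fixtures that use nested data types
db = read_database_folder(os.path.join(os.path.dirname(__file__), "tests/data/data_types/"))
# Create the Glue database and its tables, replacing any existing database of the same name
db.create_glue_database(delete_if_exists=True)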

and running python delete.py.

The database will be created, but if you go into Athena and query it, you'll get an error.
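To check this without going through the Athena console, something like the sketch below could run a query and report whether it failed. It assumes boto3 is installed and AWS credentials are configured; the helper name athena_query_state and its parameters are hypothetical and not part of etl_manager. The client can be passed in, which also makes it easy to test with a stub.

```python
import time


def athena_query_state(query, database, output_location, client=None, poll_seconds=1.0):
    """Run a query in Athena and return (state, reason) once it finishes.

    Hypothetical helper, not part of etl_manager. output_location is an
    S3 URI where Athena writes query results.
    """
    if client is None:
        import boto3  # only needed when talking to real AWS

        client = boto3.client("athena")
    execution_id = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    # Poll until the query reaches a terminal state
    while True:
        status = client.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state, status.get("StateChangeReason", "")
        time.sleep(poll_seconds)
```

Against the broken version, a call such as athena_query_state("SELECT * FROM test_table LIMIT 10", "<database name>", "s3://<bucket>/<prefix>/") would be expected to come back as "FAILED", with the reason carrying the error message.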

If you pull this branch, the query now works!

The problem is that the angle brackets (<>) in the nested type definitions are stripped out. Running this for real and then asking Athena for the table's DDL gives:

SHOW CREATE TABLE test_table;

Error: < expected at the position 9 of 'int:array:array:struct:struct' but ':' is found.

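The position in that message lines up with the first complex type keyword that has lost its opening bracket: position 9 of 'int:array:...' is the ':' straight after 'array'. A rough check for this kind of breakage might look like the sketch below. find_missing_angle_bracket is a hypothetical helper, not part of etl_manager, and it is far simpler than Hive's real type parser (for example, a struct field literally named "map" would be flagged).

```python
import re


def find_missing_angle_bracket(type_string):
    """Return the index where "<" was expected but not found, or None.

    Simplified illustration only: it finds each array/struct/map keyword
    and checks that the next character is "<". It does not validate the
    full Hive type grammar.
    """
    for match in re.finditer(r"\b(array|struct|map)\b", type_string):
        end = match.end()
        if end >= len(type_string) or type_string[end] != "<":
            return end
    return None
```

Run on the stripped string from the error, it points at the same position Athena reports, while a correctly bracketed type passes.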
RobinL commented 4 years ago

Karik, I suggest you look at this and think about it carefully before merging.

I think what I've done is sensible, but it's quite complex, so I might have got something wrong!

RobinL commented 4 years ago

I'm happier with this after the latest push. Users now always specify every type in the agnostic format (e.g. character rather than string), and etl_manager converts these agnostic types to the types required by Athena/Glue.
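As an illustration of that conversion, the sketch below rewrites only the type names in a possibly nested type string and copies the <, >, : and , delimiters through untouched, so the angle brackets survive. The mapping is an assumption apart from character to string, which is the example given above; the name agnostic_to_glue is hypothetical, and etl_manager's actual mapping and implementation may differ.

```python
import re

# ASSUMED mapping from agnostic type names to Athena/Glue type names.
# Only "character" -> "string" comes from the discussion; the rest is illustrative.
AGNOSTIC_TO_GLUE = {
    "character": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "date": "date",
    "datetime": "timestamp",
    "boolean": "boolean",
}

# A whole word that is NOT followed by ":" (words followed by ":" are struct field names)
_TYPE_NAME = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*:)")


def agnostic_to_glue(type_string):
    """Convert agnostic type names to Glue type names, keeping nesting intact."""

    def replace(match):
        word = match.group(0)
        # Unknown words (e.g. array, struct) are left exactly as they are
        return AGNOSTIC_TO_GLUE.get(word, word)

    return _TYPE_NAME.sub(replace, type_string)
```

For example, agnostic_to_glue("array<struct<name:character,dob:datetime>>") gives "array<struct<name:string,dob:timestamp>>". Skipping words followed by ":" keeps struct field names such as a field called long from being renamed.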