spe-uob / 2020-HealthcareLake

A reasonably secure data lake for healthcare analytics
MIT License
9 stars 5 forks

Data Pipeline #102

Closed joekendal closed 3 years ago

joekendal commented 3 years ago

In Friday's workshop we are building the following transformation: data from our DynamoDB table is written to an S3 bucket in Apache Parquet format, with the Hive metastore metadata stored in the Glue Data Catalog.

[Image: Untitled1]

joekendal commented 3 years ago

Here you will find manual instructions using the AWS Console. We will first follow them by hand to see if the approach works; once that is complete, it is just a matter of coding the same setup in HCL / Terraform.

Terraform resources: glue_crawler, glue_catalog_table, glue_trigger
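As a rough starting point, the crawler and trigger could be sketched in Terraform along these lines. This is only a sketch under assumptions, not the project's actual configuration: the variable names, the database name `healthcare_lake`, the `parquet/` prefix, and the nightly schedule are all hypothetical placeholders, and the IAM role is assumed to exist already. In the AWS provider the resource types are `aws_glue_catalog_database`, `aws_glue_crawler` and `aws_glue_trigger`.

```hcl
# Hypothetical inputs: replace with the project's real bucket and role.
variable "lake_bucket" {
  description = "Name of the S3 bucket that holds the Parquet output (assumed)"
  type        = string
}

variable "crawler_role_arn" {
  description = "ARN of an existing IAM role the Glue crawler can assume (assumed)"
  type        = string
}

# Glue Data Catalog database that will hold the table metadata.
resource "aws_glue_catalog_database" "lake" {
  name = "healthcare_lake" # placeholder name
}

# Crawler that scans the Parquet files and records their schema in the catalog.
resource "aws_glue_crawler" "parquet" {
  name          = "parquet-crawler" # placeholder name
  database_name = aws_glue_catalog_database.lake.name
  role          = var.crawler_role_arn

  s3_target {
    path = "s3://${var.lake_bucket}/parquet/" # placeholder prefix
  }
}

# Scheduled trigger that starts the crawler; the schedule is an example only.
resource "aws_glue_trigger" "nightly_crawl" {
  name     = "nightly-crawl" # placeholder name
  type     = "SCHEDULED"
  schedule = "cron(0 2 * * ? *)" # 02:00 UTC every day

  actions {
    crawler_name = aws_glue_crawler.parquet.name
  }
}
```

With a crawler in place, the tables in the catalog are inferred from the files in S3. An explicit `aws_glue_catalog_table` would only be needed if we prefer to declare the schema by hand instead of letting the crawler discover it.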