spe-uob / 2020-HealthcareLake

A reasonably secure data lake for healthcare analytics
MIT License

Terraform: DynamoDB->Glue->S3 #105

Closed joekendal closed 3 years ago

joekendal commented 3 years ago
  1. Create a Glue catalog database and a crawler for the DynamoDB table

  2. Create a Glue job to ETL the crawled data into Parquet format in an S3 bucket
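A minimal sketch of how these two steps might look in Terraform. The variable names, resource names, script key (etl.py) and the "--output_path" job argument are all hypothetical and do not come from the repository; only the resource types and their arguments are from the AWS provider.

```hcl
# Hypothetical inputs; none of these names come from the repository.
variable "dynamodb_table_name" { type = string }
variable "glue_role_arn" { type = string }
variable "scripts_bucket" { type = string }
variable "lake_bucket" { type = string }

# Step 1: catalog database plus a crawler that reads the table schema.
resource "aws_glue_catalog_database" "lake" {
  name = "healthcare_lake" # assumed name
}

resource "aws_glue_crawler" "dynamodb" {
  name          = "dynamodb-crawler" # assumed name
  database_name = aws_glue_catalog_database.lake.name
  role          = var.glue_role_arn

  dynamodb_target {
    path = var.dynamodb_table_name # for DynamoDB, path is the table name
  }
}

# Step 2: ETL job whose script is expected to write Parquet to S3.
resource "aws_glue_job" "to_parquet" {
  name     = "dynamodb-to-parquet" # assumed name
  role_arn = var.glue_role_arn

  command {
    # The script itself has to be uploaded to this location separately.
    script_location = "s3://${var.scripts_bucket}/etl.py"
  }

  default_arguments = {
    # Custom argument read by the (hypothetical) script, not a Glue built-in.
    "--output_path" = "s3://${var.lake_bucket}/parquet/"
  }
}
```

The Parquet conversion itself happens in the job script, not in Terraform; the resources above only provision the crawler and the job that runs it.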

joekendal commented 3 years ago

Terraform resources and data sources

aws_glue_crawler, aws_glue_job

aws_iam_role, aws_iam_policy_document (data source)
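As a rough illustration of how the IAM pieces could fit together, the sketch below builds a trust policy with aws_iam_policy_document and attaches it to a role that Glue can assume. The role name and local resource names are assumptions; the managed policy attachment is an addition not listed above.

```hcl
# Trust policy allowing the Glue service to assume the role.
data "aws_iam_policy_document" "glue_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["glue.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "glue" {
  name               = "glue-etl-role" # assumed name
  assume_role_policy = data.aws_iam_policy_document.glue_assume.json
}

# AWS-managed baseline permissions for Glue crawlers and jobs.
resource "aws_iam_role_policy_attachment" "glue_service" {
  role       = aws_iam_role.glue.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}
```

A role like this would most likely also need read access to the DynamoDB table and read/write access to the scripts and output buckets, which is left out here.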

joekendal commented 3 years ago

@LukeBenson21 would it be possible to add the scripts bucket name as an output of the module, and then as an output of the root module, so that the README makes sense when it refers to the GLUE_SCRIPT variable? The name would then be printed in the terminal at the end of the apply. It is still the chicken-and-egg problem you described, but this perhaps makes it easier to deal with.
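Something along these lines might work, assuming the module declares the bucket as aws_s3_bucket.scripts and the root configuration calls the module as module.glue. Both names, the file paths and the output names are guesses, not taken from the repository.

```hcl
# File 1: outputs.tf inside the module (hypothetical path modules/glue/).
# Assumes the module defines a resource "aws_s3_bucket" "scripts".
output "scripts_bucket_name" {
  description = "Name of the S3 bucket that holds the Glue job scripts"
  value       = aws_s3_bucket.scripts.bucket
}

# File 2: outputs.tf in the root module.
# Assumes the root configuration contains a module "glue" block.
output "glue_scripts_bucket" {
  description = "Bucket to upload the Glue script to"
  value       = module.glue.scripts_bucket_name
}
```

Root-level outputs are printed at the end of terraform apply and can be read again later with terraform output glue_scripts_bucket, so the README could point at that value when it mentions GLUE_SCRIPT.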