tuva-health / tuva

Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
https://thetuvaproject.com/

Postgres implementation of load_seed macro #549

Closed justinmills closed 3 months ago

justinmills commented 3 months ago

Describe your changes

Adds an AWS RDS-centric Postgres implementation of the load_seed macro.

This relies on two optional variables set in your dbt project to override the S3 bucket name and optionally provide a prefix.

This assumes you've created a pg-friendly format of the seed data (this repo can be used to generate one). The macro also requires that you have set up your RDS cluster/instance with an IAM role that has the proper privileges to access the S3 bucket where the seed data is stored.
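
As a rough illustration of how a bucket override and an optional prefix could combine into an S3 import on RDS, here is a minimal Python sketch that only renders the SQL text. It assumes the import goes through the RDS `aws_s3` extension (`aws_s3.table_import_from_s3` with `aws_commons.create_s3_uri`), which this description does not confirm. The helper names, the default bucket, the table and file names, and the `(format csv)` options are all hypothetical and are not taken from the macro itself.

```python
from typing import Optional

# Hypothetical default bucket; the bucket the macro actually uses is not stated here.
DEFAULT_BUCKET = "example-seed-bucket"


def s3_key(file_name: str, prefix: Optional[str] = None) -> str:
    """Join an optional prefix and a file name into an S3 object key."""
    cleaned = (prefix or "").strip("/")
    return f"{cleaned}/{file_name}" if cleaned else file_name


def build_import_sql(
    table: str,
    file_name: str,
    region: str,
    bucket: Optional[str] = None,
    prefix: Optional[str] = None,
    options: str = "(format csv)",  # assumed COPY options for the pg-friendly files
) -> str:
    """Render a call to aws_s3.table_import_from_s3 for one seed file.

    Values are interpolated without escaping, so this is only suitable
    for trusted, project-level configuration.
    """
    uri = "aws_commons.create_s3_uri('{}', '{}', '{}')".format(
        bucket or DEFAULT_BUCKET, s3_key(file_name, prefix), region
    )
    # The empty second argument means "import into all columns of the table".
    return "select aws_s3.table_import_from_s3('{}', '', '{}', {});".format(
        table, options, uri
    )


if __name__ == "__main__":
    print(
        build_import_sql(
            "terminology.icd_10_cm",
            "icd_10_cm.csv",
            "us-east-1",
            bucket="my-bucket",
            prefix="seeds/v1/",
        )
    )
```

In this sketch, leaving `bucket` and `prefix` unset falls back to the default bucket and a key with no prefix, mirroring the idea that both variables are optional.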

Also fixed PR template style guide link.

How has this been tested?

We've run this a few times using dbt Cloud in both individual and production environments. We have not, however, done exhaustive testing around data quality.

Reviewer focus

This implementation is AWS+RDS specific, so it is unlikely to work for a postgres instance that is not hosted via AWS' RDS offering. There may be other pg extensions to read data from S3, but those have not been explored or tested.

justinmills commented 3 months ago

We have yet to test this change out against the mainline tuva, but we have tested it against a fork of the v0.8.6 branch (with some other pg-specific fixes backported/applied - none of which I believe are required on main).