tuva-health / tuva

Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
https://thetuvaproject.com/

2 fixes for Databricks compatibility #500

Closed donaldrauscher closed 1 month ago

donaldrauscher commented 2 months ago

Describe your changes

Please include a summary of any changes.

I added an ltrim.sql macro that uses adapter.dispatch to supply adapter-specific logic for different databases. The main reason is that Databricks' LTRIM function takes its arguments in the reverse order: the characters to trim come first and the source string second. https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/ltrim
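To illustrate the argument-order problem, here is a minimal Python sketch of the dispatch idea. It is not the macro from this PR: the function names only mirror dbt's `<adapter>__<macro>` naming convention, the column name bill_type_code and the trimmed character '0' are hypothetical, and the default argument order is an assumption about the non-Databricks adapters.

```python
def default__ltrim(expression, characters):
    # Assumed default order: ltrim(string, characters_to_trim).
    return f"ltrim({expression}, {characters})"


def databricks__ltrim(expression, characters):
    # Databricks order: ltrim(characters_to_trim, string).
    return f"ltrim({characters}, {expression})"


# Hypothetical registry standing in for dbt's adapter-based lookup.
IMPLEMENTATIONS = {
    "default": default__ltrim,
    "databricks": databricks__ltrim,
}


def ltrim(expression, characters, adapter="default"):
    """Render an LTRIM call using the implementation for the given adapter."""
    impl = IMPLEMENTATIONS.get(adapter, IMPLEMENTATIONS["default"])
    return impl(expression, characters)


print(ltrim("bill_type_code", "'0'"))                        # ltrim(bill_type_code, '0')
print(ltrim("bill_type_code", "'0'", adapter="databricks"))  # ltrim('0', bill_type_code)
```

Any adapter without its own entry falls back to the default implementation, which is the same fallback behavior adapter.dispatch provides.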

I also updated normalized_input__int_bill_type_voting.sql to use this macro. I confirmed that this is the only model that uses ltrim today.

Lastly, I added some format options to databricks__load_seed so that quotes are escaped properly and line breaks are allowed. This fixes load errors for clinical_concept_library__clinical_concepts and terminology__snomed_ct (also observed by Sean Cofoid).
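As a rough illustration of why such options matter, the Python sketch below parses a made-up seed row containing a doubled quote and an embedded line break. It uses Python's csv module rather than the Databricks loader, and the sample data is hypothetical; the actual format options added in this PR are not shown here.

```python
import csv
import io

# Hypothetical seed content: a header row and one data row whose
# description holds a doubled quote and an embedded line break.
raw = 'concept_id,description\r\n1,"Patient said ""ouch""\nsecond line"\r\n'

# A naive split on line breaks cuts the quoted field in two.
naive_rows = raw.splitlines()

# A CSV-aware parser treats the quoted field as a single value.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))   # 3
print(len(parsed_rows))  # 2
print(parsed_rows[1])
```

A loader that is not told how quotes are escaped, or that assumes one record per physical line, would misread a row like this one in a similar way.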

How has this been tested?

Please describe the tests you ran to verify your changes. Provide instructions or code to reproduce output.

I ran dbt compile on the model mentioned above to make sure it produced a correctly formatted query.

For the seeds, I loaded them using both the original logic and the new logic, and confirmed that the only differences were in the two seeds mentioned above. All other seeds had identical sizes.
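A comparison like this could be scripted along the following lines. This is only a sketch: the helper name changed_seeds is hypothetical, terminology__example_seed is a made-up seed name, and all row counts are invented for illustration rather than taken from the actual test run.

```python
def changed_seeds(before, after):
    """Return the sorted names of seeds whose size differs between two loads."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))


# Invented sizes; only the first two seed names come from this PR.
original_load = {
    "clinical_concept_library__clinical_concepts": 10,
    "terminology__snomed_ct": 20,
    "terminology__example_seed": 30,
}
new_load = {
    "clinical_concept_library__clinical_concepts": 12,
    "terminology__snomed_ct": 25,
    "terminology__example_seed": 30,
}

print(changed_seeds(original_load, new_load))
```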


Reviewer focus

Please summarize the specific items you’d like the reviewer(s) to look into.

Confirm whether any other adapters have a similar issue.
