smart-on-fhir / cumulus-library

https://docs.smarthealthit.org/cumulus/library/
Apache License 2.0
2 stars 0 forks source link

Enhancement: table create date for Athena tables? #162

Closed comorbidity closed 6 months ago

comorbidity commented 6 months ago

Not sure if this is possible, but it would be nice to know when a table was created. Is it possible to automatically bake a MM-DD-YYYY type prefix into the tablename created in S3?

CREATE EXTERNAL TABLE covid_symptom__prevalence_ed( author_week date, author_month date, gender string, race_display array, age_at_visit bigint, age_group varchar(5), variant_era varchar(12), enc_class_display varchar(21), subject_ref string, encounter_ref string, covid_pcr_result varchar(8), covid_icd10 string, covid_dx boolean, covid_symptom varchar(24), symptom_icd10_display string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3://avillach-v2-bch-pl-lzprod-leap-dev-etl/athena-results/tables/8796aa2a-eefd-46af-87bd-2214225dcc99' TBLPROPERTIES ( 'auto.purge'='false', 'has_encrypted_data'='false', 'numFiles'='-1', 'parquet.compression'='GZIP', 'presto_query_id'='20230929_145431_00040_fw37t', 'presto_version'='0.215-19159-g08fc90f', 'totalSize'='-1', 'transactional'='false')

comorbidity commented 6 months ago

@dogversioning if this is a big ask, no worries, it would just make my life slightly easier when trying to remember metadata of when study tables were created. Since I do things that may be are "out of band" from your normal "build library" process, this may not the the right solution. I'm all ears what a good solution would be , even if that is "Andy follow best practice".

comorbidity commented 6 months ago

OK, easy enough with a few extra steps to Login CLI and do this https://stackoverflow.com/questions/62196421/determine-the-table-creation-date-in-aws-athena-using-information-schema-catalog

dogversioning commented 6 months ago

Note that we have a transaction log per study now, so you can see when studies were last built, which I think covers the spirit of this ask. I haven't made a global version yet, but you can find it, going forward, in a table called [study_prefix]__lib_transactions.

I'm going to close this as a result of this already being present. There are also a lot of structural reusability reasons I would prefer not to do this as written - if you really want it this way, feel free to reopen and we can talk about it further