uace-azmet / azmet-forecast-qa

Developing QA/QC routines for AZMet

Use partitioned dataset to store and update API data for modelling #13

Closed Aariq closed 1 year ago

Aariq commented 1 year ago

I rearranged some code so that when the date changes, only the most recent chunk of data is updated rather than the entire dataset. Instead of keeping the up-to-date daily dataset as a target called daily, the pipeline now writes the data to disk using arrow::write_dataset(). The data is partitioned by year, so when the pipeline pulls down new data from the API, only the file holding the most recent year of data is overwritten. This idea could be extended to partition the data by year and month, which would overwrite even less.
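
As a rough illustration of the partitioning idea, here is a minimal sketch rather than the code in this PR. The data frame, its column names, and the storage path are hypothetical stand-ins, and it assumes the arrow and dplyr packages are installed. The choice of existing_data_behavior = "delete_matching" is also an assumption made for the example: it clears only the partition directories being written to, so older years are left alone.

```r
library(arrow)

# Hypothetical stand-in for the daily data returned by the API.
daily <- data.frame(
  date = as.Date(c("2021-12-30", "2021-12-31", "2022-01-01", "2022-01-02")),
  temp_mean = c(10.1, 9.8, 11.3, 12.0)
)
daily$year <- as.integer(format(daily$date, "%Y"))

# Hypothetical storage location for the partitioned dataset.
store <- file.path(tempdir(), "daily_store")

# Initial write: one directory per year (year=2021/, year=2022/).
write_dataset(daily, store, format = "parquet", partitioning = "year")

# Later, new data arrives. Rebuild only the rows for the most recent year.
latest <- max(daily$year)
new_row <- data.frame(
  date = as.Date("2022-01-03"),
  temp_mean = 12.4,
  year = latest
)
current <- rbind(daily[daily$year == latest, ], new_row)

# Only the year=2022/ partition is replaced; year=2021/ is not written to.
write_dataset(
  current, store,
  format = "parquet",
  partitioning = "year",
  existing_data_behavior = "delete_matching"
)

# Read the whole dataset back in for modelling.
updated <- dplyr::collect(open_dataset(store))
```

To partition by year and month instead, one could add a month column and pass partitioning = c("year", "month"), so that each update rewrites a single month of data.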

I've also added documentation to some of the functions and renamed some targets.

[Screenshot: 2022-11-09, 5:47:54 PM]