Rearranged some code so that when the date changes, only the most recent chunk of data is updated, not the entire dataset. Rather than keeping the up-to-date daily dataset as a target called `daily`, the data is written to disk using `arrow::write_dataset()`. The data is partitioned by year so that when the pipeline pulls down new data from the API, only the file holding the most recent year of data is overwritten. This idea could be extended to partitioning by year and month to do even less overwriting.
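A minimal sketch of the partitioned write (not the actual pipeline code; the column name, helper name, and output path are assumptions):

```r
# Write the daily dataset as a year-partitioned Parquet dataset so a refresh
# only rewrites the partition for the current year.
library(arrow)
library(dplyr)

write_daily_dataset <- function(daily, path = "data/daily") {
  daily |>
    mutate(year = lubridate::year(datetime)) |>  # assumed date column
    group_by(year) |>                            # group vars become partitions
    write_dataset(path = path, format = "parquet")
  path
}
```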
I've also added documentation to some of the functions and renamed targets:

- `legacy_` now refers to historical data scraped from the AZMet website that is not available through the API.
- `past_` refers to data through October 2022. Joining the legacy data to some more recent data was my way of ensuring the old and new data are harmonized.
- `db_daily` targets are just pointers to `/data/daily/`, where the partitioned dataset gets written (see the sketch after this list).
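One way the file-pointer target could look in `_targets.R`; this is a hedged sketch, and the target names, helper functions, and column names are assumptions rather than the pipeline's actual code:

```r
library(targets)

list(
  # Rewrites only the current-year partition, then returns the dataset
  # directory so targets can track it with format = "file".
  tar_target(
    db_daily,
    write_daily_dataset(daily_data, path = "data/daily"),
    format = "file"
  ),
  # Downstream targets open the partitioned dataset lazily with arrow.
  tar_target(
    daily_summary,
    arrow::open_dataset(db_daily) |>
      dplyr::count(meta_station_id) |>  # assumed column name
      dplyr::collect()
  )
)
```

Using `format = "file"` means downstream targets only rebuild when the files under the dataset directory actually change, which fits the goal of touching just the most recent partition.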