uace-azmet / azmet-forecast-qa

Developing QA/QC routines for AZMet

Data handling #27

Closed Aariq closed 1 year ago

Aariq commented 1 year ago

Improves data store handling by:

1) Not running data reading/writing steps on remote workers. Running these steps remotely can slow down targets because the relevant data must be transferred to the workers running those targets; for steps that read the data store, I think that means all of the parquet files.
2) Relying more on arrow's database-like connections rather than collect()ing data into data frames early. When writing to a partitioned data store, only the most recent partition needs to be pulled into memory and then overwritten.
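A minimal sketch of the second point, not the code from this PR: it assumes a data store partitioned by a hypothetical `year` column, and the `store` path, station names, and values are all made up for illustration. It opens the store lazily with `open_dataset()`, collects only the most recent partition, and rewrites only that partition with `existing_data_behavior = "delete_matching"`.

```r
library(arrow)
library(dplyr)

# Hypothetical data store in a temporary directory, partitioned by year.
store <- file.path(tempdir(), "azmet_store_example")
unlink(store, recursive = TRUE)

# Made-up existing data spanning two partitions.
old <- data.frame(
  year = c(2021L, 2021L, 2022L),
  station = c("az01", "az02", "az01"),
  temp = c(20.1, 21.3, 19.8)
)
write_dataset(old, store, partitioning = "year")

# Made-up new rows that belong to the most recent partition.
new_rows <- data.frame(year = 2022L, station = "az02", temp = 22.4)
latest_year <- max(new_rows$year)

# open_dataset() returns a lazy, database-like connection; nothing is read yet.
ds <- open_dataset(store)

# Pull only the most recent partition into memory.
latest <- ds |>
  filter(year == latest_year) |>
  collect()

# Append the new rows and overwrite just that partition.
# "delete_matching" removes existing files only in the partitions being written.
updated <- bind_rows(latest, new_rows)
write_dataset(
  updated,
  store,
  partitioning = "year",
  existing_data_behavior = "delete_matching"
)

# Read the whole store back to check that the older partition is untouched.
result <- open_dataset(store) |> collect()
```

For the first point, `targets` lets an individual target stay on the main process via `tar_target(..., deployment = "main")`; whether this PR uses that setting or another mechanism is an assumption here.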