Open oscarsyu opened 2 months ago
This may need to be split into a research issue and an implementation issue: the first lays out exactly what is dirty, and the second lays out the code to fix it.
I also think we may need to build the ETL pipeline first... or maybe make the research half of this part of building out the ETL pipeline? Otherwise there isn't anywhere to put this code...
Context
We are hoping to automatically ingest our datasets from their sources (when possible and appropriate). This task is to do data quality validation that identifies existing issues and handles any future ones. The owner of this task will be responsible for creating a process that validates and corrects errors, so that a future automated call to a data source results in usable, high-quality data to display to users. Datasets are here
Additional Info here
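To make the "validate and correct" step in the Context a bit more concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: the column names (`name`, `value`), the `ValidationReport` type, and the `validate_rows` function are hypothetical, and the real rules would come out of the research half of this issue. It trims whitespace (a correction), flags missing or non-numeric values (a validation), and only passes clean rows along.

```python
from dataclasses import dataclass, field

# Hypothetical required columns; the real list depends on each dataset.
REQUIRED_COLUMNS = ("name", "value")


@dataclass
class ValidationReport:
    """Rows that passed, plus a log of every problem found."""
    clean_rows: list = field(default_factory=list)
    issues: list = field(default_factory=list)  # (row_index, column, message)


def validate_rows(rows):
    """Validate and lightly correct a list of dict rows (illustrative only)."""
    report = ValidationReport()
    for index, row in enumerate(rows):
        cleaned = {}
        ok = True
        for column in REQUIRED_COLUMNS:
            raw = row.get(column)
            # Correction: normalize None to "" and strip stray whitespace.
            text = "" if raw is None else str(raw).strip()
            if not text:
                report.issues.append((index, column, "missing value"))
                ok = False
            cleaned[column] = text
        if ok:
            # Validation: the (hypothetical) "value" column must be numeric.
            try:
                cleaned["value"] = float(cleaned["value"])
            except ValueError:
                report.issues.append((index, "value", "not a number"))
                ok = False
        if ok:
            report.clean_rows.append(cleaned)
    return report


sample = [
    {"name": " A ", "value": "1.5"},
    {"name": "", "value": "2"},
    {"name": "B", "value": "abc"},
]
report = validate_rows(sample)
```

Keeping the issues as a separate log rather than raising on the first bad row means an automated ingest could still load the good rows while surfacing what needs attention, which seems closer to what the Context is asking for. Where this would live likely depends on the ETL pipeline question raised above.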
Definition of Done
Engineering Details