Data Cleaning Pipeline

Context

We are hoping to automatically ingest our datasets from their sources (when possible and appropriate). This task is to validate data quality: identify the issues that exist today and handle ones that may appear in the future. The owner of this task will be responsible for creating a process that validates and corrects errors, so that future automated pulls from a data source yield usable, high-quality data to display to users. Datasets are here
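
One possible shape for the "validate and correct" process is sketched below. It is a minimal sketch, not a decided design: it assumes each dataset row is loaded as a plain Python dict, and the `Rule` class, the `clean` function and the example `address` rules are all hypothetical names. Each rule checks a record, tries to fix it, and re-checks the fix; anything that still fails is set aside with a reason instead of being shown to users.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict  # one row of a dataset, e.g. parsed from CSV or JSON


@dataclass
class Rule:
    """A hypothetical validation rule.

    check: returns an error message if the record is bad, else None.
    fix:   returns a corrected copy of the record, or None if it can't be fixed.
    """
    name: str
    check: Callable[[Record], Optional[str]]
    fix: Callable[[Record], Optional[Record]]


def clean(records, rules):
    """Validate every record, attempt fixes, and split into cleaned and rejected."""
    cleaned, rejected = [], []
    for record in records:
        current = dict(record)
        problems = []
        for rule in rules:
            error = rule.check(current)
            if error is None:
                continue
            fixed = rule.fix(current)
            # Re-run the check so a fix that doesn't work is not silently accepted.
            if fixed is not None and rule.check(fixed) is None:
                current = fixed
            else:
                problems.append(f"{rule.name}: {error}")
        if problems:
            rejected.append((record, problems))
        else:
            cleaned.append(current)
    return cleaned, rejected


# Example rules for a hypothetical "address" field.
def _collapse(text):
    """Trim the ends and collapse runs of internal whitespace to one space."""
    return " ".join(text.split())


def _whitespace_check(record):
    address = record.get("address") or ""
    return None if address == _collapse(address) else "extra whitespace"


address_whitespace = Rule(
    name="address_whitespace",
    check=_whitespace_check,
    fix=lambda r: {**r, "address": _collapse(r.get("address") or "")},
)
address_present = Rule(
    name="address_present",
    check=lambda r: None if r.get("address") else "missing address",
    fix=lambda r: None,  # nothing sensible to fill in automatically
)

if __name__ == "__main__":
    rows = [{"address": "  100  Main St "}, {"address": ""}]
    good, bad = clean(rows, [address_whitespace, address_present])
    print(good)  # [{'address': '100 Main St'}]
    print(bad)   # [({'address': ''}, ['address_present: missing address'])]
```

Keeping rejected rows together with their reasons means a future automated ingest can still publish the clean rows while producing a report of what needs manual attention.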

Additional Info here

Definition of Done

- Identify the data quality issues we have, and may have in the future, in our datasets. Examples:
  - address misspellings
  - missing geometries
  - misspelled categorizations
- Write a script (or full-fledged pipeline) that detects these errors and corrects them.
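
As an illustration of checks for the example issues above, here is a minimal sketch using only the standard library. It assumes hypothetical `category` and `geometry` fields, with geometries stored as GeoJSON-style points, and a hypothetical list of canonical category names; the real field names and values would come from the datasets. Misspellings are corrected by fuzzy matching with `difflib.get_close_matches`, and missing or invalid geometries are only flagged.

```python
import difflib
from typing import Optional

# Hypothetical canonical categories; the real list would come from the
# datasets themselves or from their data dictionaries.
CANONICAL_CATEGORIES = ["Residential", "Commercial", "Mixed Use"]


def correct_spelling(value: Optional[str], canonical: list, cutoff: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled value to its canonical form, or None if no match."""
    if not value:
        return None
    by_lower = {c.lower(): c for c in canonical}
    # Normalize case and internal whitespace before comparing.
    key = " ".join(value.split()).lower()
    if key in by_lower:
        return by_lower[key]
    # Fuzzy match; cutoff is a similarity ratio in [0, 1].
    matches = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


def geometry_problem(geometry: Optional[dict]) -> Optional[str]:
    """Describe what is wrong with a GeoJSON-style Point, or return None if it looks valid."""
    if not geometry:
        return "missing geometry"
    coords = geometry.get("coordinates")
    if geometry.get("type") != "Point" or not coords or len(coords) != 2:
        return "not a 2D point"
    if not all(isinstance(c, (int, float)) for c in coords):
        return "non-numeric coordinates"
    lon, lat = coords  # GeoJSON order is [longitude, latitude]
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return "coordinates out of range"
    return None


def clean_record(record: dict) -> tuple:
    """Return (corrected copy, list of unresolved issues) for one record."""
    fixed = dict(record)
    issues = []
    category = correct_spelling(record.get("category"), CANONICAL_CATEGORIES)
    if category is None:
        issues.append(f"unrecognized category: {record.get('category')!r}")
    else:
        fixed["category"] = category
    # Missing geometries are only flagged here; filling them would need
    # geocoding the address, which this sketch does not attempt.
    problem = geometry_problem(record.get("geometry"))
    if problem:
        issues.append(problem)
    return fixed, issues
```

The same `correct_spelling` helper could be pointed at an authoritative list of street names to catch address misspellings. The `cutoff` is a trade-off: set it too low and genuinely different values get merged, set it too high and real typos slip through, so it is worth tuning against the actual data.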

Engineering Details

Please write this in Python.

sfbrigade / datasci-earthquake

Data Cleaning Pipeline #12