gennaer closed this issue 4 years ago.
Created DataCleaner object methods for creating new clean_data table in PR #227
@ryanmswan, can you provide a few more details?
We have a data cleaning object that can pull data from the database and write it back. I'd say we're about 20% done.
Next steps are to add cleaning functionality that creates new columns, and to decide which tasks are better done inside the database (per #213) and which must be done after pulling the data out of the database.
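As a rough illustration of the in-database versus after-pull choice, the sketch below derives the same column (days from creation to closure) both ways. It is a sketch under assumptions: an in-memory SQLite database stands in for the project's real database, the column names `srnumber`, `createddate`, and `closeddate` are hypothetical, and `julianday` is SQLite-specific, so another database would need its own date arithmetic.

```python
import sqlite3
from datetime import date

# In-memory SQLite stands in for the project's real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sql_ingest_table (srnumber TEXT, createddate TEXT, closeddate TEXT)"
)
conn.executemany(
    "INSERT INTO sql_ingest_table VALUES (?, ?, ?)",
    [("1", "2019-01-01", "2019-01-04"), ("2", "2019-01-02", None)],
)

# Option A: derive the column inside the database with a SQL expression.
# A NULL closeddate makes the whole expression NULL (None in Python).
in_db = conn.execute(
    "SELECT srnumber, julianday(closeddate) - julianday(createddate) "
    "FROM sql_ingest_table ORDER BY srnumber"
).fetchall()

# Option B: pull the raw rows, then derive the column in Python.
def days_between(start, end):
    """Return whole days between two ISO dates, or None if either is missing."""
    if start is None or end is None:
        return None
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

raw = conn.execute(
    "SELECT srnumber, createddate, closeddate FROM sql_ingest_table ORDER BY srnumber"
).fetchall()
after_pull = [(sr, days_between(created, closed)) for sr, created, closed in raw]
```

Doing the work in SQL keeps it next to the data and avoids transferring extra rows; doing it in Python is easier to test and allows arbitrary logic that SQL expresses poorly.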
This is my primary issue to work on for the next week. @gennaer is actively working on this issue as well.
Best guess ETA based on prior progress is March 1.
When this issue is done, please move #171 and #213 from the icebox to the prioritized backlog.
@ryanmswan Recommendation: break this into several issues.
Updated: location binning has been split out into a new issue, #239.
@ryanmswan will have a basic version of this done by 2/18 if Genna hasn't contributed before then.
ETA 3/10.
We had backend issues. Ryan and Russell are discussing how to do this given the 10M row limit. The goal is to do this adaptively, either as part of the sql ingest script or as a standalone step.
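The thread doesn't say exactly what the 10M row limit applies to. If the concern is how many rows are handled in a single pass, one option is to process the ingest table in fixed-size batches. The sketch below assumes SQLite as a stand-in for the real database, a hypothetical two-column schema, and a hypothetical cleaning rule (`clean_row`) that trims whitespace and stores blank request types as NULL; it pages by `rowid` so no batch holds more than `chunk_size` rows.

```python
import sqlite3

def clean_row(srnumber, requesttype):
    # Hypothetical cleaning rule: trim whitespace, store blanks as NULL.
    return srnumber, (requesttype or "").strip() or None

def clean_in_chunks(conn, chunk_size):
    """Copy sql_ingest_table into clean_data, at most chunk_size rows per batch.

    Returns the number of batches processed.
    """
    conn.execute("DROP TABLE IF EXISTS clean_data")
    conn.execute("CREATE TABLE clean_data (srnumber TEXT, requesttype TEXT)")
    last_rowid, batches = 0, 0
    while True:
        # Keyset pagination: fetch the next batch after the last rowid seen.
        rows = conn.execute(
            "SELECT rowid, srnumber, requesttype FROM sql_ingest_table "
            "WHERE rowid > ? ORDER BY rowid LIMIT ?",
            (last_rowid, chunk_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "INSERT INTO clean_data VALUES (?, ?)",
            [clean_row(sr, rt) for _, sr, rt in rows],
        )
        last_rowid = rows[-1][0]
        batches += 1
    conn.commit()
    return batches

# Demo: three hypothetical rows processed two at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sql_ingest_table (srnumber TEXT, requesttype TEXT)")
conn.executemany(
    "INSERT INTO sql_ingest_table VALUES (?, ?)",
    [("1", " Graffiti "), ("2", "   "), ("3", "Bulky Items")],
)
conn.commit()
batches = clean_in_chunks(conn, chunk_size=2)
```

Paging with `WHERE rowid > ?` rather than a growing `OFFSET` keeps each batch query cheap on large tables. Note that batching bounds per-pass work; it does not reduce the total number of rows stored.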
ETA: Tuesday 3/31.
New ETA, based on potential issues with Pandas: 4/7.
We're now cleaning the data in a separate table and copying it into the requests table, so I'm closing this one.
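A minimal sketch of that copy step, assuming `clean_data` and `requests` share the same (hypothetical) columns and that rows are appended rather than replaced. SQLite again stands in for the real database; running the `INSERT ... SELECT` inside a transaction means a failure leaves `requests` unchanged.

```python
import sqlite3

def copy_to_requests(conn):
    """Append every row of clean_data to requests in a single transaction.

    Returns the number of rows now in requests.
    """
    with conn:  # commits on success, rolls back if the INSERT raises
        conn.execute(
            "INSERT INTO requests (srnumber, requesttype) "
            "SELECT srnumber, requesttype FROM clean_data"
        )
    return conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0]

# Demo with hypothetical tables and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clean_data (srnumber TEXT, requesttype TEXT)")
conn.execute("CREATE TABLE requests (srnumber TEXT, requesttype TEXT)")
conn.executemany(
    "INSERT INTO clean_data VALUES (?, ?)",
    [("1", "Graffiti"), ("2", "Bulky Items")],
)
conn.commit()
total = copy_to_requests(conn)
```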
Overview
Read from the sql_ingest_table and write to a new clean_data table.
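One possible shape for such an object, offered only as a hypothetical sketch: a `DataCleaner`-style class that pulls `sql_ingest_table`, derives new columns in Python, and writes the result to `clean_data`. The method names (`pull`, `add_column`, `write`) and the SQLite connection are assumptions, not the API from PR #227. Table names are interpolated into SQL, so they must be trusted constants.

```python
import sqlite3

class DataCleaner:
    """Hypothetical sketch: pull sql_ingest_table, derive columns, write clean_data."""

    def __init__(self, conn, source="sql_ingest_table", target="clean_data"):
        # source/target are interpolated into SQL, so they must be trusted names.
        self.conn = conn
        self.source = source
        self.target = target
        self.columns = []
        self.rows = []

    def pull(self):
        """Load every row of the source table as a dict keyed by column name."""
        cursor = self.conn.execute(f"SELECT * FROM {self.source}")
        self.columns = [desc[0] for desc in cursor.description]
        self.rows = [dict(zip(self.columns, row)) for row in cursor.fetchall()]
        return self

    def add_column(self, name, func):
        """Add (or overwrite) a column computed from each row dict."""
        for row in self.rows:
            row[name] = func(row)
        if name not in self.columns:
            self.columns.append(name)
        return self

    def write(self):
        """Recreate the target table, insert the cleaned rows, return the row count."""
        cols = ", ".join(self.columns)
        placeholders = ", ".join("?" for _ in self.columns)
        self.conn.execute(f"DROP TABLE IF EXISTS {self.target}")
        self.conn.execute(f"CREATE TABLE {self.target} ({cols})")
        with self.conn:  # commits the inserts, or rolls them back on error
            self.conn.executemany(
                f"INSERT INTO {self.target} VALUES ({placeholders})",
                [tuple(row[c] for c in self.columns) for row in self.rows],
            )
        return len(self.rows)

# Demo with a hypothetical schema and a hypothetical derived column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sql_ingest_table (srnumber TEXT, requesttype TEXT)")
conn.executemany(
    "INSERT INTO sql_ingest_table VALUES (?, ?)",
    [("1", " graffiti "), ("2", "Bulky Items")],
)
conn.commit()
written = (
    DataCleaner(conn)
    .pull()
    .add_column("requesttype_clean", lambda row: row["requesttype"].strip().upper())
    .write()
)
```

Loading everything into memory keeps the sketch short; for a table near the row limit, the pull and write would need to be batched.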
Action Items
Resources/Instructions
See https://github.com/ryanmswan/311-data/blob/dev/Documentation/sqlIngest_documentation.ipynb for documentation on the sql ingest script.