hackforla / 311-data

Empowering Neighborhood Associations to improve the analysis of their initiatives using 311 data
https://hackforla.github.io/311-data/
GNU General Public License v3.0

convert ingest table to clean data table #210

Closed gennaer closed 4 years ago

gennaer commented 4 years ago

Overview

Read from sql_ingest_table and write to new clean_data table.
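
As a rough illustration of the read-then-write step described above, here is a minimal sketch that rebuilds clean_data from sql_ingest_table entirely in SQL. It uses an in-memory SQLite database as a stand-in for the project's real database, and the column names (srnumber, requesttype, createddate) and the cleaning rules are assumptions made for the example, not taken from the ingest script.

```python
import sqlite3


def build_clean_table(conn: sqlite3.Connection) -> int:
    """Recreate clean_data from sql_ingest_table and return its row count."""
    conn.execute("DROP TABLE IF EXISTS clean_data")
    # Assumed cleaning rules: trim whitespace, normalize case,
    # drop rows without an id, and collapse exact duplicates.
    conn.execute(
        """
        CREATE TABLE clean_data AS
        SELECT DISTINCT
            TRIM(srnumber)           AS srnumber,
            UPPER(TRIM(requesttype)) AS requesttype,
            createddate
        FROM sql_ingest_table
        WHERE srnumber IS NOT NULL AND TRIM(srnumber) <> ''
        """
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM clean_data").fetchone()[0]


def demo_clean_table() -> list:
    """Build a tiny ingest table (made-up rows) and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sql_ingest_table "
        "(srnumber TEXT, requesttype TEXT, createddate TEXT)"
    )
    conn.executemany(
        "INSERT INTO sql_ingest_table VALUES (?, ?, ?)",
        [
            (" 1-100 ", "graffiti removal", "2020-01-02"),
            ("1-100", "Graffiti Removal ", "2020-01-02"),  # duplicate after cleaning
            (None, "Bulky Items", "2020-01-03"),           # no id, dropped
            ("1-101", "bulky items", "2020-01-04"),
        ],
    )
    build_clean_table(conn)
    return conn.execute(
        "SELECT srnumber, requesttype, createddate FROM clean_data ORDER BY srnumber"
    ).fetchall()
```

Calling `demo_clean_table()` leaves two rows in clean_data: the two variants of request 1-100 collapse into one, and the row without an id is dropped.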

Action Items

Resources/Instructions

See https://github.com/ryanmswan/311-data/blob/dev/Documentation/sqlIngest_documentation.ipynb for documentation about the sql ingest script

ryanmswan commented 4 years ago

Added DataCleaner methods that create the new clean_data table in PR #227.

ExperimentsInHonesty commented 4 years ago

@ryanmswan can you provide a few more details...

  1. Progress
  2. Blocks
  3. Availability
  4. ETA

ryanmswan commented 4 years ago

  1. We have a data cleaning object that can pull data from the database and write it back. I'd say we're probably at about 20% here.

  2. Next steps are to add cleaning functionality that creates new columns, and to decide which operations are better done in the database (per #213) and which have to be done after pulling the data out of the database.

  3. This is my primary issue to work on for the next week. @gennaer is actively on this issue as well.

  4. Best guess ETA based on prior progress is March 1.
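
To make the trade-off in point 2 concrete, here is a hedged sketch of a cleaning object that pulls rows from the database, adds a derived column, and writes the result back. This is not the DataCleaner from PR #227: the method names, the column names, the days_to_close column, and the date format are all hypothetical, and SQLite stands in for the project's database. The row filter runs in the database, while the date arithmetic runs after the pull.

```python
import sqlite3
from datetime import datetime

DATE_FMT = "%Y-%m-%d"  # assumed date format for this example


class DataCleaner:
    """Hypothetical stand-in, not the class from PR #227."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.rows = []

    def fetch(self):
        # Done in the database: skip requests that are not closed yet.
        cur = self.conn.execute(
            "SELECT srnumber, createddate, closeddate "
            "FROM sql_ingest_table WHERE closeddate IS NOT NULL"
        )
        self.rows = cur.fetchall()
        return self

    def add_days_to_close(self):
        # Done after pulling: derive a new column in Python.
        def days(created, closed):
            delta = datetime.strptime(closed, DATE_FMT) - datetime.strptime(created, DATE_FMT)
            return delta.days

        self.rows = [
            (sr, created, closed, days(created, closed))
            for sr, created, closed in self.rows
        ]
        return self

    def write(self) -> int:
        # Write the cleaned rows back to a fresh clean_data table.
        self.conn.execute("DROP TABLE IF EXISTS clean_data")
        self.conn.execute(
            "CREATE TABLE clean_data "
            "(srnumber TEXT, createddate TEXT, closeddate TEXT, days_to_close INTEGER)"
        )
        self.conn.executemany("INSERT INTO clean_data VALUES (?, ?, ?, ?)", self.rows)
        self.conn.commit()
        return len(self.rows)


def demo_data_cleaner() -> list:
    """Run the cleaner on a few made-up rows and return clean_data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sql_ingest_table "
        "(srnumber TEXT, createddate TEXT, closeddate TEXT)"
    )
    conn.executemany(
        "INSERT INTO sql_ingest_table VALUES (?, ?, ?)",
        [
            ("1-100", "2020-01-02", "2020-01-05"),
            ("1-101", "2020-01-04", None),  # still open, filtered in SQL
            ("1-102", "2020-01-10", "2020-01-20"),
        ],
    )
    DataCleaner(conn).fetch().add_days_to_close().write()
    return conn.execute("SELECT * FROM clean_data ORDER BY srnumber").fetchall()
```

Simple filters like the one in `fetch` are cheap to push into SQL, whereas derived columns that need parsing or custom logic are often easier to compute after the pull; which side each real task belongs on is the open question raised in #213.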

ExperimentsInHonesty commented 4 years ago

When this issue is done, please move #171 and #213 from icebox to prioritized backlog

ExperimentsInHonesty commented 4 years ago

@ryanmswan

  1. Progress: attacking from the ground up, figuring out which rule got binned.
  2. Blocks: none, just a large issue
  3. Availability: 8-10
  4. ETA: 3/2

Recommendation: break this into several issues.

ryanmswan commented 4 years ago

Updated: location binning has been split out into new issue #239.

johnr54321 commented 4 years ago

@ryanmswan will have a basic version of this done by 2/18 if Genna hasn't contributed before then.

johnr54321 commented 4 years ago

ETA 3/10.

johnr54321 commented 4 years ago

We had backend issues. Ryan and Russell are chatting about how to do this given the 10M row limit. The goal is to do this adaptively, either as part of the SQL ingest script or on its own.

ETA: Tuesday 3/31.
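
The thread does not say how the row limit was eventually handled. One possible reading of "adaptively" is to clean the data in bounded batches rather than all at once, which could be sketched roughly as follows. The function name, column names, chunk size, and cleaning rules are assumptions for illustration, and SQLite again stands in for the real backend.

```python
import sqlite3


def clean_in_chunks(conn: sqlite3.Connection, chunk_size: int = 2) -> int:
    """Copy cleaned rows into clean_data one batch at a time.

    Returns the number of batches processed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clean_data (srnumber TEXT, requesttype TEXT)"
    )
    reader = conn.cursor()
    reader.execute("SELECT srnumber, requesttype FROM sql_ingest_table ORDER BY rowid")

    chunks = 0
    while True:
        batch = reader.fetchmany(chunk_size)  # at most chunk_size rows in memory
        if not batch:
            break
        # Assumed cleaning rules: drop rows without an id, trim, upper-case.
        cleaned = [
            (sr.strip(), (rt or "").strip().upper())
            for sr, rt in batch
            if sr and sr.strip()
        ]
        conn.executemany("INSERT INTO clean_data VALUES (?, ?)", cleaned)
        chunks += 1

    conn.commit()
    return chunks


def demo_chunks() -> tuple:
    """Return (batches processed, rows written) for five made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sql_ingest_table (srnumber TEXT, requesttype TEXT)")
    conn.executemany(
        "INSERT INTO sql_ingest_table VALUES (?, ?)",
        [
            ("1-100", "graffiti removal"),
            ("1-101", " bulky items"),
            ("", "bulky items"),  # empty id, dropped
            ("1-103", None),
            ("1-104", "illegal dumping"),
        ],
    )
    chunks = clean_in_chunks(conn, chunk_size=2)
    written = conn.execute("SELECT COUNT(*) FROM clean_data").fetchone()[0]
    return chunks, written
```

Batching bounds how many rows are held in memory at once; it does not by itself reduce the number of rows stored in the database, so it would only be part of an answer to a storage-side row limit.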

johnr54321 commented 4 years ago

New ETA, based on potential issues with Pandas: 4/7.

jmensch1 commented 4 years ago

We're now cleaning data in a separate table and copying it to the requests table. So closing this one.
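
For readers wondering what "cleaning in a separate table and copying" might look like, here is a minimal sketch under stated assumptions. Only the requests table name comes from the comment above; the staging_clean table name, the columns, and the replace-everything strategy are hypothetical, and SQLite stands in for the project's database.

```python
import sqlite3


def copy_clean_to_requests(conn: sqlite3.Connection) -> int:
    """Replace the contents of requests with the rows from staging_clean.

    Returns the number of rows in requests afterwards.
    """
    # "with conn" commits on success and rolls back if an exception is
    # raised, so requests is never left half-updated.
    with conn:
        conn.execute("DELETE FROM requests")
        conn.execute(
            "INSERT INTO requests (srnumber, requesttype) "
            "SELECT srnumber, requesttype FROM staging_clean"
        )
    return conn.execute("SELECT COUNT(*) FROM requests").fetchone()[0]


def demo_copy() -> list:
    """Copy two made-up staged rows over one stale row and return requests."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_clean (srnumber TEXT, requesttype TEXT)")
    conn.execute("CREATE TABLE requests (srnumber TEXT, requesttype TEXT)")
    conn.execute("INSERT INTO requests VALUES ('0-001', 'STALE ROW')")
    conn.executemany(
        "INSERT INTO staging_clean VALUES (?, ?)",
        [("1-100", "GRAFFITI REMOVAL"), ("1-101", "BULKY ITEMS")],
    )
    conn.commit()
    copy_clean_to_requests(conn)
    return conn.execute(
        "SELECT srnumber, requesttype FROM requests ORDER BY srnumber"
    ).fetchall()
```

Doing the delete and the copy in one transaction means readers of requests see either the old data or the new data, never a mix.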