Open · gregpawin opened this issue 4 years ago
@gregpawin Please provide an update
Cleaned data can be created via the `make data` command on the citation analysis branch.
Reevaluating how often data needs to be kept up to date.
Was wondering about the status of this. The most recent citations I see in the database are from April 1, 2021. I think that's plenty of data to work with for now, but the link to the preprocess.py script above is broken. Could we put the existing data processing code somewhere and document its progress/usage?
@gregpawin This issue has not had an update since 8/3/21. If you are no longer working on this issue, please let us know. If you can share any closing comments on why work on this issue stopped, or any notes that never got added to the issue, we would appreciate it. If you are still working on the issue, please provide an update using these guidelines.
This issue is a DRAFT for now, but anyone can update the sections based on the format below, especially the Overview section. Once we know what needs to be done and why, we can prioritize whether to work on this issue.
ANY ISSUE NUMBERS THAT ARE BLOCKERS OR OTHER REASONS WHY THIS WOULD LIVE IN THE ICEBOX
WE NEED TO DO X FOR Y REASON
A STEP BY STEP LIST OF ALL THE TASK ITEMS THAT YOU CAN THINK OF NOW EXAMPLES INCLUDE: Research, reporting, etc.
REPLACE THIS TEXT - If there is a website with documentation that helps with this issue, provide the link(s) here.
Progress: Finished setting up IAM roles and permissions for the AWS Glue job/role.
Blockers: Taking time to learn how AWS Glue works, i.e. writing custom transforms in Python.
Availability: Will set aside at least 2 hours to work on it.
ETA: I think I can have a beta version up in a week.
Pictures (if necessary):
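For reference, the trust-policy half of that IAM setup can be sketched as below. The `glue.amazonaws.com` service principal is standard for Glue; which permissions policy gets attached on top of it (e.g. the `AWSGlueServiceRole` managed policy plus access to the project's S3 bucket) is an assumption about this project's needs, not something confirmed in the issue.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "glue.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```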
Progress: Still learning PySpark. Applied custom mapping, using the visual editor to create boilerplate code.
Blockers: Learning PySpark.
Availability: Will work on it more over the weekend.
ETA: I hope by next week.
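A minimal sketch of the kind of record-level mapping such a custom transform would apply. The field names (`ticket_number`, `violation_code`, `latitude`, `longitude`) are hypothetical stand-ins for the real citation columns, and this plain-Python function is only the per-record logic — in Glue it would run inside the function the visual editor generates (e.g. via `DynamicFrame.map`).

```python
# Hypothetical per-record cleaning step for the citation pipeline.
# Field names are assumptions; adjust to the actual raw schema.

def clean_record(rec):
    """Normalize one raw citation row into the target schema.

    Returns None for rows without usable coordinates so downstream
    stages can drop them.
    """
    cleaned = {
        "ticket_number": rec.get("ticket_number", "").strip(),
        "violation_code": rec.get("violation_code", "").strip().upper(),
    }
    # Coordinates arrive as strings from the CSV; reject rows where
    # they are missing or unparseable.
    try:
        cleaned["latitude"] = float(rec["latitude"])
        cleaned["longitude"] = float(rec["longitude"])
    except (KeyError, TypeError, ValueError):
        return None
    return cleaned
```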
Progress: Created a DynamoDB table; discussing with Glen whether to go with DynamoDB or EC2 with MongoDB instead. It might also be good to build in an API to interact with the DB.
Blockers: Working on custom transforms and discussing design with the dev team.
Availability: Will work on it more over the weekend.
ETA: I hope by next week.
Progress: Created a script to find the last-updated date from the API. Created a Lambda to download the latest CSV and upload it to an S3 bucket.
Blockers: Working on custom transforms and discussing design with the dev team.
Availability: Will work on it more over the weekend.
ETA: I hope by next week.
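The last-updated check can likely lean on Socrata's view-metadata endpoint (`/api/views/<dataset-id>.json`), whose response includes a `rowsUpdatedAt` Unix timestamp. A hedged sketch, assuming that key is present for this dataset (verify against the actual response):

```python
from datetime import datetime, timezone

def last_updated(metadata):
    """Extract the dataset's last-update time from Socrata view metadata.

    `metadata` is the parsed JSON from /api/views/<dataset-id>.json;
    'rowsUpdatedAt' is a Unix epoch timestamp in that response.
    """
    return datetime.fromtimestamp(metadata["rowsUpdatedAt"], tz=timezone.utc)
```

The Lambda could compare this value to the timestamp of the last S3 upload and only download the CSV when the dataset is newer.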
Progress: Met with the dev lead to decide on database technology; will go with MongoDB rather than DynamoDB to take advantage of its geospatial functions.
Blockers: None
Availability: A few hours this week
ETA: I hope by this week.
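To illustrate the geospatial functions that motivated the MongoDB choice: documents would store a GeoJSON Point under a 2dsphere index, and queries use the standard `$near`/`$geometry`/`$maxDistance` operators. The `location` field name is a hypothetical schema choice; only the query construction is shown here, so it stays runnable without a live database.

```python
def near_query(lon, lat, max_meters=500):
    """Build a MongoDB $near filter for citations around a point.

    Assumes documents store a GeoJSON Point in a 'location' field
    (hypothetical field name) covered by a 2dsphere index. GeoJSON
    coordinates are [longitude, latitude], in that order.
    """
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_meters,
            }
        }
    }
```

With pymongo this would pair with `collection.create_index([("location", "2dsphere")])` and then `collection.find(near_query(-118.24, 34.05))`.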
Overview
We need to create a data cleaning pipeline that takes in raw input data from the Socrata API and updates the AWS database with correctly formatted geospatial data.
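The pipeline's overall shape can be sketched as a fetch → clean → store loop. The function names and the dependency-injection style below are illustrative assumptions, not the project's actual module layout; the point is that each stage (Socrata download, Glue/PySpark transform, MongoDB upsert) can evolve independently.

```python
# Hedged sketch of the pipeline's overall shape; stage implementations
# are injected so this orchestration stays testable in isolation.

def run_pipeline(fetch, clean, store):
    """Fetch raw rows, clean them, and store the survivors.

    fetch() yields raw records, clean(raw) returns a normalized record
    or None to drop the row, and store(rec) persists one record.
    Returns the number of records stored.
    """
    stored = 0
    for raw in fetch():
        rec = clean(raw)
        if rec is not None:
            store(rec)
            stored += 1
    return stored
```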
Action items
Resources/Instructions