hackforla / lucky-parking

Visualization of parking data to help understand the effects of parking policies on a neighborhood-by-neighborhood basis in the City of Los Angeles
https://www.hackforla.org/projects/lucky-parking.html

Create data cleaning pipeline in GCP #469

Open · gregpawin opened this issue 1 year ago

gregpawin commented 1 year ago

Overview

We need to create a data cleaning pipeline that takes in raw input data from the Socrata API and updates the Google Cloud Platform database with the correctly formatted geospatial data.

Action items

Resources/Instructions
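As a starting reference, here is a minimal sketch (not the project's existing code) of pulling raw citation records from a Socrata (SODA) endpoint with Python; the dataset URL is a placeholder and the paging parameters are standard SODA query options.

```python
# Minimal sketch: page through raw citation records from a Socrata (SODA) endpoint.
# The dataset URL below is a placeholder -- substitute the real Lucky Parking source.
import itertools

import requests

SODA_URL = "https://data.lacity.org/resource/<dataset-id>.json"  # placeholder dataset id
PAGE_SIZE = 50000

def fetch_all(url=SODA_URL, page_size=PAGE_SIZE):
    """Yield raw citation rows, paging with the SODA $limit/$offset parameters."""
    offset = 0
    while True:
        resp = requests.get(url, params={"$limit": page_size, "$offset": offset}, timeout=60)
        resp.raise_for_status()
        rows = resp.json()
        if not rows:
            break
        yield from rows
        offset += page_size

if __name__ == "__main__":
    # Quick smoke test: pull just a handful of rows from the first page.
    sample = list(itertools.islice(fetch_all(), 5))
    print(f"Fetched {len(sample)} sample rows")
```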

limwualice commented 1 year ago

Progress: As a first step, I uploaded the raw parking citation CSV file to a GCP bucket, created a BigQuery table, and created a simple Looker report. I need to keep learning how Dataflow works in order to create the data cleaning pipeline.
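For reference, a minimal sketch of those two manual steps using the Google Cloud client libraries; the bucket, file, and table names below are placeholders rather than the project's actual resources.

```python
# Sketch: upload a local CSV to a Cloud Storage bucket, then load it into BigQuery.
# Bucket, file, and table names are placeholders.
from google.cloud import bigquery, storage

BUCKET = "lucky-parking-raw"                          # placeholder bucket
LOCAL_CSV = "parking_citations.csv"                   # placeholder local file
TABLE_ID = "my-project.lucky_parking.raw_citations"   # placeholder project.dataset.table

# 1. Upload the raw CSV to Cloud Storage.
blob = storage.Client().bucket(BUCKET).blob("raw/parking_citations.csv")
blob.upload_from_filename(LOCAL_CSV)

# 2. Load the uploaded file into a BigQuery table, letting BigQuery infer the schema.
bq = bigquery.Client()
load_job = bq.load_table_from_uri(
    f"gs://{BUCKET}/raw/parking_citations.csv",
    TABLE_ID,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
load_job.result()  # block until the load job finishes
```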

Blockers: I need to find more resources on building the data cleaning pipeline.

Availability: 6 hours

ETA: I expect to find resources by next week.

Pictures (if necessary): Here is a Google Doc on how to connect GCP and Looker: https://docs.google.com/document/d/1KIcTMGKcbox6ViQLFhiLWP6pmymCKQZtvUI_UtdaNuo/edit?usp=sharing

limwualice commented 1 year ago

Progress:

Link to tutorial for creating a batch data pipeline: https://cloud.google.com/dataflow/docs/guides/data-pipelines#create_a_batch_data_pipeline
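For comparison with the template-based pipeline in that tutorial, the same CSV-to-BigQuery batch flow can be sketched directly with the Apache Beam Python SDK (runnable locally with the DirectRunner, or on Dataflow with the Dataflow runner). The bucket, table, and column names below are placeholders, and the cleaning step is only illustrative.

```python
# Sketch of a batch CSV -> BigQuery pipeline in Apache Beam (Python SDK).
# Paths, table name, and columns are placeholders.
import csv

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

COLUMNS = ["ticket_number", "issue_date", "fine_amount"]  # hypothetical column subset

def parse_row(line):
    """Turn one CSV line into a dict keyed by the expected columns."""
    return dict(zip(COLUMNS, next(csv.reader([line]))))

def clean_row(row):
    """Illustrative cleaning step: coerce the fine amount to a float."""
    row["fine_amount"] = float(row.get("fine_amount") or 0)
    return row

with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "Read raw CSV" >> beam.io.ReadFromText(
            "gs://lucky-parking-raw/raw/*.csv", skip_header_lines=1)
        | "Parse" >> beam.Map(parse_row)
        | "Drop rows without a ticket" >> beam.Filter(lambda row: row.get("ticket_number"))
        | "Clean" >> beam.Map(clean_row)
        | "Write to BigQuery" >> beam.io.WriteToBigQuery(
            "my-project:lucky_parking.citations_clean",
            schema="ticket_number:STRING,issue_date:STRING,fine_amount:FLOAT",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
        )
    )
```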

Blockers: I was working on the preprocessed data; next I will work on the raw data and create a script to clean the raw file.

Availability: 6 hours

ETA:

Pictures (if necessary): Looker report on a preprocessed sample of Lucky Parking data (four screenshots, 2023-03-23).

Batch data pipeline (screenshot, 2023-03-24).

limwualice commented 1 year ago

Progress: I used Cloud Functions to pull data from the Lucky Parking website and returned one row of data. I also added more charts to Looker Studio. Here is a Google Doc with steps for the Cloud Function: https://docs.google.com/document/d/11AgyO-B3WAbSJaOTaNHeN-VkRfxcq5f6KSILvV_7LdE/edit?usp=sharing
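For reference, a rough sketch of what such a function could look like using the Python Functions Framework, which also drops the pulled sample into a Cloud Storage bucket (the goal described in the blocker below). The endpoint and bucket names are placeholders.

```python
# Sketch of an HTTP Cloud Function (Python runtime) that pulls a small sample of
# citation data and writes it into a Cloud Storage bucket. Endpoint and bucket
# names are placeholders.
import json

import functions_framework
import requests
from google.cloud import storage

SODA_URL = "https://data.lacity.org/resource/<dataset-id>.json"  # placeholder dataset id
BUCKET = "lucky-parking-raw"                                      # placeholder bucket

@functions_framework.http
def pull_citations(request):
    # Fetch one row to start; raise $limit once this works end to end.
    resp = requests.get(SODA_URL, params={"$limit": 1}, timeout=60)
    resp.raise_for_status()
    rows = resp.json()

    # Write the raw payload into the bucket so downstream jobs can pick it up.
    blob = storage.Client().bucket(BUCKET).blob("raw/sample.json")
    blob.upload_from_string(json.dumps(rows), content_type="application/json")

    return {"rows_fetched": len(rows)}, 200
```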

Blockers: I want to use Cloud Functions to pull data from the Lucky Parking website and then put the file in a Cloud Storage bucket, but I'm getting some errors when I try this.

Availability: 6 hours

ETA: I hope to figure out what is going wrong by next week.

Pictures (if necessary):

Successful Cloud Function returning one line of data (screenshot, 2023-03-28).

Looker Studio graphs (three screenshots, 2023-03-28).

limwualice commented 1 year ago

Progress: I created a batch pipeline in GCP that takes in the raw Lucky Parking data (CSV file) and creates a BigQuery table using JSON and JavaScript files.

Blockers: I want to add cleaning steps to the script, but I'm not sure what cleaning needs to be done. I also want to figure out how to use Cloud Functions to pull data from the Lucky Parking website and then put the file in a Cloud Storage bucket.

Availability: 6 hours

ETA: I expect to find resources within the next two weeks.

Pictures (if necessary): N/A

limwualice commented 1 year ago

Progress: I did some initial cleaning of the raw CSV file in a Jupyter notebook. I would then like to add the cleaning steps to the script.
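For the record, here is a sketch of the kind of notebook cleaning described above, using pandas and pyproj. The column names, the 99999 "no coordinates" sentinel, and the source projection (EPSG:2229) are assumptions about the raw export rather than confirmed details of the data.

```python
# Sketch of notebook-style cleaning of the raw citation CSV with pandas.
# Column names, sentinel value, and source projection are assumptions.
import pandas as pd
from pyproj import Transformer

df = pd.read_csv("parking_citations_raw.csv")  # placeholder path

# Drop rows with no citation id and parse the issue date.
df = df.dropna(subset=["Ticket number"])
df["Issue Date"] = pd.to_datetime(df["Issue Date"], errors="coerce")

# Assumption: 99999 marks rows without usable coordinates.
has_coords = (df["Latitude"] != 99999) & (df["Longitude"] != 99999)

# Reproject the state-plane X/Y values (assumed EPSG:2229) to WGS84 lat/lon for mapping.
to_wgs84 = Transformer.from_crs("EPSG:2229", "EPSG:4326", always_xy=True)
lon, lat = to_wgs84.transform(
    df.loc[has_coords, "Latitude"].astype(float),
    df.loc[has_coords, "Longitude"].astype(float),
)
df.loc[has_coords, "lon"] = lon
df.loc[has_coords, "lat"] = lat

df.to_csv("parking_citations_clean.csv", index=False)
```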

Blockers: I need to either translate the Python cleaning steps to JavaScript or figure out how to run Python scripts in GCP.

Availability: 3 hours

ETA: I expect to find resources by next week.

Pictures (if necessary): N/A

limwualice commented 1 year ago

Progress: Worked on DataTalks.Club bootcamp: https://www.youtube.com/watch?v=EYNwNlOrpr0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=5

Blockers: None

Availability: 3 hours

ETA: 2 weeks

Pictures (if necessary): N/A