gregpawin opened 1 year ago
Progress: As a first step, I uploaded the raw parking citation CSV file to a GCP bucket, created a BigQuery table, and created a simple Looker report. I still need to learn how Dataflow works in order to build the data cleaning pipeline.
Blockers: Need to find more resources on building a data cleaning pipeline.
Availability: 6 hours
ETA: I expect to find resources by next week.
Pictures (if necessary): Here is a Google Doc on how to connect GCP and Looker: https://docs.google.com/document/d/1KIcTMGKcbox6ViQLFhiLWP6pmymCKQZtvUI_UtdaNuo/edit?usp=sharing
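The first step above (raw CSV in a GCS bucket, loaded into a BigQuery table) can be sketched in Python with the google-cloud-bigquery client; the bucket, blob, and table names here are placeholders, not the project's real names:

```python
# Sketch of loading a CSV from Cloud Storage into BigQuery, assuming the
# google-cloud-bigquery client library. Bucket/table names are placeholders.

def gcs_uri(bucket: str, blob: str) -> str:
    """Build a gs:// URI for a Cloud Storage object."""
    return f"gs://{bucket}/{blob}"

def load_citations_csv(bucket: str, blob: str, table_id: str) -> None:
    # Imported here so the sketch reads without the GCP SDK installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the CSV header row
        autodetect=True,       # let BigQuery infer the schema
    )
    job = client.load_table_from_uri(
        gcs_uri(bucket, blob), table_id, job_config=job_config
    )
    job.result()  # block until the load job finishes
```

With autodetect the inferred schema should be checked afterward, since citation fields like ticket numbers can be misread as integers.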
Progress: Found a tutorial for creating a batch data pipeline: https://cloud.google.com/dataflow/docs/guides/data-pipelines#create_a_batch_data_pipeline
Blockers: So far I have been working on the preprocessed data; next I will work on the raw data and create a script to clean the raw file.
Availability: 6 hours
ETA:
Pictures (if necessary): Looker report on a preprocessed sample of the Lucky Parking data
Batch data pipeline:
Progress: I used Cloud Functions to pull data from the Lucky Parking website and returned one row of data. Here is a Google Doc with the steps: https://docs.google.com/document/d/11AgyO-B3WAbSJaOTaNHeN-VkRfxcq5f6KSILvV_7LdE/edit?usp=sharing I also added more charts to Looker Studio.
Blockers: I want to use Cloud Functions to pull data from the Lucky Parking website and then put the file in a Cloud Storage bucket. Here are some errors I'm getting:
Availability: 6 hours
ETA: I hope to figure out what is going wrong by next week.
Pictures (if necessary):
Successful Cloud Function returning one line of data:
Looker Studio graphs:
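The step blocked above (a Cloud Function that pulls the file and writes it into a bucket) can be sketched like this; the source URL and bucket name are placeholders, and the upload uses the google-cloud-storage client, imported inside the function so the sketch reads without the SDK installed:

```python
# Sketch of an HTTP Cloud Function that downloads the dataset and copies it
# into a Cloud Storage bucket. SOURCE_URL and BUCKET are placeholders.
import urllib.request

SOURCE_URL = "https://data.example.gov/resource/citations.csv"  # placeholder
BUCKET = "my-raw-data-bucket"                                   # placeholder

def object_name(prefix: str, filename: str) -> str:
    """Build the destination object path inside the bucket."""
    return f"{prefix}/{filename}"

def download_to_bucket(request):
    """Cloud Functions HTTP entry point: copy the file into the bucket."""
    from google.cloud import storage

    with urllib.request.urlopen(SOURCE_URL) as resp:
        data = resp.read()
    blob = storage.Client().bucket(BUCKET).blob(object_name("raw", "citations.csv"))
    blob.upload_from_string(data, content_type="text/csv")
    return f"uploaded {len(data)} bytes"
```

Streaming the download straight into memory like this only works for files that fit in the function's memory allocation; a very large citation CSV would need a chunked upload instead.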
Progress: I created a batch pipeline in GCP that takes in the raw Lucky Parking data (CSV file) and creates a BigQuery table using a JSON schema file and a JavaScript transform file.
Blockers: I want to add cleaning steps to the script, but I'm not sure which cleaning steps are needed. I also want to figure out how to use Cloud Functions to pull data from the Lucky Parking website and then put the file in a Cloud Storage bucket.
Availability: 6 hours
ETA: I expect to find resources within the next two weeks.
Pictures (if necessary): N/A
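The batch pipeline described above (raw CSV in, BigQuery table out, driven by a JSON schema and a JavaScript UDF) matches the shape of Google's public GCS_Text_to_BigQuery Dataflow template. A sketch of launching it from Python follows; all paths, names, and the region are placeholders, and the parameter names follow the template's documented options, so verify them against the current docs before relying on them:

```python
# Sketch of launching the GCS_Text_to_BigQuery Dataflow template by shelling
# out to gcloud. Every bucket path, table name, and the region is a
# placeholder for illustration.
import subprocess

def dataflow_run_args(job_name: str, region: str, params: dict) -> list:
    """Build the gcloud argument list for running the template."""
    joined = ",".join(f"{k}={v}" for k, v in params.items())
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--gcs-location", "gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
        "--region", region,
        "--parameters", joined,
    ]

params = {
    "javascriptTextTransformGcsPath": "gs://my-bucket/transform.js",  # JS UDF file
    "javascriptTextTransformFunctionName": "transform",               # UDF entry point
    "JSONPath": "gs://my-bucket/schema.json",                         # BigQuery schema
    "inputFilePattern": "gs://my-bucket/raw/citations.csv",           # raw CSV input
    "outputTable": "my-project:parking.citations",                    # target table
    "bigQueryLoadingTemporaryDirectory": "gs://my-bucket/tmp",
}

if __name__ == "__main__":
    subprocess.run(dataflow_run_args("clean-citations", "us-west1", params), check=True)
```

This keeps the cleaning logic in the JavaScript UDF; the blocker about which cleaning steps belong there still has to be answered before the UDF does anything beyond pass-through.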
Progress: I did some initial cleaning of the raw CSV file in a Jupyter notebook. Next I would like to add these cleaning steps to the pipeline script.
Blockers: I need to either translate the Python cleaning steps to JavaScript or figure out how to run Python scripts in GCP.
Availability: 3 hours
ETA: I expect to find resources by next week.
Pictures (if necessary): N/A
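One way past the translate-or-rehost blocker is to keep the notebook's cleaning steps as a plain Python function that a pipeline script can import. A minimal sketch follows; the column names and the specific rules (drop rows without coordinates, normalize the issue date) are assumptions about the raw file, not the notebook's confirmed steps:

```python
# Sketch of notebook-style cleaning as a reusable function, using only the
# standard library. Column names and cleaning rules are assumptions.
import csv
import io
from datetime import datetime

def clean_rows(raw_csv_text: str) -> list:
    """Return cleaned rows as dicts, skipping unusable records."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv_text)):
        # Drop rows with no location information.
        if not row.get("Latitude") or not row.get("Longitude"):
            continue
        # Normalize the issue date to ISO format (assumed MM/DD/YYYY input).
        try:
            issued = datetime.strptime(row["Issue Date"], "%m/%d/%Y").date().isoformat()
        except (KeyError, ValueError):
            continue
        row["Issue Date"] = issued
        cleaned.append(row)
    return cleaned
```

Keeping the function free of GCP dependencies means the same code runs in the notebook, in a Cloud Function, or in a Dataflow Python pipeline without translation to JavaScript.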
Progress: Worked through the DataTalks.Club data engineering bootcamp: https://www.youtube.com/watch?v=EYNwNlOrpr0&list=PL3MmuxUbc_hJed7dXYoJw8DoCuVHhGEQb&index=5
Blockers: None
Availability: 3 hours
ETA: 2 weeks
Pictures (if necessary): N/A
Overview
We need to create a data cleaning pipeline that takes in raw input data from the Socrata API and updates the Google Cloud Platform database with correctly formatted geospatial data.
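The ingestion side of that pipeline can be sketched as a paged pull from a Socrata-style API; the endpoint URL below is a placeholder, and paging uses Socrata's standard $limit/$offset query parameters:

```python
# Minimal sketch of pulling raw records from a Socrata-style API in pages.
# The base URL is a placeholder for the real dataset endpoint.
import json
import urllib.parse
import urllib.request

def page_url(base: str, limit: int, offset: int) -> str:
    """Build a paged Socrata query URL."""
    query = urllib.parse.urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"{base}?{query}"

def fetch_all(base: str, limit: int = 1000):
    """Yield records page by page until the API returns an empty page."""
    offset = 0
    while True:
        with urllib.request.urlopen(page_url(base, limit, offset)) as resp:
            page = json.load(resp)
        if not page:
            return
        yield from page
        offset += limit
```

Each yielded record would then flow through the cleaning step before being written to BigQuery, so the pipeline stays incremental rather than re-downloading the whole dataset.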
Action items
Resources/Instructions