Use Case: Automates the ingestion of live Chicago traffic data into BigQuery for interactive, near-real-time analysis
Technical Concept: Schedules a simple Python script on Google App Engine, via a cron job, to append live data into BigQuery.
Source Data: https://data.cityofchicago.org/Transportation/Chicago-Traffic-Tracker-Congestion-Estimates-by-Se/n4j6-wkkf
Architecture Reference: http://zablo.net/blog/post/python-apache-beam-google-dataflow-cron
Shout out to Mylin Ackerman for all his help. Saved me weeks of research with his personal touch. https://www.linkedin.com/in/mylin-ackermann-25a00445/
Check me out on LinkedIn: https://www.linkedin.com/in/sungwonchung1/
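The append step described above can be sketched in Python. The endpoint URL, the field names (`segmentid`, `_traffic`, `_last_updt`), and the BigQuery table name are assumptions based on the Chicago Traffic Tracker dataset; the actual `append_data.py` may differ:

```python
SOCRATA_URL = "https://data.cityofchicago.org/resource/n4j6-wkkf.json"  # assumed JSON endpoint

def to_bq_rows(records):
    """Normalize Socrata JSON records into row dicts for BigQuery streaming inserts.

    Keeps only the fields the (assumed) table schema expects; malformed
    records are dropped rather than failing the whole batch.
    """
    rows = []
    for rec in records:
        try:
            rows.append({
                "segment_id": int(rec["segmentid"]),
                "street": rec.get("street", ""),
                "speed": float(rec.get("_traffic", -1)),  # -1 means no current estimate
                "last_updated": rec.get("_last_updt", ""),
            })
        except (KeyError, ValueError):
            continue
    return rows

# The script would then fetch SOCRATA_URL and stream the rows into BigQuery,
# e.g. with the google-cloud-bigquery client:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   client.insert_rows_json("project.dataset.traffic", to_bq_rows(records))
```

Streaming inserts (rather than load jobs) suit this use case because each cron run appends a small batch of fresh rows.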
Setup Prerequisites:
Order of Operations:
Development Instructions:
Deploy Instructions:
Folder Structure:
__init__.py - needed to properly deploy within App Engine
append_data.py - calls the Chicago live traffic API and appends the results into BigQuery
app.yaml - definition of Google App Engine application
appengine_config.py - adds dependencies to locally installed packages (from the lib folder)
cron.yaml - definition of the Google App Engine cron job
main.py - entry point for the web application and calls the function contained within "append_data.py"
requirements.txt - file for the pip package manager, listing all packages required to run the application and the pipeline
lib - local folder with all pip-installed packages from requirements.txt file
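A minimal cron.yaml along the lines of the definition above; the URL and schedule are assumptions, not the project's actual values:

```yaml
cron:
- description: append live Chicago traffic data into BigQuery
  url: /append_data        # handler path served by main.py (assumed)
  schedule: every 15 minutes
```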
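The relationship between main.py and cron.yaml can be sketched as a minimal WSGI handler: App Engine cron issues a GET to the scheduled URL, and the handler dispatches to the append job. The path `/append_data` and the function names here are assumptions; the real app may use webapp2 or Flask:

```python
def run_append_job():
    """Stand-in for the append logic in append_data.py (hypothetical name)."""
    return "appended"

def app(environ, start_response):
    """Minimal WSGI entry point for main.py.

    Routes the cron request path to the append job and returns a plain-text
    response; anything else gets a 404.
    """
    if environ.get("PATH_INFO") == "/append_data":
        body = run_append_job().encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain")])
    else:
        body = b"not found"
        start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body]
```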