sungchun12 / schedule-python-script-using-Google-Cloud

:clock4: Schedules a Python script to append data into Bigquery using Google Cloud's App Engine with a cron job
12 stars 3 forks source link
appengine-python bigquery chicago-traffic cron google-cloud python-script

schedule-python-script-using-Google-Cloud

Use Case: Automates live Chicago traffic data and flows it into BigQuery for interactive real-time analysis

Technical Concept: Schedules a simple Python script to append data into BigQuery using Google Cloud's App Engine with a cron job.

Source Data: https://data.cityofchicago.org/Transportation/Chicago-Traffic-Tracker-Congestion-Estimates-by-Se/n4j6-wkkf

Architecture Reference: http://zablo.net/blog/post/python-apache-beam-google-dataflow-cron

Shout out to Mylin Ackerman for all his help. Saved me weeks of research with his personal touch. https://www.linkedin.com/in/mylin-ackermann-25a00445/

Check me out on LinkedIn: https://www.linkedin.com/in/sungwonchung1/

Setup Prerequisites:

  1. Signup for Google Cloud account and enable billing
  2. Enable BigQuery API, Stackdriver API, Google Cloud Deployment Manager V2 API, Google Compute Engine API

Order of Operations:

  1. Develop scripts with Google cloud shell or SDK
  2. Deploy on appengine
  3. Deploy cron job
  4. Check BigQuery
  5. Connect with dataviz tool such as Tableau

Development Instructions:

  1. Copy github repository into SDK or Google cloud shell(thankfully it has persistent storage, so you don't have to recopy the folder structure): git clone https://github.com/sungchun12/schedule-python-script-using-Google-Cloud.git
  2. Create BigQuery dataset: "chicago_traffic"

Deploy Instructions:

  1. Remember to put init.py files into all local packages
  2. Change directory: cd ~/chicago-traffic
  3. Install all required packages into local lib folder: pip install -r requirements.txt -t lib
  4. To deploy App Engine app, run: gcloud app deploy app.yaml
  5. To deploy App Engine CRON, run: gcloud app deploy cron.yaml

Folder Structure:

alt text

init.py needed to properly deploy within App Engine

append_data.py - call the Chicago live traffic API and appends it into BigQuery

app.yaml - definition of Google App Engine application

appengine_config.py adds dependencies to locally installed packages (from lib folder)

cron.yaml - definition of Google App Engine CRON job

main.py - entry point for the web application and calls the function contained within "append_data.py"

requirements.txt - file for pip package manager, which contains list of all required packages to run the application and the pipeline

lib - local folder with all pip-installed packages from requirements.txt file