michplunkett / ucpd-incident-scraper

This code scrapes the UCPD Daily Incident page at a predetermined frequency and stores the incidents in a generic JSON datastore.
MIT License
gcp heroku-deployment police police-data python webscraping xgboost-classifier

UChicago Incident Page Scraper

This repository houses a scraping engine for the UCPD's Incident Report webpage. The data is stored in Google Cloud Platform's Datastore, and the scraper runs on Heroku dynos.

Primary Application Functions

  1. Scrape the UCPD Incident Report webpage every weekday morning, pulling all incidents from the latest reported incident date in the Google Datastore to the current day.
  2. Upload all stored UCPD incidents to the Chicago Maroon's Google Drive every Saturday morning.
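The schedule above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names `dates_to_scrape` and `job_for` are hypothetical, and treating the latest stored incident date as inclusive is an assumption.

```python
from datetime import date, timedelta
from typing import List, Optional


def dates_to_scrape(latest_stored: date, today: date) -> List[date]:
    """Return every date from latest_stored through today.

    Assumption: the latest stored date is re-scraped (inclusive start),
    so incidents added late to that day are not missed.
    """
    if latest_stored > today:
        return []
    span = (today - latest_stored).days
    return [latest_stored + timedelta(days=offset) for offset in range(span + 1)]


def job_for(today: date) -> Optional[str]:
    """Pick the job for a given day: weekdays scrape, Saturdays export."""
    weekday = today.weekday()  # Monday is 0, Sunday is 6
    if weekday < 5:
        return "scrape"  # hypothetical label for the weekday scraping job
    if weekday == 5:
        return "export"  # hypothetical label for the Saturday Drive upload
    return None  # nothing is scheduled on Sundays
```

In a setup like this, a scheduled process would call `job_for(date.today())` each morning and, for a scrape, pass the latest incident date found in the datastore to `dates_to_scrape`.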

Relevant Reading

Acknowledgements

I'd like to thank @kdumais111 and @FedericoDM for their incredible help in getting the scraping architecture in place, as well as @ehabich for adding a bit of testing validation to the project. Thanks, y'all! <3

Project Requirements

Required Credentials

Technical Notes

Standard Commands