kobinabrandon / Hourly-Divvy-Trip-Predictor

An end-to-end batch machine learning system that produces hourly predictions of the number of arrivals and departures that will take place at various stations in Chicago's Divvy bike sharing system.
GNU General Public License v3.0
machinelearning-python time-series

Hourly Divvy Trip Predictor Service

Introduction

The city of Chicago is home to nearly 3 million people, making it the third most populous city in the US. Its home county, Cook County, is the second most populous county in the country. With a population this large, the city offers a range of transport options. One of these is the Divvy bike-sharing system, which has hundreds of stations and thousands of bikes and scooters. It is currently operated by the ride-sharing company Lyft, and has been in existence for 9 years. So many trips taking place every day over such a long period make Divvy's historical trip data an attractive source of time-series data (at least for me :D), especially because the data is updated monthly.

The Business Problem

How can we predict the number of trips that will start and end at various stations in the city each hour?

  1. Being able to anticipate spikes in activity will enable Divvy to allocate bikes and scooters more efficiently over time.
  2. This capability could help management plan any changes in the scale of its services in a given area.
  3. Having models that predict customer activity in this way can give management confidence in its understanding of customer behaviour.
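
To make the prediction target concrete, here is a minimal sketch of how raw trip records could be turned into hourly departure and arrival counts per station. It is not the repository's actual feature pipeline. The column names (`started_at`, `ended_at`, `start_station_name`, `end_station_name`) are assumed to match Divvy's published trip files, the helper `hourly_counts` is hypothetical, and the sample rows are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(trips, time_key, station_key):
    """Count trips per (station, hour) pair. Hypothetical helper."""
    counts = Counter()
    for trip in trips:
        # Truncate the timestamp to the start of the hour it falls in.
        hour = datetime.fromisoformat(trip[time_key]).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(trip[station_key], hour)] += 1
    return counts


# Made-up sample rows; column names are assumed, not taken from the repo.
trips = [
    {"started_at": "2024-05-01 08:05:00", "ended_at": "2024-05-01 08:25:00",
     "start_station_name": "Station A", "end_station_name": "Station B"},
    {"started_at": "2024-05-01 08:40:00", "ended_at": "2024-05-01 09:10:00",
     "start_station_name": "Station A", "end_station_name": "Station B"},
    {"started_at": "2024-05-01 09:15:00", "ended_at": "2024-05-01 09:30:00",
     "start_station_name": "Station B", "end_station_name": "Station A"},
]

# Departures are keyed on the start time and station, arrivals on the end.
departures = hourly_counts(trips, "started_at", "start_station_name")
arrivals = hourly_counts(trips, "ended_at", "end_station_name")
```

Each resulting series (one per station, for departures and for arrivals) is the kind of hourly time series a model would be trained to forecast.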

The Objective

Build a complete end-to-end machine learning system that culminates in a simple frontend which provides the desired predictions in an interactive manner.

System Design

Feature Pipeline

Training Pipeline

Inference Pipeline

Use the App

A containerised version of the app is available here.

Alternatively, you can build the project locally by doing the following:

  1. Clone the repository:

    $ git clone https://github.com/maadabrandon/Hourly-Divvy-Trip-Predictor
  2. Install Poetry:

    $ curl -sSL https://install.python-poetry.org | python3 -
  3. Enter the project directory and run:

    $ poetry install
  4. Register free accounts on Hopsworks and CometML. Then copy your project names (for both platforms), API keys (again for both platforms), Comet workspace name, and email address into a .env file.

  5. Backfill the Hopsworks feature groups with historical data:

    $ make backfill-features
  6. Run the training pipeline:

    $ make train-all
  7. Backfill the Hopsworks feature groups with predictions:

    $ make backfill-predictions
  8. View the frontend:

    $ make frontend
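
Before running the make targets, it can help to confirm that the .env file from step 4 is complete. The sketch below is one possible way to do that using only the standard library. The variable names in `REQUIRED` are hypothetical placeholders, since the names the project actually reads are defined in its own code, and `parse_env` and `missing_keys` are illustrative helpers, not part of the repository.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical names: replace with the ones the project's code expects.
REQUIRED = [
    "HOPSWORKS_PROJECT_NAME",
    "HOPSWORKS_API_KEY",
    "COMET_PROJECT_NAME",
    "COMET_API_KEY",
    "COMET_WORKSPACE",
    "EMAIL",
]


def missing_keys(text, required=REQUIRED):
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Made-up example content for illustration.
sample = """
# credentials
HOPSWORKS_PROJECT_NAME=my_project
HOPSWORKS_API_KEY=
COMET_API_KEY="abc123"
"""

missing = missing_keys(sample)
```

In practice you would pass the contents of your own file, for example `missing_keys(open(".env").read())`, and fill in whatever keys it reports before continuing.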