insight-lane / crash-model

Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.
https://insightlane.org
MIT License

Crash Modeling

Project Overview

Motivation

This project was originally begun as a collaboration between Data4Democracy and the City of Boston.

On Jan 25th, 2017, 9 pedestrians were hit by vehicles in Boston. While this was a particularly dangerous day, there were 21 fatalities and over 4000 severe injuries due to crashes in 2016 alone, representing a public health issue for all those who live, work, or travel in Boston. The City of Boston would like to partner with Data4Democracy to develop a dynamic prediction system that identifies potential trouble spots, so that timely interventions can be targeted to prevent crashes before they happen and make Boston a safer place for its citizens.

This is part of the City's long-term Vision Zero initiative, which is committed to the goal of zero fatal and serious traffic crashes in the city by 2030. The Vision Zero concept was first conceived in Sweden in 1997 and has been widely credited with a significant reduction in fatal and serious crashes on Sweden’s roads in the decades since then. Cities across the United States are adopting bold Vision Zero initiatives that share these common principles.

Children growing up today deserve...freedom and mobility. Our seniors should be able to safely get around the communities they helped build and have access to the world around them. Driving, walking, or riding a bike on Boston’s streets should not be a test of courage.

— Mayor Martin J. Walsh

What is the goal of the project?

The goal of the project is to promote the development of safer roads by identifying areas of high risk in a city's road network. It seeks to support the decision-making of transportation departments in 3 ways:

  1. Identify high risk locations - which roads in the network represent the greatest risk of crashes?

  2. Explain the contributing factors of risk - what are the features, patterns and trends that result in a location having elevated risk?

  3. Assess the impact of intervention - what is the effect of a past or planned intervention on the risk of crashes?

Who are the intended users of the project?

Though originally a collaboration between Data4Democracy and the City of Boston, the project is now being developed to work for any city that wishes to use it. The intended users include city transportation departments, those responsible for managing risk on road networks and individuals interested in crash risk.

How does the project achieve its goal?

The project uses machine learning to generate predictions of risk by combining various types of data. Right now it makes use of:

Future versions of the project are likely to make use of:

Predictions are generated on a per-road-segment basis and can be explored with an interactive visualization.
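To make "per road segment" concrete, here is a minimal sketch of segment-level aggregation. It is not the project's model: it only counts historical crashes per segment and ranks segments by crashes per week. The `segment_id` field, the function names, and the sample records are all hypothetical.

```python
from collections import Counter


def crash_rate_by_segment(crashes, segment_ids, n_weeks):
    """Return crashes per week for every segment, including segments with none."""
    if n_weeks <= 0:
        raise ValueError("n_weeks must be positive")
    # Count crashes by the (hypothetical) segment_id each record was matched to.
    counts = Counter(c["segment_id"] for c in crashes)
    return {seg: counts.get(seg, 0) / n_weeks for seg in segment_ids}


def rank_segments(rates, top_n=3):
    """Return up to top_n segment ids, highest rate first, ties broken by id."""
    return sorted(rates, key=lambda seg: (-rates[seg], seg))[:top_n]


# Hypothetical sample: three crashes over four weeks on a three-segment network.
crashes = [{"segment_id": "A"}, {"segment_id": "A"}, {"segment_id": "B"}]
rates = crash_rate_by_segment(crashes, ["A", "B", "C"], n_weeks=4)
print(rank_segments(rates, top_n=2))
```

A raw historical rate like this is only a baseline; a learned model would combine it with other features of each segment.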

What are the requirements for use?

Any city that wishes to can make use of the project. At a minimum, geo-coded historical crash data is required. Beyond this, cities that can supply safety concerns data (Vision Zero or otherwise) will be able to generate more advanced predictions of risk.
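As a rough illustration of what "geo-coded historical crash data" implies, the sketch below checks that each record carries a parseable date and in-range coordinates. The field names (`date`, `latitude`, `longitude`) and the ISO date format are assumptions made for this example, not the schema the pipeline actually expects.

```python
from datetime import datetime


def is_geocoded_crash(record):
    """True if the record has a parseable ISO date and in-range coordinates."""
    try:
        datetime.fromisoformat(record["date"])
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (KeyError, TypeError, ValueError):
        # Missing field, wrong type, or unparseable value.
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def split_valid(records):
    """Partition records into (usable, rejected) lists, preserving order."""
    usable, rejected = [], []
    for record in records:
        (usable if is_geocoded_crash(record) else rejected).append(record)
    return usable, rejected


# Hypothetical sample records.
records = [
    {"date": "2016-01-25", "latitude": 42.36, "longitude": -71.06},
    {"date": "2016-02-01", "longitude": -71.06},                    # no latitude
    {"date": "2016-03-10", "latitude": 142.0, "longitude": -71.06}, # out of range
]
usable, rejected = split_valid(records)
print(len(usable), "usable,", len(rejected), "rejected")
```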

What is the release schedule?

The intended roadmap of development for the project can be found at https://github.com/Data4Democracy/crash-model/projects.

How can I access the project?

This repo can be downloaded and run in its entirety using Docker, or you can see a current deployment of the project at https://insightlane.org.

Data Sources and Modelling

Data Sources

Data Model

Setting up

I want to set up a local development environment and run the pipeline

I want to set up a Docker development environment

I want to add a new city

I want to run the interactive visualization (showcase)

Contributing

"First-timers" are welcome! Whether you're trying to learn data science, hone your coding skills, or get started collaborating over the web, we're happy to help. If you have any questions feel free to pose them on our Slack channel, or reach out to one of the team leads.

I want to know what’s going on and pick up a task I like

Open tasks are available here. Issues pertaining to upcoming releases are available here.

I want to add a new city to the online showcase

Once you’ve successfully run the pipeline on a city, get in touch with the Insight Lane team for details on how to add it to the showcase.

Connect with us

Join our Slack channel.

Leads:

Project Organization

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py

Project structure based on the cookiecutter data science project template. #cookiecutterdatascience