insight-lane / crash-model

Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.
https://insightlane.org
MIT License
113 stars, 40 forks

Design functionality for segment profiles #172

Closed: bpben closed this 1 year ago

bpben commented 6 years ago

As a city stakeholder, I'm interested in "why" a given segment is high risk. This is what we're getting at, somewhat, with #103 and #104. But maybe we can digest #103 a bit more: look at the riskiest segments and see if they have something in common. Are they all high speed? Do they all intersect with an off-ramp from a highway? These are what I mean by "personas". Can I, as a stakeholder, beyond just receiving a ranking, be given some idea of the common "profile" of a high-risk segment?

terryf82 commented 6 years ago

This is going to be a vital feature of the 2.0 release and our project more broadly. Before we dig too much into the implementation I think it would help if we aligned on terminology since this has already been referenced under several different names:

or something else? It'll have to appear in the interface as a heading, links etc. and be easily understood by our users so what does everyone prefer?

@j-t-t @bpben @alicefeng @shreyapandit

bpben commented 6 years ago

Vote for Profiles.

I think we need to take the riskiest predicted segments and extract some patterns. Some kind of PCA approach might yield something here, but we'd need a way to extract recognizable patterns at scale.
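As a rough illustration of the PCA idea, here is a minimal sketch that takes the top-N segments by predicted risk, standardizes their features, and inspects the component loadings. The feature names, the synthetic data, and the helper `top_risk_pca` are all assumptions made for the example; they are not taken from the project's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature names and synthetic data, for illustration only.
feature_names = ["speed_limit", "lanes", "near_offramp", "aadt"]
n_segments = 500
X = rng.normal(size=(n_segments, len(feature_names)))
# Synthetic "predicted risk", driven mostly by speed and off-ramp proximity.
risk = 0.8 * X[:, 0] + 0.6 * X[:, 2] + rng.normal(scale=0.3, size=n_segments)


def top_risk_pca(X, risk, top_n=50, n_components=2):
    """Run PCA (via SVD) on the top_n riskiest segments.

    Returns (loadings, explained_variance_ratio). loadings has one row
    per component and one column per feature.
    """
    top_idx = np.argsort(risk)[::-1][:top_n]
    X_top = X[top_idx]
    # Standardize each feature so differences in scale do not dominate.
    std = X_top.std(axis=0)
    std[std == 0] = 1.0
    Z = (X_top - X_top.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    return Vt[:n_components], (var / var.sum())[:n_components]


loadings, evr = top_risk_pca(X, risk)
for i, (row, ratio) in enumerate(zip(loadings, evr), start=1):
    strongest = feature_names[int(np.argmax(np.abs(row)))]
    print(f"PC{i}: {ratio:.0%} of variance, strongest feature: {strongest}")
```

One caveat with this framing: PCA on the risky subset describes how risky segments vary among themselves. Finding what separates them from the remaining segments would need a comparison against the full set.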

shreyapandit commented 6 years ago

Profiles sounds good! We could do an EDA with our existing data: A) statistical analysis of a few things with the data as is, B) clustering the data, and then C) applying an interpretability layer over the results that lets us see what makes those clusters distinct. I'll start working on some of these points.
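Steps B and C could look something like the sketch below: cluster standardized features with scikit-learn's `KMeans`, then describe each cluster by its mean standardized feature values. The feature names, the planted synthetic groups, and the helper `cluster_profiles` are assumptions for illustration, and the per-cluster mean is only one simple stand-in for an interpretability layer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical features; synthetic data with two planted groups.
feature_names = ["speed_limit", "lanes", "near_offramp"]
low = rng.normal(loc=[25, 2, 0.1], scale=[3, 0.5, 0.05], size=(200, 3))
high = rng.normal(loc=[45, 4, 0.8], scale=[3, 0.5, 0.05], size=(50, 3))
X = np.vstack([low, high])

# B) Cluster on standardized features so no single unit dominates.
Z = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)


# C) Simple interpretability layer: mean standardized value per feature
# within each cluster. Large absolute values mark distinctive features.
def cluster_profiles(Z, labels):
    return {int(k): Z[labels == k].mean(axis=0) for k in np.unique(labels)}


profiles = cluster_profiles(Z, labels)
for k, profile in profiles.items():
    size = int((labels == k).sum())
    top = feature_names[int(np.argmax(np.abs(profile)))]
    print(f"cluster {k}: {size} segments, most distinctive feature: {top}")
```

On real data the number of clusters would need to be chosen rather than fixed at two, for example by comparing inertia or silhouette scores across several values.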

shreyapandit commented 6 years ago

Work in progress on the branch profile_analysis. I create a pickle file during the model run and use that for my analysis.
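The pickle hand-off between the model run and the analysis could be sketched as follows. The file name, the payload keys, and the helpers `save_run` and `load_run` are hypothetical; the actual contents of the pickle on the branch are not described in this thread.

```python
import pickle
import tempfile
from pathlib import Path


def save_run(path, segment_ids, features, predictions):
    """Write the pieces of a model run needed for later analysis."""
    payload = {
        "segment_ids": segment_ids,
        "features": features,
        "predictions": predictions,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_run(path):
    """Read a saved run back. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_run.pkl"  # hypothetical file name
    save_run(path, ["a", "b"], [[30, 2], [45, 4]], [0.1, 0.7])
    run = load_run(path)

print(run["predictions"])
```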

shreyapandit commented 5 years ago

Correlation matrix Cambridge: image

Correlation matrix Boston: image
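For readers without the images, a correlation matrix of this kind can be produced with pandas. This is a minimal sketch on synthetic data; the column names (including `prediction`) are assumptions and do not reflect the actual Cambridge or Boston feature sets.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

# Synthetic stand-in for a per-segment feature table.
speed = rng.normal(30, 5, n)
df = pd.DataFrame({
    "speed_limit": speed,
    # Correlated with speed by construction.
    "lanes": 1 + speed / 15 + rng.normal(0, 0.3, n),
    "signalized": rng.integers(0, 2, n),
})
# Hypothetical model output, driven by speed plus noise.
df["prediction"] = 0.02 * df["speed_limit"] + rng.normal(0, 0.1, n)

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

# Features ranked by absolute correlation with the predicted risk.
ranked = corr["prediction"].drop("prediction").abs().sort_values(ascending=False)
print(ranked.round(2))
```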

shreyapandit commented 5 years ago

I ran it on Cambridge data as well. The high-risk cluster is much smaller compared to Boston.

High Risk v/s Low Risk segments Boston: image

We can pick out distinct clusters of red points (high-risk segments) and grey points (low-risk segments).

High Risk v/s Low Risk segments Cambridge: image

Here t-SNE places the red high-risk segments and the grey low-risk segments closer together, hence the more homogeneous intermingling of grey and red.
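The degree of intermingling in a t-SNE plot can also be summarized with a number rather than judged by eye. The sketch below embeds synthetic segments with scikit-learn's `TSNE` and then measures, for each high-risk point, what share of its nearest embedded neighbours are also high risk. The data, the helper `high_risk_purity`, and the choice of `k` are assumptions for illustration, not part of the analysis on the branch.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic stand-in for segment features (three hypothetical columns).
low = rng.normal(loc=[25, 2, 0.1], scale=[3, 0.5, 0.05], size=(150, 3))
high = rng.normal(loc=[45, 4, 0.8], scale=[3, 0.5, 0.05], size=(50, 3))
X = np.vstack([low, high])
is_high_risk = np.array([False] * 150 + [True] * 50)

Z = StandardScaler().fit_transform(X)
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(Z)


def high_risk_purity(embedding, is_high_risk, k=10):
    """Mean share of high-risk points among the k nearest embedded
    neighbours of each high-risk point."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
    _, idx = nn.kneighbors(embedding[is_high_risk])
    # Drop each point's own index (assumes no duplicate coordinates).
    neighbours = idx[:, 1:]
    return float(is_high_risk[neighbours].mean())


purity = high_risk_purity(embedding, is_high_risk)
print(f"high-risk neighbourhood purity: {purity:.2f}")
```

A purity close to the overall share of high-risk segments (0.25 in this synthetic set) would indicate intermingling, while a value near 1.0 would indicate well-separated groups.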

shreyapandit commented 5 years ago

For Boston, KMeans gives intuitive clusters:

image

shreyapandit commented 5 years ago

Next, I am going to focus on adding interpretability to the analysis and work with @bpben to figure out how we can incorporate this as an "explainable" step in the pipeline. I plan to look into Shapley values (SHAP) and LIME for this.
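To make concrete what a Shapley-style explanation of a single segment would report, here is a small sketch that computes exact Shapley values by enumerating feature orderings. It does not use the SHAP or LIME libraries, and the risk function `toy_risk`, its coefficients, and the feature names are invented for the example.

```python
from itertools import permutations


def toy_risk(features):
    """Hypothetical risk score; not the project's model."""
    score = 0.1
    score += 0.3 * features["high_speed"]
    score += 0.2 * features["near_offramp"]
    # Interaction: fast traffic next to an off-ramp is worse than either alone.
    score += 0.2 * features["high_speed"] * features["near_offramp"]
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to instance.
    Cost grows factorially, so this only suits a handful of features."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = instance[name]
            new = model(current)
            totals[name] += new - prev
            prev = new
    return {name: total / len(orderings) for name, total in totals.items()}


segment = {"high_speed": 1, "near_offramp": 1, "signalized": 0}
baseline = {"high_speed": 0, "near_offramp": 0, "signalized": 0}
phi = shapley_values(toy_risk, segment, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the segment's score and the baseline score, and the interaction term is split evenly between the two features involved. Libraries such as SHAP approximate this kind of attribution for models with many features, where full enumeration is not feasible.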

bpben commented 5 years ago

Currently, for v2.0, we're going to put together a POC version of these profiles for city stakeholder review. If we get a sense that this is useful functionality, we'll develop it. Otherwise, we may go back to something like #103.

bpben commented 5 years ago

Okay, update: this is where we got to experimenting with personas: 8292c73d61788d3b5008b093d29186632f4a4b5a. Again, it's on hold because we want to figure out the best way to show this in the visualization so that stakeholders can make use of it. But we've got something here, at least.

bpben commented 1 year ago

Closing this as stale, will revive as it becomes relevant.