insight-lane / crash-model

Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.
https://insightlane.org
MIT License

Create Data Dictionary #157

Closed: terryf82 closed this issue 5 years ago

terryf82 commented 6 years ago

@alicefeng, could you add a short description explaining the scope of this task? I'm not sure I fully understand what it will be used for.

@shreyapandit is this something you have started on already? If so feel free to add any relevant details, thanks.

alicefeng commented 6 years ago

Thanks for creating the issue @terryf82 !

What I'm looking for here is an explanation of the features in the model and the values each one can take. This is especially important for categorical variables, whose encodings may not be obvious. Specifically, I'm looking for a table with the following:

(I'm asking for this for two reasons: 1) I'd like to understand what our model takes into consideration when generating its predictions, and 2) I need to know how to translate these feature names into something more human-friendly for the interpretability part of the viz.)
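Below is a minimal sketch of what one data dictionary entry could hold, and of how the viz could use it to turn raw feature names and categorical codes into readable labels. The feature names `speed_limit` and `oneway`, the `FeatureEntry` fields, and the `humanize` helper are all hypothetical illustrations. They are not the model's actual features or code.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureEntry:
    """One row of a data dictionary: a model feature and what it means."""
    name: str            # column name as it appears in the model's input data
    label: str           # human-friendly name for the visualization
    dtype: str           # e.g. "integer", "float", "boolean", "categorical"
    description: str
    allowed_values: dict = field(default_factory=dict)  # code -> meaning, for categoricals


# Hypothetical entries for illustration only; not the project's actual features.
DATA_DICTIONARY = {
    entry.name: entry
    for entry in [
        FeatureEntry("speed_limit", "Posted speed limit", "integer",
                     "Posted speed limit on the segment, in mph"),
        FeatureEntry("oneway", "One-way street", "categorical",
                     "Direction of travel allowed on the segment",
                     {0: "two-way", 1: "one-way"}),
    ]
}


def humanize(feature: str, value=None) -> str:
    """Translate a feature name (and optionally an encoded value) for display."""
    entry = DATA_DICTIONARY.get(feature)
    if entry is None:
        return feature  # fall back to the raw name if it isn't documented yet
    if value is None:
        return entry.label
    # Decode categorical values; pass non-categorical values through unchanged
    return f"{entry.label}: {entry.allowed_values.get(value, value)}"
```

With these entries, `humanize("oneway", 1)` yields `"One-way street: one-way"`. Undocumented features fall back to their raw names, so the viz degrades gracefully while the dictionary is still incomplete.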

shreyapandit commented 6 years ago

Link to Google Doc: https://docs.google.com/document/d/1PwA07OfSD5ELy0pPTb4ieDDKO2syrBojA_J_JRLkhk4/edit

I'm currently running the model locally to make sure I cover everything.

terryf82 commented 6 years ago

I'm wondering whether our schemas might be the right place to store this type of information. Most of our data inputs/outputs already have a schema (crashes, concerns, predictions), or one is in the works or planned (point-based features, segments, etc.).

The predictions schema @ https://github.com/Data4Democracy/crash-model/blob/data_standards/standards/predictions-schema.json is still only in draft form and will need updating based on the work we're doing at the moment, but as an example it provides most of the features @alicefeng has mentioned as desirable:

as well as a structured way to specify data type and enumerate allowed values where applicable.

It seems to me that we already have the right tool for the job. What do others think?

@j-t-t @bpben
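To illustrate the schema-as-source-of-truth idea, here is a minimal sketch that reads a JSON Schema-style object and flattens its top-level `properties` into data dictionary rows. It uses the standard `type`, `description`, `enum`, and `required` keywords. The schema fragment is hypothetical, and the sketch assumes a flat object. The real `predictions-schema.json` may nest its fields differently, in which case the traversal would need adjusting.

```python
import json

# Hypothetical schema fragment in JSON Schema style; the real
# predictions-schema.json may be structured differently (e.g. nested objects).
SCHEMA_TEXT = """
{
  "type": "object",
  "properties": {
    "segment_id": {"type": "string", "description": "Unique ID of the road segment"},
    "prediction": {"type": "number", "description": "Predicted crash risk"},
    "road_type": {
      "type": "string",
      "description": "Functional class of the road",
      "enum": ["arterial", "collector", "local"]
    }
  },
  "required": ["segment_id", "prediction"]
}
"""


def dictionary_rows(schema: dict) -> list:
    """Flatten a schema's top-level properties into data-dictionary rows."""
    required = set(schema.get("required", []))
    rows = []
    for name, spec in schema.get("properties", {}).items():
        rows.append({
            "field": name,
            "type": spec.get("type", "unspecified"),
            "description": spec.get("description", ""),
            # "enum" is the JSON Schema keyword for a fixed set of allowed values
            "allowed_values": spec.get("enum"),
            "required": name in required,
        })
    return rows


rows = dictionary_rows(json.loads(SCHEMA_TEXT))
```

Because types and allowed values live in the schema, a human-readable dictionary generated this way stays in sync with validation rather than drifting from it.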

shreyapandit commented 6 years ago

I think the document can serve as a temporary point of reference until we make our schemas updated :)

terryf82 commented 5 years ago

Next step: @shreyapandit to start migrating the Google Doc content into a markdown file in the repo.

Let me know if you need a hand.
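For the markdown migration, one option is to keep the entries as structured data and generate the table rather than editing it by hand. The following is a minimal sketch of that approach. The `to_markdown_table` helper and the example rows are hypothetical and not part of the repo. The output uses GitHub-flavored markdown table syntax, and pipe characters inside cells are escaped.

```python
def to_markdown_table(rows: list, columns: list) -> str:
    """Render data-dictionary rows (dicts) as a GitHub-flavored markdown table."""
    def cell(value) -> str:
        if value is None:
            return ""  # missing fields become empty cells
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        # Escape pipes so they don't break the table's column structure
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(c)) for c in columns) + " |")
    return "\n".join(lines)


# Hypothetical rows for illustration; not the project's actual features.
example_rows = [
    {"feature": "speed_limit", "type": "integer", "description": "Posted speed limit (mph)"},
    {"feature": "road_type", "type": "categorical", "allowed_values": ["arterial", "local"]},
]
table = to_markdown_table(example_rows, ["feature", "type", "allowed_values", "description"])
```

Generating the file this way means the markdown only needs regenerating when the underlying entries change.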

shreyapandit commented 5 years ago

First pass of markdown is here: https://github.com/Data4Democracy/crash-model/pull/227

Refining some of the sources for features since our model style changed recently.