insight-lane / crash-model

Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.
https://insightlane.org
MIT License
113 stars 40 forks

Modify standardization & use of crash data to permit crashes by month rather than week #162

Closed terryf82 closed 5 years ago

terryf82 commented 5 years ago

Certain cities (Philadelphia and Brisbane at least) provide crash data with no day of month property, for privacy reasons.

For these cities to work we need to look at upgrading at least two processes:

  1. the standardize_crashes script needs to understand when the raw crash data has no day of month property and record the available date accordingly
  2. the modeling process needs to know that the city uses per-month crash data (by reading the config?) and generate predictions at the same monthly granularity.
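
To make the second point concrete, here is a minimal sketch of how crash counts could be bucketed by month or by week depending on a per-city setting. It is only an illustration: the `crash_period` config key and the `period_key` and `count_crashes` helpers are hypothetical names, not part of the existing codebase.

```python
from collections import Counter
from datetime import date


def period_key(d, period):
    """Return a bucket key for a crash date at the requested granularity."""
    if period == "month":
        return (d.year, d.month)
    if period == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])  # (ISO year, ISO week number)
    raise ValueError(f"unsupported period: {period!r}")


def count_crashes(dates, config):
    """Count crashes per period; 'crash_period' is a hypothetical config key."""
    period = config.get("crash_period", "week")
    return Counter(period_key(d, period) for d in dates)


# A city that only publishes year and month would set the key to "month".
config = {"crash_period": "month"}
dates = [date(2017, 3, 1), date(2017, 3, 28), date(2017, 4, 2)]
counts = count_crashes(dates, config)
```

With a setting like this, the modeling step could read the same key to decide which period its predictions cover.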

terryf82 commented 5 years ago

PR 181 implements something of a workaround for this issue, so that we can continue onboarding cities that choose not to supply the day of month for crashes (Pittsburgh, Brisbane and Philadelphia so far).

The initialize_city script now provides additional options for specifying different date formats: either date_complete, or the combination of date_year, date_month and date_day. This last one is optional, because some cities like those mentioned above withhold the day of month to anonymize the crash data. If this field is left blank, standardize_crashes will pick a random day of the month and assign it.
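
The fallback described above could look roughly like the following. This is a sketch under assumptions, not the code from the PR: the `resolve_crash_date` helper and its parameters are hypothetical, and the only behaviour taken from the description is that a blank day is replaced by a random valid day of that month.

```python
import calendar
import random
from datetime import date


def resolve_crash_date(year, month, day=None, rng=random):
    """Build a crash date, picking a random day when the city withholds it.

    Hypothetical helper: 'rng' only needs a randint(a, b) method, so a
    seeded random.Random instance can be passed for reproducible runs.
    """
    if day in (None, ""):
        # monthrange returns (weekday of first day, number of days in month)
        days_in_month = calendar.monthrange(int(year), int(month))[1]
        day = rng.randint(1, days_in_month)
    return date(int(year), int(month), int(day))


# Day withheld: a random day in February 2016 is assigned.
anonymized = resolve_crash_date(2016, 2, rng=random.Random(0))

# Day supplied (here as strings, as read from a CSV): used unchanged.
complete = resolve_crash_date("2017", "3", "15")
```

Note that the randomly assigned day is a placeholder only, so any analysis finer than monthly would be meaningless for these cities.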

I don't know that there's any point in moving further on this until we decide the value of temporal predictions (by week, month or any other period) versus simply making predictions at the point of execution.