Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.
Pulled all the ATR code, except the volume propagation, into the data_standardization section as a standardize_volume script. Renamed ATR_utils to boston_volume because it's all Boston-specific volume code.
Created a volume schema and now store the ATRs there instead of in geocoded_atrs.
Changed the geocoding slightly so that all addresses, from both ATRs and TMCs, are stored in the geocoded_addresses.csv file. If an address is already in that geocoded cache, we read it from there; otherwise we geocode it and add it to the cache. The cache is written back out to geocoded_addresses.csv at the end.
Pulled the propagate volume functionality into its own script, called propagate_volume, and hooked it into the pipeline.
Made this compatible with the existing TMC functionality. Moving TMC parsing into the new schema in the data_standardization step will be a later task.
Added some tests so code coverage at least doesn't go down. Having code coverage actually go up will also be a later task
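
For reviewers, here is a rough sketch of the read-through cache pattern that the geocoding change follows: look the address up in the cache, geocode only on a miss, and write the whole cache out once at the end. This is not the code in this PR. The function names (load_cache, lookup, write_cache), the CSV column names, and the geocode_fn callable are all hypothetical, chosen only for illustration.

```python
import csv
import os

# Hypothetical column layout for the cache file; the real file may differ.
FIELDS = ["input_address", "geocoded_address", "lat", "lng"]


def load_cache(path):
    """Read previously geocoded addresses into a dict keyed by input address."""
    cache = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                cache[row["input_address"]] = (
                    row["geocoded_address"],
                    float(row["lat"]),
                    float(row["lng"]),
                )
    return cache


def lookup(address, cache, geocode_fn):
    """Return the cached result, calling geocode_fn only on a cache miss.

    geocode_fn is a stand-in for whatever geocoder is used; it is assumed
    to return a (geocoded_address, lat, lng) tuple.
    """
    if address not in cache:
        cache[address] = geocode_fn(address)
    return cache[address]


def write_cache(path, cache):
    """Write the full cache back out once, after all lookups are done."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        for address, (geocoded, lat, lng) in sorted(cache.items()):
            writer.writerow([address, geocoded, lat, lng])
```

Because ATR and TMC addresses go through the same cache, an address that appears in both sources would only be geocoded once under this pattern.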