insight-lane / crash-model

Build a crash prediction modeling application that leverages multiple data sources to generate a set of dynamic predictions we can use to identify potential trouble spots and direct timely safety interventions.
https://insightlane.org
MIT License

Generalizing point base features, a few other fixes #318

Closed bpben closed 10 months ago

bpben commented 2 years ago

The main goal here is to let a config specify that a single file contains multiple point-based features, rather than repeating the same file information for each feature.
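To illustrate the idea, here is a minimal sketch of how one source entry listing several features could be expanded into one spec per feature. The key names (`filename`, `latitude`, `longitude`, `feat`, `feats`) and the helper `expand_point_features` are hypothetical and chosen only for illustration; they are not taken from the project's actual config schema or code.

```python
def expand_point_features(source):
    """Return one spec per feature for a single data-source entry.

    Hypothetical schema: an entry either names one feature under "feat"
    or several under "feats". Everything else (file name, coordinate
    columns) is shared and stated only once.
    """
    shared = {k: v for k, v in source.items() if k not in ("feat", "feats")}
    if "feats" in source:
        names = list(source["feats"])
    else:
        names = [source["feat"]]
    # Copy the shared fields into each per-feature spec.
    return [dict(shared, feat=name) for name in names]


# One file, several point-based features (hypothetical values).
multi_source = {
    "filename": "points.csv",
    "latitude": "lat",
    "longitude": "lon",
    "feats": ["signals", "crosswalks"],
}
# A single-feature entry goes through the same function.
single_source = {
    "filename": "signs.csv",
    "latitude": "lat",
    "longitude": "lon",
    "feat": "stop_signs",
}

multi_specs = expand_point_features(multi_source)
single_specs = expand_point_features(single_source)
```

With this shape, the file name and coordinate columns are written once per file, and downstream code can still iterate over a flat list of per-feature specs.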

There are also some documentation updates here, part of an ongoing process.

I also noticed a major issue with how crashes were being joined to segments: the segment_ids were of mixed types. This should now be fixed.
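For readers unfamiliar with this class of bug, the sketch below shows how mixed id types can silently drop matches in a join, and how casting every id to a string avoids it. The data and the function `join_crashes_to_segments` are hypothetical; this is not the project's actual join code or necessarily the fix applied in this PR.

```python
def join_crashes_to_segments(crashes, segments, normalize_ids=False):
    """Return (matched, unmatched): crash_id -> segment name, and unmatched crash ids.

    With normalize_ids=True, ids on both sides are cast to str before lookup.
    """
    if normalize_ids:
        lookup = {str(seg_id): name for seg_id, name in segments.items()}
    else:
        lookup = dict(segments)
    matched, unmatched = {}, []
    for crash in crashes:
        key = str(crash["segment_id"]) if normalize_ids else crash["segment_id"]
        if key in lookup:
            matched[crash["crash_id"]] = lookup[key]
        else:
            unmatched.append(crash["crash_id"])
    return matched, unmatched


# Hypothetical data: some ids are ints, others are strings.
segments = {12: "Main St", "345": "Elm St"}
crashes = [
    {"crash_id": "a", "segment_id": "12"},   # str id, segment key is int
    {"crash_id": "b", "segment_id": 345},    # int id, segment key is str
    {"crash_id": "c", "segment_id": "345"},  # types happen to agree
]

naive = join_crashes_to_segments(crashes, segments)
fixed = join_crashes_to_segments(crashes, segments, normalize_ids=True)
```

In the naive join only crash `c` finds its segment, because `12 != "12"` in Python; after normalizing, all three crashes match. Note that a plain string cast would not reconcile ids that differ in formatting, such as `"001"` versus `1`.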

codecov-commenter commented 2 years ago

Codecov Report

Merging #318 (ca49f8f) into master (bf9267e) will increase coverage by 0.40%. The diff coverage is 88.04%.

@@            Coverage Diff             @@
##           master     #318      +/-   ##
==========================================
+ Coverage   64.02%   64.42%   +0.40%     
==========================================
  Files          33       33              
  Lines        3263     3314      +51     
==========================================
+ Hits         2089     2135      +46     
- Misses       1174     1179       +5     
bpben commented 2 years ago

Thanks for the comments. I actually needed to make some more edits: categorical variables are now handled a bit differently and require less preprocessing. Added docs.

Also, this should still work fine for old configs; it will just process them differently if they have multiple features per additional source. The README has the details.

bpben commented 2 years ago

@j-t-t also, do you think the test in create_segments is necessary? I already have a test for the multi-feature case in standardize_points: https://github.com/insight-lane/crash-model/blob/052761a604cfc8f4ebaa327cf62b8d323dc874eb/src/data_standardization/tests/test_standardize_point_data.py#L154