sfbrigade / datasci-firerisk

This project attempts to acquire and model data from SF OpenData, and other sources, to predict the relative risk of fire in San Francisco’s buildings and public spaces.
http://codeforsanfrancisco.org/projects/SF-Fire-Risk-Project

Prepare feature data from 'matched_Fire_Safety_Complaints.csv' #9

Open stahlerk opened 6 years ago

stahlerk commented 6 years ago

1) Subset data to potentially useful features
2) Detect and remove outliers
3) Consider dropping complaints with 'No merit' in the 'Disposition' column
4) Consider organizing "Complaint Item Type Description" column into more generalized groups (if appropriate)
5) Collapse data at EAS level
6) Create any potentially relevant features (for example, total number of complaints b/w 2005-2016 associated with EAS, etc.)
7) Any other data cleaning and standardization operations
8) Output as .csv (indexed at EAS)
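
A rough pandas sketch of steps 1, 3, 5, 6 and 7 above, as a possible starting point rather than the agreed approach. Only the 'Disposition' and "Complaint Item Type Description" column names and the 'No merit' value come from this issue; the `EAS` and `Received Date` column names, the function name, and the output feature names are assumptions to adjust against the real file. Outlier handling (step 2) and regrouping complaint types (step 4) are left out because they depend on what the data actually looks like.

```python
import pandas as pd

# Named in the issue.
TYPE_COL = "Complaint Item Type Description"
DISP_COL = "Disposition"
# Assumptions: replace with the real column names in the CSV.
EAS_COL = "EAS"
DATE_COL = "Received Date"


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse complaint rows into one row of features per EAS."""
    # Step 1: keep only the columns used below.
    df = df[[EAS_COL, DATE_COL, TYPE_COL, DISP_COL]].copy()

    # Step 7: parse dates; unparseable values become NaT and are dropped,
    # along with rows that have no EAS key.
    df[DATE_COL] = pd.to_datetime(df[DATE_COL], errors="coerce")
    df = df.dropna(subset=[EAS_COL, DATE_COL])

    # Step 3: drop complaints whose disposition is 'No merit'
    # (compared case-insensitively, ignoring surrounding whitespace).
    disposition = df[DISP_COL].fillna("").str.strip().str.lower()
    df = df[disposition != "no merit"]

    # Step 6: restrict to complaints received in 2005-2016 inclusive.
    df = df[df[DATE_COL].dt.year.between(2005, 2016)]

    # Steps 5 and 6: one row per EAS with a few candidate features.
    return df.groupby(EAS_COL).agg(
        total_complaints=(TYPE_COL, "size"),
        distinct_complaint_types=(TYPE_COL, "nunique"),
        last_complaint=(DATE_COL, "max"),
    )
```

For step 8, something like `prepare_features(pd.read_csv("matched_Fire_Safety_Complaints.csv")).to_csv(out_path)` would write the result with EAS as the index column, where `out_path` is whatever output location we settle on.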