thekingofkings / chicago-partition

Automatically partition Chicago into Community Areas (CA) while minimizing the CA-level crime prediction error.
MIT License

How to merge various features for prediction model #3

Closed thekingofkings closed 6 years ago

thekingofkings commented 6 years ago

How to fuse various features for prediction model?

Old method - numpy array concatenate

In my chicago-crime project, I manually maintain numpy arrays that store the features of each community area (CA) under one view. The numpy concatenate function is therefore used widely to combine features from different views.

This approach relies on two assumptions about CAs: 1) no CA has missing features, and 2) CAs are indexed with contiguous IDs. Neither assumption holds at the tract level.
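A minimal sketch of the concatenate approach, assuming each view is a 2-D array with one row per CA and that row i belongs to the CA with ID i + 1. The array names (`demo`, `poi`) and all values are hypothetical and not taken from the project.

```python
import numpy as np

# Toy example with 4 areas; row i is assumed to hold the CA with ID i + 1.
# Both views are hypothetical and use made-up values.
demo = np.array([[0.1, 0.2],
                 [0.3, 0.4],
                 [0.5, 0.6],
                 [0.7, 0.8]])               # view 1: two features per CA
poi = np.array([[10], [20], [30], [40]])    # view 2: one feature per CA

# Column-wise concatenation only lines up correctly because every view
# has exactly one row per CA and the rows are in the same order.
features = np.concatenate([demo, poi], axis=1)
print(features.shape)
```

If one view lacked a row for some area, the row counts would differ and `np.concatenate` would raise an error; if the rows were merely in a different order, the features would be misaligned silently.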

New method - pandas DataFrame join

The tract ID, which is not contiguous, will be used to index each row of the feature table. The pandas.DataFrame.join function provides an easy and robust way to fuse features.
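A minimal sketch of the join approach, assuming each view is a DataFrame indexed by tract ID. The tract IDs, column names (`population`, `poi_count`) and values are hypothetical; filling missing counts with 0 is just one possible policy, not something the project prescribes.

```python
import pandas as pd

# Hypothetical view 1: covers three tracts with non-contiguous IDs.
demo = pd.DataFrame(
    {"population": [1200, 800, 950]},
    index=pd.Index([10100, 10300, 20500], name="tract_id"),
)

# Hypothetical view 2: has no row for tract 10100.
poi = pd.DataFrame(
    {"poi_count": [5, 7]},
    index=pd.Index([10300, 20500], name="tract_id"),
)

# join aligns rows on the index, so row order and gaps in the IDs do not matter.
# how="left" keeps every tract in `demo`; tracts missing from `poi` get NaN.
features = demo.join(poi, how="left")

# One possible way to handle the missing values (an assumption for this sketch).
features_filled = features.fillna({"poi_count": 0})
print(features_filled)
```

Using `how="inner"` instead would keep only the tracts present in every view, which may be preferable when the model cannot tolerate imputed values.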