thekingofkings / chicago-partition

Automatically partition Chicago into Community Areas (CA) while minimizing the CA-level crime prediction error.
MIT License

Fair evaluation scheme #7

Closed thekingofkings closed 6 years ago

thekingofkings commented 6 years ago

How to fairly evaluate the partition?

The train-test split is difficult: spatially, we need all of the data to maximize the unsupervised objective. The only viable solution is to split the data along the temporal dimension.

Crime prediction

We use crime data from 2010 as training data to learn the optimal partition. Note that the MCMC process does not need the leave-one-out error; we can simply use the training fitness measure to search for the optimal partition.

Given the optimal partition, we use the 2011 crime data for testing.
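
A minimal sketch of this temporal split, assuming each crime incident is stored as a `(tract_id, year)` pair and a partition is a dict mapping tract to CA. The record layout and the names `split_by_year` and `counts_per_ca` are hypothetical, not taken from the repository.

```python
from collections import defaultdict


def split_by_year(records, train_year=2010, test_year=2011):
    """Split (tract_id, year) incidents into training and testing sets by year."""
    train = [r for r in records if r[1] == train_year]
    test = [r for r in records if r[1] == test_year]
    return train, test


def counts_per_ca(records, tract_to_ca):
    """Aggregate incident counts to the CA level under a given partition."""
    counts = defaultdict(int)
    for tract_id, _year in records:
        counts[tract_to_ca[tract_id]] += 1
    return dict(counts)


# Toy data: incidents from other years are ignored by the split.
records = [("t1", 2010), ("t2", 2010), ("t1", 2011), ("t3", 2012)]
partition = {"t1": "A", "t2": "A", "t3": "B"}

train, test = split_by_year(records)
train_counts = counts_per_ca(train, partition)  # used while searching for the partition
test_counts = counts_per_ca(test, partition)    # used only to evaluate the final partition
```

The partition itself is fixed before the 2011 counts are touched, so the test error is not used during the search.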

House price

There is a sold date field in the house price dataset. We can split at a chosen date and calculate two average house prices (before and after that date).
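
The date-based split could look roughly like this, assuming each sale is a `(sold_date, price)` pair. The function name, the tuple layout, and the cutoff date used below are illustrative assumptions only.

```python
from datetime import date


def average_price_before_after(sales, cutoff):
    """Return (mean price before cutoff, mean price on or after cutoff).

    A side with no sales yields None instead of raising ZeroDivisionError.
    """
    before = [price for sold, price in sales if sold < cutoff]
    after = [price for sold, price in sales if sold >= cutoff]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(before), mean(after)


# Toy data with a hypothetical cutoff of 2011-01-01.
sales = [
    (date(2010, 5, 1), 100.0),
    (date(2010, 9, 1), 200.0),
    (date(2011, 3, 1), 400.0),
]
avg_before, avg_after = average_price_before_after(sales, date(2011, 1, 1))
```

In practice the same computation would be run per CA, giving one training-period and one testing-period average for each area.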

Significance of the optimal partition

With the optimal partition, we can use a permutation test to calculate the p-value. One permutation is defined as randomly selecting one tract and flipping its CA assignment.
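
One way this test might be sketched, under several assumptions not stated above: each permutation is applied independently to the optimal partition, the flipped tract moves to a uniformly chosen other CA (spatial contiguity is ignored), and the p-value is the share of perturbed partitions whose error is no worse than the optimal one, with the usual add-one correction. `error_fn` is a hypothetical stand-in for the test-set prediction error.

```python
import random


def flip_one_tract(partition, rng):
    """Copy the partition with one random tract reassigned to a different CA.

    Requires at least two distinct CAs in the partition.
    """
    cas = sorted(set(partition.values()))
    tract = rng.choice(sorted(partition))
    others = [ca for ca in cas if ca != partition[tract]]
    perturbed = dict(partition)
    perturbed[tract] = rng.choice(others)
    return perturbed


def permutation_p_value(partition, error_fn, n_permutations=1000, seed=0):
    """Estimate how often a single-flip perturbation does at least as well."""
    rng = random.Random(seed)
    observed = error_fn(partition)
    as_good = sum(
        error_fn(flip_one_tract(partition, rng)) <= observed
        for _ in range(n_permutations)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (1 + as_good) / (1 + n_permutations)


# Toy check: an error function that is worse for any partition other than the optimum.
optimal = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
toy_error = lambda p: 0.0 if p == optimal else 1.0
p_value = permutation_p_value(optimal, toy_error, n_permutations=99)
```

A small p-value suggests the optimal partition is hard to match by a single-tract change. A contiguity-aware variant would restrict the flip to tracts on a CA boundary and to neighbouring CAs.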

thekingofkings commented 6 years ago

9 average house price features also follow the temporal training-testing split