How to fairly evaluate the partition?
A train-test split is difficult here: spatially, we need all of the data to maximize the unsupervised objective, so no region can be held out. The only viable solution is to split the data along the temporal dimension.
Crime prediction
We use crime data from 2010 as training data to learn the optimal partition. Note that during the MCMC process we do not need the leave-one-out error; we can simply use the training fitness measure to search for the optimal partition.
Given the optimal partition, we test on the 2011 crime data.
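The temporal split can be sketched as follows. This is a minimal illustration, not the fitness measure used in the MCMC search: the partition is held fixed, tract-level counts are summed to CA level, and a stand-in model (a single crime rate per capita, fit by least squares through the origin) is trained on 2010 and scored on 2011. The function names, the population field, and the toy numbers are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_by_ca(assignment, tract_values):
    """Sum tract-level values into community-area (CA) totals."""
    totals = defaultdict(float)
    for tract, ca in assignment.items():
        totals[ca] += tract_values[tract]
    return dict(totals)


def fit_rate(population, crime):
    """Least-squares slope through the origin: crime ~ rate * population.

    Stand-in model (assumption); the notes do not specify the predictor.
    """
    num = sum(population[ca] * crime[ca] for ca in population)
    den = sum(population[ca] ** 2 for ca in population)
    return num / den


def mean_abs_error(rate, population, crime):
    """Mean absolute error of the fitted rate on CA-level counts."""
    errors = [abs(crime[ca] - rate * population[ca]) for ca in population]
    return sum(errors) / len(errors)


# Toy data (hypothetical): four tracts assigned to two CAs.
assignment = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
population = {"t1": 100, "t2": 200, "t3": 150, "t4": 50}
crime_2010 = {"t1": 10, "t2": 20, "t3": 15, "t4": 5}
crime_2011 = {"t1": 12, "t2": 18, "t3": 15, "t4": 7}

ca_pop = aggregate_by_ca(assignment, population)

# Train on 2010 only; the partition itself was also learned from 2010.
rate = fit_rate(ca_pop, aggregate_by_ca(assignment, crime_2010))

# Test on 2011, which played no role in choosing the partition or the rate.
test_error = mean_abs_error(rate, ca_pop, aggregate_by_ca(assignment, crime_2011))
```

The point of the sketch is only the data flow: nothing from 2011 touches the partition search or the model fit, so the 2011 error is an out-of-sample measure for the partition.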
House price
The house price dataset has a sold-date field. We can split at a chosen date and calculate two average house prices: one for sales before that date and one for sales from that date onward.
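A minimal sketch of the date split, assuming each sale is a (sold date, price) pair; the record layout, the cutoff date, and the function name are hypothetical, and the real dataset's field names may differ. In practice the same computation would presumably be repeated per tract or per CA.

```python
from datetime import date
from statistics import mean


def split_average_price(sales, cutoff):
    """Return (mean price before cutoff, mean price on or after cutoff).

    statistics.mean raises StatisticsError if either side is empty.
    """
    before = [price for sold, price in sales if sold < cutoff]
    after = [price for sold, price in sales if sold >= cutoff]
    return mean(before), mean(after)


# Hypothetical records: (sold date, price).
sales = [
    (date(2010, 3, 1), 200_000),
    (date(2010, 9, 15), 220_000),
    (date(2011, 2, 10), 240_000),
    (date(2011, 7, 4), 260_000),
]

avg_before, avg_after = split_average_price(sales, cutoff=date(2011, 1, 1))
```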
Significance of the optimal partition
With the optimal partition, we can use a permutation test to calculate the p-value. One permutation is defined as randomly selecting one tract and flipping its CA assignment.
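One way to read this procedure is sketched below, under several assumptions that the notes do not settle: each permutation is applied independently to the optimal partition (rather than cumulatively), higher fitness is better, spatial contiguity of CAs is ignored, and the p-value is the share of single-flip partitions whose fitness is at least that of the optimal one, with the usual add-one correction. `toy_fitness` and its values are placeholders for the real fitness measure.

```python
import random


def flip_one_tract(assignment, cas, rng):
    """Copy the partition, then move one random tract to a different CA."""
    flipped = dict(assignment)
    tract = rng.choice(sorted(flipped))
    flipped[tract] = rng.choice([ca for ca in cas if ca != flipped[tract]])
    return flipped


def permutation_p_value(assignment, fitness, n_permutations=1000, seed=0):
    """Share of single-flip partitions that match or beat the observed fitness."""
    rng = random.Random(seed)
    cas = sorted(set(assignment.values()))
    observed = fitness(assignment)
    at_least_as_good = sum(
        fitness(flip_one_tract(assignment, cas, rng)) >= observed
        for _ in range(n_permutations)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (1 + at_least_as_good) / (1 + n_permutations)


def toy_fitness(assignment):
    """Hypothetical fitness: negative within-CA squared deviation (higher is better)."""
    values = {"t1": 1.0, "t2": 1.2, "t3": 5.0, "t4": 5.1}
    score = 0.0
    for ca in set(assignment.values()):
        members = [values[t] for t, c in assignment.items() if c == ca]
        center = sum(members) / len(members)
        score -= sum((v - center) ** 2 for v in members)
    return score


optimal = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
p_value = permutation_p_value(optimal, toy_fitness, n_permutations=200)
```

In this toy case every single flip lowers the fitness, so the estimate takes its smallest attainable value, 1/201. With real data, a flip that respects contiguity (for example, only moving a tract to a neighboring CA) may be the more appropriate null.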