We need other prediction task to evaluate our partition model. Ideally, different tasks require different partition boundary. And we should be able to explain the difference in the boundary from the prediction task data.
Design
The evaluation requires test-train split. We find a temporal threshold to split the data into old and new. The average price of old is used to train the partition, while the new is used to test.
Within each tract we need to keep track of total houses and sum of all unit price. Later we use sum to aggregate the CA level values, and the ratio between them is the average unit price.
Key points:
Find a good temporal split, which ideally gives us train-test ratio 1:1. So that we don't need to worry about the sparseness in average house price.
Motivation
We need other prediction task to evaluate our partition model. Ideally, different tasks require different partition boundary. And we should be able to explain the difference in the boundary from the prediction task data.
Design
Key points: