thekingofkings / chicago-partition

Automatically partition Chicago into Community Areas (CA) while minimizing the CA-level crime prediction error.
MIT License

Naive MCMC gives the best house price prediction accuracy (compared to softmax and Q-learning MCMC) #13

Closed: thekingofkings closed this issue 6 years ago

thekingofkings commented 6 years ago

Issue: Naive MCMC outperforms the other methods on house price prediction.

Intuitively, Naive MCMC should not outperform the other methods. What causes this counterintuitive observation?

Guess 1: population variance term.

The penalty term may dominate the optimization objective. Temporary solution: optimize the prediction error alone and ignore the penalty.
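To make the guess concrete, assume the objective is the prediction error plus a weighted population-variance penalty. The symbols below are illustrative and not taken from MCMC.py: $P$ is a partition into $K$ community areas, $E(P)$ its prediction error, $\mathrm{pop}_k(P)$ the population of area $k$, and $\lambda$ the penalty weight. The temporary solution then amounts to:

```math
f(P) = E(P) + \lambda \,\mathrm{Var}\big(\mathrm{pop}_1(P), \dots, \mathrm{pop}_K(P)\big), \qquad \lambda = 0 \;\Rightarrow\; f(P) = E(P)
```

If $\lambda \cdot \mathrm{Var}$ is much larger than $E(P)$, differences in $E$ between candidate partitions barely change $f$, so the sampler would mostly be optimizing population balance rather than prediction error.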

The following are the results of minimizing only the house price prediction error, i.e., setting the weight of the penalty term to 0. The naive method still gives a lower prediction error on average (22.56 vs. 27.22 over the ten runs below).

```
hxw186@lab-ubuntu:~/workspace/chicago-partition/output$ cat house-price-naive-sampler-v*-output.txt
error: 16.7086 iterations: 9421 acceptance rate: 0.2180
error: 28.7022 iterations: 1191 acceptance rate: 0.4282
error: 27.5385 iterations: 15554 acceptance rate: 0.2752
error: 12.6841 iterations: 14664 acceptance rate: 0.2672
error: 14.9150 iterations: 12570 acceptance rate: 0.2733
error: 30.1358 iterations: 2833 acceptance rate: 0.3516
error: 27.9116 iterations: 5063 acceptance rate: 0.3399
error: 16.5952 iterations: 17909 acceptance rate: 0.2566
error: 33.3977 iterations: 9859 acceptance rate: 0.3066
error: 16.9956 iterations: 3308 acceptance rate: 0.2845
```

```
hxw186@lab-ubuntu:~/workspace/chicago-partition/output$ cat house-price-softmax-sampler-v*-output.txt
error: 37.6541 iterations: 1846 acceptance rate: 0.1165
error: 25.6696 iterations: 3308 acceptance rate: 0.0907
error: 23.7974 iterations: 3994 acceptance rate: 0.1002
error: 39.8807 iterations: 8358 acceptance rate: 0.1085
error: 19.0459 iterations: 4172 acceptance rate: 0.0908
error: 34.0887 iterations: 4418 acceptance rate: 0.0731
error: 24.4786 iterations: 18978 acceptance rate: 0.0769
error: 23.0170 iterations: 4826 acceptance rate: 0.0646
error: 21.8619 iterations: 4528 acceptance rate: 0.0987
error: 22.6730 iterations: 3111 acceptance rate: 0.1045
```
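Ten runs is a small sample and the naive errors spread widely (roughly 12.7 to 33.4), so it helps to report a mean with a standard deviation instead of comparing runs by eye. A minimal Python 3 sketch, assuming the per-run format shown above; `parse_runs` and `summarize` are hypothetical helpers, not part of the repository:

```python
import re
import statistics

# One run summary, e.g. "error: 16.7086 iterations: 9421 acceptance rate: 0.2180".
RUN_PATTERN = re.compile(
    r"error:\s*([\d.]+)\s+iterations:\s*(\d+)\s+acceptance rate:\s*([\d.]+)"
)


def parse_runs(text):
    """Return (error, iterations, acceptance_rate) tuples found in `text`."""
    return [(float(e), int(i), float(r)) for e, i, r in RUN_PATTERN.findall(text)]


def summarize(runs):
    """(mean, sample std) for error, iterations and acceptance rate."""
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*runs)]


NAIVE = """
error: 16.7086 iterations: 9421 acceptance rate: 0.2180
error: 28.7022 iterations: 1191 acceptance rate: 0.4282
error: 27.5385 iterations: 15554 acceptance rate: 0.2752
error: 12.6841 iterations: 14664 acceptance rate: 0.2672
error: 14.9150 iterations: 12570 acceptance rate: 0.2733
error: 30.1358 iterations: 2833 acceptance rate: 0.3516
error: 27.9116 iterations: 5063 acceptance rate: 0.3399
error: 16.5952 iterations: 17909 acceptance rate: 0.2566
error: 33.3977 iterations: 9859 acceptance rate: 0.3066
error: 16.9956 iterations: 3308 acceptance rate: 0.2845
"""

SOFTMAX = """
error: 37.6541 iterations: 1846 acceptance rate: 0.1165
error: 25.6696 iterations: 3308 acceptance rate: 0.0907
error: 23.7974 iterations: 3994 acceptance rate: 0.1002
error: 39.8807 iterations: 8358 acceptance rate: 0.1085
error: 19.0459 iterations: 4172 acceptance rate: 0.0908
error: 34.0887 iterations: 4418 acceptance rate: 0.0731
error: 24.4786 iterations: 18978 acceptance rate: 0.0769
error: 23.0170 iterations: 4826 acceptance rate: 0.0646
error: 21.8619 iterations: 4528 acceptance rate: 0.0987
error: 22.6730 iterations: 3111 acceptance rate: 0.1045
"""

for name, text in (("naive", NAIVE), ("softmax", SOFTMAX)):
    (err_m, err_s), (it_m, it_s), (acc_m, acc_s) = summarize(parse_runs(text))
    print(f"{name}: error {err_m:.2f} ({err_s:.2f}), "
          f"iterations {it_m:.0f} ({it_s:.0f}), acceptance {acc_m:.3f} ({acc_s:.3f})")
```

The mean errors come out at 22.56 (naive) and 27.22 (softmax); the standard deviations printed alongside show how much of that gap is within run-to-run noise.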

thekingofkings commented 6 years ago

Guess 2: the naive method stops early

There is another stopping criterion: https://github.com/thekingofkings/chicago-partition/blob/64a2e5389082a6018362b8e4a3ebcd654b80e5a5/MCMC.py#L205

Namely, for Naive MCMC and Softmax MCMC, the search stops if no better solution is found after M = 100 rejections.

Evidence: for Naive MCMC, the following code is never reached in some rounds (nothing is printed to the screen), which suggests those rounds end through the rejection rule rather than through convergence.

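```python
# Inside the sampling loop in MCMC.py (fragment).
if isConvergent(epsilon, f_series):
    # when mae converges
    # Implicit string concatenation avoids the stray spaces that the original
    # backslash continuation inside the string literal put into the message.
    print("converge in {} samples with {} acceptances, "
          "sample conversion rate {}".format(
              iter_cnt, len(mae_series), len(mae_series) / float(iter_cnt)))
    CommunityArea.visualizeCAs(iter_cnt=None, fname=project_name + "-CAs-iter-final.png")
    CommunityArea.visualizePopDist(iter_cnt=None, fname=project_name + "-pop-distribution-final")
    break
```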
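Both stopping rules can be sketched in a few lines, and having the sampler return which rule fired makes Guess 2 easy to check per round. This is a minimal Python 3 sketch, not the code in MCMC.py: `run_sampler`, `is_convergent`, the `window` size, the Metropolis acceptance rule, and counting *consecutive* rejections (reset on every acceptance) are all assumptions.

```python
import math
import random


def is_convergent(epsilon, f_series, window=5):
    """Hypothetical stand-in for isConvergent: True when the last `window`
    accepted objective values differ by less than `epsilon`."""
    if len(f_series) < window:
        return False
    recent = f_series[-window:]
    return max(recent) - min(recent) < epsilon


def run_sampler(state, propose, objective, epsilon=1e-3, max_rejections=100,
                temperature=1.0, max_iter=10000, seed=0):
    """Minimize `objective` with Metropolis-style moves and report which
    stopping rule ended the run: 'converged', 'rejections' or 'max_iter'."""
    rng = random.Random(seed)
    current = objective(state)
    f_series = [current]   # objective values of accepted states
    rejections = 0         # consecutive rejected proposals (assumption)
    for iter_cnt in range(1, max_iter + 1):
        candidate = propose(state, rng)
        value = objective(candidate)
        # Always accept improvements; accept a worse move with prob exp(-delta / T).
        if value <= current or rng.random() < math.exp((current - value) / temperature):
            state, current = candidate, value
            f_series.append(current)
            rejections = 0
            if is_convergent(epsilon, f_series):
                return state, "converged", iter_cnt
        else:
            rejections += 1
            if rejections >= max_rejections:
                return state, "rejections", iter_cnt
    return state, "max_iter", max_iter


# Toy run: a walk on the integers minimizing (x - 3)^2, starting at x = 0.
state, reason, iters = run_sampler(
    0, lambda x, rng: x + rng.choice((-1, 1)), lambda x: (x - 3) ** 2,
    temperature=1e-3)
print(state, reason, iters)
```

In the toy run, the walk reaches x = 3 with only four values in `f_series` (the start plus three accepted moves), so the convergence window never fills and the run ends through the rejection rule, the same way a round can stop without ever printing the convergence message.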
thekingofkings commented 6 years ago

Statistics over 100 rounds (task: house price prediction)

Rand Index:

- house-price-naive-sampler: mean Adjusted Rand Index 0.4982 (0.06)
- house-price-softmax-sampler: mean Adjusted Rand Index 0.6702 (0.04)
- house-price-q-learning-sampler: mean Adjusted Rand Index 0.6171 (0.03)
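The thread does not say which partitions the Adjusted Rand Index compares. A common reading is agreement between the final partitions of different rounds, which measures how consistent each sampler is. A minimal from-scratch sketch, assuming each partition is a list giving the community-area label of each spatial unit in a fixed order; `adjusted_rand_index` and `mean_pairwise_ari` are hypothetical helpers, and the pairwise averaging is an assumption about how the mean was taken:

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same units."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same units")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0
    # Pairs of units grouped together in both partitions.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions identical and trivial
        return 1.0
    return (index - expected) / (max_index - expected)


def mean_pairwise_ari(partitions):
    """Average ARI over all pairs of partitions (e.g. one per round)."""
    scores = [adjusted_rand_index(p, q) for p, q in combinations(partitions, 2)]
    return sum(scores) / len(scores)


print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```

The index is invariant to relabeling, so two rounds that draw the same boundaries with different CA ids still score 1.0; `sklearn.metrics.adjusted_rand_score` computes the same quantity if scikit-learn is available.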


Simulation summaries:

| Sampler | Mean iterations | Mean acceptance rate | Mean error |
|---|---|---|---|
| house-price-naive-sampler | 439.80 (190.53) | 0.54 (0.05) | 25.73 (2.76) |
| house-price-softmax-sampler | 634.93 (393.40) | 0.27 (0.10) | 27.13 (2.98) |
| house-price-q-learning-sampler | 314.44 (63.89) | 1.00 (0.00) | 25.16 (1.30) |