Closed — thekingofkings closed this issue 7 years ago.
Setting: GLM model.

With the `GLM.NegativeBinomial` model, I get the following error: "SVD did not converge."

With the `GLM.Poisson` model, the error is huge:
MAE | MRE |
---|---|
26421 | 5.519 |
Update: extended the setting by adding the constant feature term (a column of 1s).
Method | training MAE | training MRE | testing MAE | testing MRE |
---|---|---|---|---|
Negative Binomial | 3684 | 1.5815 | 3726.11 | 0.7784 |
Poisson | 3997 | 1.5811 | 3995.54 | 0.8346 |
Gaussian | 3764 | 1.6313 | 3520.31 | 0.7354 |
`GLM.Gaussian` gives the lowest testing error, but not the lowest training error. How come?
Method | training MAE | training MRE | testing MAE | testing MRE |
---|---|---|---|---|
Negative Binomial | 741.47 | 0.7186 | 578.88 | 0.4320 |
Poisson | 761.99 | 0.7369 | 574.36 | 0.4286 |
Gaussian | 768.25 | 0.7514 | 628.31 | 0.4689 |
`GLM.Poisson` gives the lowest testing error, but not the lowest training error. Why?
Also, notice that Negative Binomial is close to the best model overall, while `GLM.Gaussian` shows a performance gap on both training and testing.
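For completeness, the metrics in these tables could be computed as follows (a sketch; the exact MRE normalization used here — MAE divided by the mean of the ground truth — is an assumption):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mre(y_true, y_pred):
    """Mean relative error: MAE normalized by the mean target (assumed definition)."""
    return mae(y_true, y_pred) / np.mean(y_true)

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])
print(mae(y_true, y_pred), mre(y_true, y_pred))
```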
Method | testing MAE (size 40) | testing MRE (size 40) | testing MAE (size 20) | testing MRE (size 20) |
---|---|---|---|---|
Negative Binomial | 1010.40 | 0.7540 | 578.88 | 0.4320 |
Poisson | 998.98 | 0.7455 | 574.36 | 0.4286 |
Gaussian | 869.53 | 0.6489 | 628.31 | 0.4689 |
Conclusion: for our problem, we should prefer a short graph embedding vector; size 40 performs worse than size 20.
Best setting to learn the graph embedding:

We use the graph embedding to compute a vector representation of each CA, then use that CA vector as the features for crime prediction. The graph embedding is generated with the code from the LINE paper.
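As a sketch of the first step of this pipeline: LINE's tool writes embeddings to a text file, and loading them into per-CA feature vectors might look like the following (the word2vec-style file format — a header line followed by one node per line — is an assumption, as are all names here):

```python
import numpy as np

def load_line_embedding(path):
    """Load an embedding file with header "num_nodes dim", then one line
    per node: "node_id v1 v2 ... v_dim" (assumed LINE output format)."""
    vectors = {}
    with open(path) as f:
        num_nodes, dim = map(int, f.readline().split())
        for line in f:
            parts = line.split()
            vectors[parts[0]] = np.asarray(parts[1:], dtype=float)
    assert len(vectors) == num_nodes
    assert all(v.size == dim for v in vectors.values())
    return vectors
```

Each CA id would then index one row of the feature matrix fed to the GLMs above.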