zachguo closed this issue 10 years ago.
Weighting the TLMs by temporal entropy gives only a tiny improvement: 0.001 in each of precision, recall, and F-score.
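For reference, a minimal sketch of a temporal-entropy weight for one term. It assumes temporal entropy is defined as one minus the normalized Shannon entropy of the term's frequency distribution over the time periods, so period-specific terms get a weight near 1 and evenly spread terms a weight near 0. The function name and the toy counts are illustrative assumptions, not the code behind these runs.

```python
import math


def temporal_entropy(counts_by_period):
    """Temporal-entropy weight of one term (assumed definition).

    counts_by_period: the term's frequency in each of the N time periods.
    Returns 1 - H(P(period | term)) / log(N): 1.0 when the term occurs in a
    single period, close to 0.0 when it is spread evenly over all periods.
    """
    n = len(counts_by_period)
    total = sum(counts_by_period)
    if n < 2 or total == 0:
        return 0.0  # no temporal signal to measure
    entropy = 0.0
    for count in counts_by_period:
        if count > 0:
            p = count / total           # P(period | term)
            entropy -= p * math.log(p)  # Shannon entropy, natural log
    return 1.0 - entropy / math.log(n)  # normalise by the maximum entropy log(N)


# Hypothetical counts over four periods, for illustration only.
print(temporal_entropy([10, 0, 0, 0]))  # concentrated term -> 1.0
print(temporal_entropy([5, 5, 5, 5]))   # evenly spread term -> about 0.0
```

How such a weight is combined with the TLM probabilities (for example, by scaling each term's log-probability) is a further assumption.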
Compare baseline & logistic regression:
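Baseline confusion matrix:

| | pre-1839 | 1840-1860 | 1861-1876 | 1877-1887 | 1888-1895 | 1896-1901 | 1902-1906 | 1907-1910 | 1911-1914 | 1915-1918 | 1919-1922 | 1923-present |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| pre-1839 | 269 | 5 | 2 | 2 | 3 | 3 | 0 | 0 | 0 | 1 | 0 | 8 |
| 1840-1860 | 53 | 227 | 5 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 8 |
| 1861-1876 | 46 | 14 | 268 | 8 | 3 | 1 | 0 | 1 | 0 | 0 | 0 | 6 |
| 1877-1887 | 43 | 7 | 19 | 253 | 10 | 3 | 1 | 1 | 1 | 0 | 0 | 3 |
| 1888-1895 | 41 | 10 | 10 | 14 | 246 | 6 | 3 | 1 | 0 | 0 | 0 | 5 |
| 1896-1901 | 42 | 8 | 8 | 8 | 15 | 231 | 6 | 3 | 1 | 1 | 1 | 9 |
| 1902-1906 | 40 | 6 | 8 | 5 | 5 | 15 | 239 | 4 | 1 | 0 | 0 | 8 |
| 1907-1910 | 33 | 6 | 8 | 4 | 5 | 8 | 13 | 210 | 5 | 1 | 1 | 5 |
| 1911-1914 | 30 | 4 | 7 | 3 | 3 | 5 | 7 | 15 | 168 | 3 | 0 | 6 |
| 1915-1918 | 19 | 3 | 7 | 3 | 4 | 3 | 3 | 6 | 7 | 239 | 2 | 15 |
| 1919-1922 | 18 | 2 | 3 | 3 | 4 | 4 | 6 | 6 | 7 | 18 | 239 | 18 |
| 1923-present | 8 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 6 | 3 | 0 | 72 |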
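Logistic regression confusion matrix:

| | pre-1839 | 1840-1860 | 1861-1876 | 1877-1887 | 1888-1895 | 1896-1901 | 1902-1906 | 1907-1910 | 1911-1914 | 1915-1918 | 1919-1922 | 1923-present |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| pre-1839 | 293 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1840-1860 | 5 | 290 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1861-1876 | 5 | 4 | 338 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1877-1887 | 3 | 3 | 6 | 329 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1888-1895 | 4 | 1 | 1 | 5 | 328 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1896-1901 | 1 | 1 | 1 | 2 | 3 | 326 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1902-1906 | 1 | 0 | 1 | 1 | 0 | 2 | 326 | 1 | 0 | 0 | 0 | 0 |
| 1907-1910 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 287 | 2 | 0 | 0 | 0 |
| 1911-1914 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 3 | 244 | 0 | 0 | 0 |
| 1915-1918 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 309 | 0 | 0 |
| 1919-1922 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 320 | 0 |
| 1923-present | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 96 |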
Mean precision: baseline 0.774140752915, logistic regression 0.956834852891
Mean recall: baseline 0.728443473512, logistic regression 0.956223375205
Mean F-score: baseline 0.738389811822, logistic regression 0.956027463121
Precision - Paired t-test: t=-273.035567744, p=2.9717845473e-144
Recall - Paired t-test: t=-317.10876046, p=1.11277534519e-150
F-score - Paired t-test: t=-311.71967931, p=6.06227064134e-150
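The mean precision, recall, and F-score lines summarize per-period scores. A minimal sketch of how macro-averaged scores (unweighted mean over the periods) can be derived from a single confusion matrix, assuming rows are true periods and columns are predicted periods; the tables above are not labeled, so that orientation and the macro averaging are assumptions. It is not meant to reproduce the exact numbers above, whose averaging procedure is not stated.

```python
def macro_prf(matrix):
    """Macro-averaged precision, recall and F1 from a square confusion matrix.

    Assumes matrix[i][j] counts items whose true class is i and whose
    predicted class is j.
    """
    k = len(matrix)
    precisions, recalls, fscores = [], [], []
    for c in range(k):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(k))  # column sum
        actual = sum(matrix[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        fscores.append(f)
    return sum(precisions) / k, sum(recalls) / k, sum(fscores) / k


# Toy two-class example (hypothetical counts).
print(macro_prf([[8, 2], [1, 9]]))
```

Passing one of the 12×12 matrices above as a list of lists gives that matrix's macro-averaged triple under the same orientation assumption.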
Compare baseline & decision tree:
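Baseline confusion matrix:

| | pre-1839 | 1840-1860 | 1861-1876 | 1877-1887 | 1888-1895 | 1896-1901 | 1902-1906 | 1907-1910 | 1911-1914 | 1915-1918 | 1919-1922 | 1923-present |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| pre-1839 | 270 | 5 | 2 | 1 | 3 | 3 | 0 | 0 | 0 | 1 | 0 | 8 |
| 1840-1860 | 53 | 227 | 5 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 8 |
| 1861-1876 | 46 | 14 | 267 | 8 | 3 | 1 | 0 | 1 | 0 | 0 | 0 | 6 |
| 1877-1887 | 43 | 7 | 18 | 253 | 10 | 3 | 1 | 1 | 1 | 0 | 0 | 4 |
| 1888-1895 | 41 | 11 | 10 | 15 | 245 | 6 | 3 | 1 | 0 | 0 | 0 | 5 |
| 1896-1901 | 43 | 8 | 8 | 8 | 15 | 230 | 6 | 3 | 1 | 1 | 1 | 10 |
| 1902-1906 | 41 | 6 | 9 | 5 | 5 | 15 | 241 | 4 | 1 | 0 | 0 | 8 |
| 1907-1910 | 33 | 6 | 7 | 4 | 5 | 8 | 13 | 207 | 5 | 1 | 1 | 5 |
| 1911-1914 | 30 | 4 | 7 | 3 | 3 | 5 | 7 | 15 | 169 | 3 | 0 | 6 |
| 1915-1918 | 20 | 3 | 7 | 3 | 3 | 3 | 4 | 6 | 7 | 243 | 3 | 15 |
| 1919-1922 | 18 | 2 | 4 | 3 | 4 | 4 | 6 | 5 | 6 | 18 | 238 | 17 |
| 1923-present | 8 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 6 | 3 | 0 | 72 |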
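Decision tree confusion matrix:

| | pre-1839 | 1840-1860 | 1861-1876 | 1877-1887 | 1888-1895 | 1896-1901 | 1902-1906 | 1907-1910 | 1911-1914 | 1915-1918 | 1919-1922 | 1923-present |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| pre-1839 | 272 | 5 | 3 | 2 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 1840-1860 | 5 | 281 | 4 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1861-1876 | 4 | 5 | 320 | 4 | 2 | 2 | 1 | 3 | 1 | 0 | 0 | 1 |
| 1877-1887 | 2 | 2 | 5 | 316 | 4 | 3 | 1 | 2 | 1 | 0 | 0 | 1 |
| 1888-1895 | 2 | 1 | 3 | 5 | 317 | 3 | 1 | 2 | 1 | 0 | 0 | 1 |
| 1896-1901 | 1 | 1 | 2 | 3 | 4 | 315 | 4 | 2 | 1 | 1 | 0 | 1 |
| 1902-1906 | 1 | 1 | 1 | 2 | 1 | 4 | 317 | 3 | 1 | 1 | 0 | 1 |
| 1907-1910 | 1 | 2 | 3 | 3 | 2 | 4 | 4 | 271 | 3 | 1 | 1 | 1 |
| 1911-1914 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 4 | 234 | 1 | 0 | 2 |
| 1915-1918 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 300 | 3 | 2 |
| 1919-1922 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 3 | 322 | 1 |
| 1923-present | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 86 |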
Mean precision: baseline 0.774543826647, decision tree 0.920213579013
Mean recall: baseline 0.728727471327, decision tree 0.919696886947
Mean F-score: baseline 0.738733104219, decision tree 0.91974607323
Precision - Paired t-test: t=-170.309841188, p=5.26876590671e-124
Recall - Paired t-test: t=-225.373543385, p=5.10660564736e-136
F-score - Paired t-test: t=-213.936008364, p=8.7674367887e-134
Compare baseline & support vector machine:
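Baseline confusion matrix:

| | pre-1839 | 1840-1860 | 1861-1876 | 1877-1887 | 1888-1895 | 1896-1901 | 1902-1906 | 1907-1910 | 1911-1914 | 1915-1918 | 1919-1922 | 1923-present |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| pre-1839 | 268 | 5 | 2 | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 8 |
| 1840-1860 | 53 | 227 | 5 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 8 |
| 1861-1876 | 46 | 14 | 268 | 8 | 3 | 2 | 0 | 1 | 0 | 0 | 0 | 6 |
| 1877-1887 | 43 | 7 | 19 | 253 | 10 | 3 | 1 | 1 | 0 | 0 | 0 | 3 |
| 1888-1895 | 41 | 10 | 10 | 14 | 246 | 6 | 3 | 1 | 0 | 0 | 0 | 5 |
| 1896-1901 | 43 | 8 | 8 | 8 | 15 | 234 | 6 | 3 | 1 | 1 | 1 | 9 |
| 1902-1906 | 41 | 6 | 9 | 5 | 5 | 16 | 239 | 5 | 1 | 0 | 0 | 8 |
| 1907-1910 | 33 | 6 | 7 | 4 | 5 | 8 | 13 | 207 | 4 | 1 | 1 | 6 |
| 1911-1914 | 31 | 4 | 7 | 2 | 3 | 5 | 7 | 14 | 169 | 3 | 0 | 6 |
| 1915-1918 | 20 | 4 | 6 | 3 | 4 | 3 | 4 | 6 | 7 | 240 | 2 | 15 |
| 1919-1922 | 18 | 3 | 4 | 3 | 4 | 4 | 6 | 6 | 7 | 17 | 239 | 17 |
| 1923-present | 8 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 7 | 3 | 0 | 71 |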
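Support vector machine confusion matrix:

| | pre-1839 | 1840-1860 | 1861-1876 | 1877-1887 | 1888-1895 | 1896-1901 | 1902-1906 | 1907-1910 | 1911-1914 | 1915-1918 | 1919-1922 | 1923-present |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| pre-1839 | 283 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 1840-1860 | 11 | 281 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1861-1876 | 11 | 5 | 329 | 2 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 1877-1887 | 7 | 4 | 7 | 319 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1888-1895 | 7 | 2 | 2 | 7 | 318 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |
| 1896-1901 | 5 | 1 | 2 | 3 | 5 | 314 | 3 | 1 | 0 | 1 | 0 | 1 |
| 1902-1906 | 5 | 0 | 2 | 3 | 1 | 2 | 315 | 1 | 0 | 1 | 0 | 0 |
| 1907-1910 | 3 | 1 | 2 | 2 | 2 | 3 | 3 | 278 | 2 | 2 | 0 | 0 |
| 1911-1914 | 4 | 1 | 3 | 2 | 1 | 1 | 1 | 3 | 236 | 1 | 0 | 0 |
| 1915-1918 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 2 | 1 | 303 | 0 | 1 |
| 1919-1922 | 2 | 0 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 5 | 309 | 1 |
| 1923-present | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 94 |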
Mean precision: baseline 0.774264759357, support vector machine 0.929139368326
Mean recall: baseline 0.728001092299, support vector machine 0.926376297105
Mean F-score: baseline 0.738223253385, support vector machine 0.926565580778
Precision - Paired t-test: t=-233.454816952, p=1.57129851472e-137
Recall - Paired t-test: t=-283.782412722, p=6.53515778934e-146
F-score - Paired t-test: t=-275.749394696, p=1.11771561831e-144
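Each comparison reports a paired t-test between baseline and classifier scores. A minimal sketch of the paired t statistic, assuming the pairs are scores of the two systems on the same evaluation splits (for example, per fold); the score lists below are made up. Taking differences as baseline minus classifier yields a negative t when the classifier scores higher, which matches the sign of the values above. If SciPy is available, `scipy.stats.ttest_rel(a, b)` computes the same statistic and also returns the two-sided p-value.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic for matched samples a and b (differences a - b)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-split precision scores (not the data behind the results above).
baseline = [0.77, 0.78, 0.76, 0.78, 0.77]
classifier = [0.95, 0.96, 0.95, 0.96, 0.97]
print(paired_t(baseline, classifier))  # negative: baseline scores are lower
```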
Rerun our classifier with temporal entropy added as a new feature; evaluate its performance and generate a confusion matrix.
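For the confusion-matrix step, a minimal sketch of counting true/predicted period pairs over the twelve periods used in the tables in this thread. It assumes rows are true periods and columns are predicted periods (the same convention as scikit-learn's `sklearn.metrics.confusion_matrix`). `PERIODS` copies the period labels from the tables; `build_confusion_matrix` and the label lists are hypothetical.

```python
# Period labels copied from the confusion-matrix tables in this thread.
PERIODS = ["pre-1839", "1840-1860", "1861-1876", "1877-1887",
           "1888-1895", "1896-1901", "1902-1906", "1907-1910",
           "1911-1914", "1915-1918", "1919-1922", "1923-present"]


def build_confusion_matrix(y_true, y_pred, labels=PERIODS):
    """Return matrix[i][j] = number of items whose true label is labels[i]
    and whose predicted label is labels[j] (assumed rows-true layout)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical gold and predicted periods for three documents.
y_true = ["pre-1839", "pre-1839", "1923-present"]
y_pred = ["pre-1839", "1840-1860", "1923-present"]
m = build_confusion_matrix(y_true, y_pred)
print(m[0][:2], m[-1][-1])  # -> [1, 1] 1
```

Building the counts directly from label lists keeps the row and column order fixed to `PERIODS`, so matrices from different classifiers line up cell by cell.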