zachguo / TCoHOT

Temporal Classification of HathiTrust OCRed Texts (code for the paper published at iConf 2015)
http://hdl.handle.net/2142/73656

Rerun model using temporal entropy. #36

Closed: zachguo closed this issue 10 years ago

zachguo commented 10 years ago

Rerun our classifier with a new feature, temporal entropy; evaluate its performance and generate a confusion matrix.
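The issue does not define temporal entropy. A minimal sketch, assuming one common formulation from work on temporal language models: a word's temporal entropy is 1 minus its Shannon entropy over the time periods, normalized by log N for N periods, so a word confined to one period scores 1 and a word spread evenly over all periods scores 0. The function name and its input (a word's count in every period, zero counts included) are hypothetical, not taken from the repository.

```python
import math
from typing import Mapping


def temporal_entropy(counts_by_period: Mapping[str, int]) -> float:
    """Temporal entropy of one word (assumed formulation, hypothetical helper).

    TE(w) = 1 + (1 / log N) * sum_p P(p|w) * log P(p|w)
    where N is the number of periods and P(p|w) is the share of the word's
    occurrences that fall in period p. Periods with zero count must still be
    listed so that N is correct.
    """
    n_periods = len(counts_by_period)
    total = sum(counts_by_period.values())
    if n_periods < 2 or total == 0:
        raise ValueError("need at least two periods and a non-zero total count")
    neg_entropy = 0.0  # accumulates sum of p * log p, which is <= 0
    for count in counts_by_period.values():
        if count > 0:  # 0 * log 0 is taken as 0
            p = count / total
            neg_entropy += p * math.log(p)
    # Dividing by log N maps the entropy into [0, 1]; 1 - normalized entropy.
    return 1.0 + neg_entropy / math.log(n_periods)


if __name__ == "__main__":
    # Toy counts: a word seen only in one period vs. one spread evenly.
    print(temporal_entropy({"pre-1839": 10, "1840-1860": 0, "1861-1876": 0}))  # 1.0
    print(temporal_entropy({"pre-1839": 5, "1840-1860": 5}))  # 0.0
```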

zachguo commented 10 years ago

Weighting the TLMs by temporal entropy gives only a tiny improvement: 0.001 in each of precision, recall, and F-score.
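One way TLM scores could be weighted by temporal entropy, as a hypothetical sketch: each period has a unigram model P(w | period), and each word's log-probability is multiplied by the word's temporal entropy before summing, so words used evenly across all periods contribute little. The function names, the probability floor used in place of real smoothing, and the use of plain log-likelihood (rather than, say, a likelihood ratio against a background model) are all assumptions, not the repository's code.

```python
import math
from collections import Counter
from typing import Dict, Mapping, Sequence


def te_weighted_scores(
    doc_tokens: Sequence[str],
    period_models: Mapping[str, Mapping[str, float]],
    te: Mapping[str, float],
    floor: float = 1e-9,
) -> Dict[str, float]:
    """Hypothetical TE-weighted log-likelihood of a document under each period model."""
    tf = Counter(doc_tokens)
    scores: Dict[str, float] = {}
    for period, model in period_models.items():
        # Words missing from a period model get a small floor probability
        # (a stand-in for proper smoothing); words with no known temporal
        # entropy get weight 0 and are effectively ignored.
        scores[period] = sum(
            count * te.get(word, 0.0) * math.log(model.get(word, floor))
            for word, count in tf.items()
        )
    return scores


def predict_period(
    doc_tokens: Sequence[str],
    period_models: Mapping[str, Mapping[str, float]],
    te: Mapping[str, float],
) -> str:
    """Return the period whose model gives the highest weighted score."""
    scores = te_weighted_scores(doc_tokens, period_models, te)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy probabilities and TE values, for illustration only.
    models = {
        "1840-1860": {"railway": 0.01, "the": 0.5},
        "1915-1918": {"aeroplane": 0.01, "the": 0.5},
    }
    te = {"railway": 0.8, "aeroplane": 0.9, "the": 0.0}
    print(predict_period(["the", "aeroplane", "the"], models, te))  # 1915-1918
```

Because "the" has temporal entropy 0 here, it adds nothing to any period's score, and the decision rests on the period-specific word.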

Compare baseline & logistic regression:

Baseline:
                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          269            5            2            2            3            3            0            0            0            1            0            8
   1840-1860           53          227            5            1            3            1            0            0            0            0            0            8
   1861-1876           46           14          268            8            3            1            0            1            0            0            0            6
   1877-1887           43            7           19          253           10            3            1            1            1            0            0            3
   1888-1895           41           10           10           14          246            6            3            1            0            0            0            5
   1896-1901           42            8            8            8           15          231            6            3            1            1            1            9
   1902-1906           40            6            8            5            5           15          239            4            1            0            0            8
   1907-1910           33            6            8            4            5            8           13          210            5            1            1            5
   1911-1914           30            4            7            3            3            5            7           15          168            3            0            6
   1915-1918           19            3            7            3            4            3            3            6            7          239            2           15
   1919-1922           18            2            3            3            4            4            6            6            7           18          239           18
1923-present            8            2            2            1            2            1            1            2            6            3            0           72

Logistic regression:
                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          293            1            1            0            0            0            0            0            0            0            0            0
   1840-1860            5          290            1            0            0            0            0            0            0            0            0            1
   1861-1876            5            4          338            0            1            1            0            0            0            0            0            0
   1877-1887            3            3            6          329            1            0            0            0            0            0            0            0
   1888-1895            4            1            1            5          328            1            0            0            0            0            0            0
   1896-1901            1            1            1            2            3          326            1            0            0            0            0            0
   1902-1906            1            0            1            1            0            2          326            1            0            0            0            0
   1907-1910            1            1            1            1            1            2            2          287            2            0            0            0
   1911-1914            2            0            1            0            0            0            1            3          244            0            0            0
   1915-1918            1            0            0            0            0            0            0            1            1          309            0            0
   1919-1922            1            0            0            1            1            0            1            1            0            3          320            0
1923-present            1            0            1            0            0            0            0            1            1            1            0           96
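Per-class precision, recall, and F-score can be read off a confusion matrix like the ones above, and their unweighted (macro) means summarize a classifier in one number each. A minimal sketch, assuming rows are the true period and columns the predicted period; the issue does not state the orientation, and transposing the matrix swaps precision and recall. It is also not stated whether the reported means are macro-averages over classes or averages over repeated runs. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def per_class_prf(matrix: Sequence[Sequence[int]]) -> List[Tuple[float, float, float]]:
    """(precision, recall, F1) for each class of a square confusion matrix.

    Assumes matrix[i][j] counts items of true class i predicted as class j.
    """
    out = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        predicted = sum(row[c] for row in matrix)  # column total
        actual = sum(matrix[c])                    # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        out.append((p, r, f))
    return out


def macro_averages(matrix: Sequence[Sequence[int]]) -> Tuple[float, ...]:
    """Unweighted mean of per-class precision, recall and F1."""
    scores = per_class_prf(matrix)
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


if __name__ == "__main__":
    toy = [[8, 2],
           [1, 9]]
    print(macro_averages(toy))  # approx (0.8535, 0.85, 0.8496)
```

The same functions accept the 12-by-12 matrices above once their rows are pasted in as lists of integers.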

                   Baseline         Logistic regression
Precision (mean):  0.774140752915   0.956834852891
Recall (mean):     0.728443473512   0.956223375205
F-score (mean):    0.738389811822   0.956027463121

Precision - Paired t-test: t=-273.035567744, p=2.9717845473e-144
Recall    - Paired t-test: t=-317.10876046, p=1.11277534519e-150
F-score   - Paired t-test: t=-311.71967931, p=6.06227064134e-150
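The issue does not say what the paired samples are (for example, per-fold or per-run scores). Assuming matched lists of baseline and classifier scores, the paired t statistic can be computed with the standard library as below; `scipy.stats.ttest_rel(a, b)` computes the same statistic from `a - b` and also returns a two-sided p-value. Passing the baseline first gives a negative t whenever the classifier scores higher, which matches the signs of the t values reported here. The function name and the sample scores are hypothetical.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of a paired t-test on matched samples a and b.

    t = mean(d) / (stdev(d) / sqrt(n)), with d_i = a_i - b_i and stdev the
    sample standard deviation (n - 1 denominator); degrees of freedom = n - 1.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    baseline = [0.70, 0.72, 0.74, 0.73]    # hypothetical per-run F-scores
    classifier = [0.95, 0.96, 0.95, 0.96]
    print(paired_t_statistic(baseline, classifier))  # negative: baseline is lower
```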

Compare baseline & decision tree:

Baseline:
                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          270            5            2            1            3            3            0            0            0            1            0            8
   1840-1860           53          227            5            1            3            1            0            0            0            0            0            8
   1861-1876           46           14          267            8            3            1            0            1            0            0            0            6
   1877-1887           43            7           18          253           10            3            1            1            1            0            0            4
   1888-1895           41           11           10           15          245            6            3            1            0            0            0            5
   1896-1901           43            8            8            8           15          230            6            3            1            1            1           10
   1902-1906           41            6            9            5            5           15          241            4            1            0            0            8
   1907-1910           33            6            7            4            5            8           13          207            5            1            1            5
   1911-1914           30            4            7            3            3            5            7           15          169            3            0            6
   1915-1918           20            3            7            3            3            3            4            6            7          243            3           15
   1919-1922           18            2            4            3            4            4            6            5            6           18          238           17
1923-present            8            2            2            1            2            1            1            2            6            3            0           72

Decision tree:
                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          272            5            3            2            2            1            1            0            1            0            0            1
   1840-1860            5          281            4            2            1            1            1            1            1            0            0            1
   1861-1876            4            5          320            4            2            2            1            3            1            0            0            1
   1877-1887            2            2            5          316            4            3            1            2            1            0            0            1
   1888-1895            2            1            3            5          317            3            1            2            1            0            0            1
   1896-1901            1            1            2            3            4          315            4            2            1            1            0            1
   1902-1906            1            1            1            2            1            4          317            3            1            1            0            1
   1907-1910            1            2            3            3            2            4            4          271            3            1            1            1
   1911-1914            1            1            2            1            2            1            2            4          234            1            0            2
   1915-1918            0            1            1            1            1            1            1            2            2          300            3            2
   1919-1922            0            0            0            0            0            1            0            1            0            3          322            1
1923-present            1            1            1            0            1            1            1            1            2            2            1           86

                   Baseline         Decision tree
Precision (mean):  0.774543826647   0.920213579013
Recall (mean):     0.728727471327   0.919696886947
F-score (mean):    0.738733104219   0.91974607323

Precision - Paired t-test: t=-170.309841188, p=5.26876590671e-124
Recall    - Paired t-test: t=-225.373543385, p=5.10660564736e-136
F-score   - Paired t-test: t=-213.936008364, p=8.7674367887e-134

Compare baseline & support vector machine:

Baseline:
                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          268            5            2            1            3            3            0            0            0            0            0            8
   1840-1860           53          227            5            1            3            1            0            0            0            0            0            8
   1861-1876           46           14          268            8            3            2            0            1            0            0            0            6
   1877-1887           43            7           19          253           10            3            1            1            0            0            0            3
   1888-1895           41           10           10           14          246            6            3            1            0            0            0            5
   1896-1901           43            8            8            8           15          234            6            3            1            1            1            9
   1902-1906           41            6            9            5            5           16          239            5            1            0            0            8
   1907-1910           33            6            7            4            5            8           13          207            4            1            1            6
   1911-1914           31            4            7            2            3            5            7           14          169            3            0            6
   1915-1918           20            4            6            3            4            3            4            6            7          240            2           15
   1919-1922           18            3            4            3            4            4            6            6            7           17          239           17
1923-present            8            2            2            1            2            2            1            2            7            3            0           71

Support vector machine:
                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          283            2            1            1            0            0            0            0            0            1            0            1
   1840-1860           11          281            3            1            0            0            0            0            0            0            0            1
   1861-1876           11            5          329            2            1            0            0            1            0            1            0            1
   1877-1887            7            4            7          319            2            0            1            1            0            1            0            0
   1888-1895            7            2            2            7          318            2            0            0            0            1            0            0
   1896-1901            5            1            2            3            5          314            3            1            0            1            0            1
   1902-1906            5            0            2            3            1            2          315            1            0            1            0            0
   1907-1910            3            1            2            2            2            3            3          278            2            2            0            0
   1911-1914            4            1            3            2            1            1            1            3          236            1            0            0
   1915-1918            2            0            1            1            1            0            1            2            1          303            0            1
   1919-1922            2            0            1            2            2            1            2            2            1            5          309            1
1923-present            2            0            1            1            0            0            0            1            0            1            0           94

                   Baseline         Support vector machine
Precision (mean):  0.774264759357   0.929139368326
Recall (mean):     0.728001092299   0.926376297105
F-score (mean):    0.738223253385   0.926565580778

Precision - Paired t-test: t=-233.454816952, p=1.57129851472e-137
Recall    - Paired t-test: t=-283.782412722, p=6.53515778934e-146
F-score   - Paired t-test: t=-275.749394696, p=1.11771561831e-144
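The three comparisons can be lined up by copying their reported mean scores into one table and ranking the classifiers. Each comparison lists its own baseline figures, so the gain below is measured against the baseline reported in that same comparison. The dictionary layout and the `rank` helper are hypothetical; the numbers are the ones reported in this issue.

```python
from typing import Dict, List, Tuple

# Mean scores reported in this issue: (baseline, classifier) per comparison.
REPORTED: Dict[str, Dict[str, Tuple[float, float]]] = {
    "logistic regression": {
        "precision": (0.774140752915, 0.956834852891),
        "recall": (0.728443473512, 0.956223375205),
        "f_score": (0.738389811822, 0.956027463121),
    },
    "decision tree": {
        "precision": (0.774543826647, 0.920213579013),
        "recall": (0.728727471327, 0.919696886947),
        "f_score": (0.738733104219, 0.91974607323),
    },
    "support vector machine": {
        "precision": (0.774264759357, 0.929139368326),
        "recall": (0.728001092299, 0.926376297105),
        "f_score": (0.738223253385, 0.926565580778),
    },
}


def rank(metric: str = "f_score") -> List[Tuple[str, float, float]]:
    """(classifier, mean score, gain over its baseline), best score first."""
    rows = [
        (name, scores[metric][1], scores[metric][1] - scores[metric][0])
        for name, scores in REPORTED.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, score, gain in rank():
    print(f"{name:24s} F={score:.4f}  gain over baseline={gain:+.4f}")
```

On all three metrics the order is the same: logistic regression (mean F-score 0.956), then the support vector machine (0.927), then the decision tree (0.920).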