zachguo / TCoHOT

Temporal Classification of HathiTrust OCRed Texts (code for the paper published at iConf 2015)
http://hdl.handle.net/2142/73656

Generate confusion matrixes of midterm models #31

Closed: zachguo closed this issue 10 years ago.

zachguo commented 10 years ago

So that we can see which time slices are confused with each other.
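A minimal plain-Python sketch of how matrices like the ones posted later in this thread could be generated. It assumes each test text has a true slice label and a predicted slice label, and it puts true slices on rows and predictions on columns. That orientation is an assumption (the thread never states one); it is also the convention used by scikit-learn's `sklearn.metrics.confusion_matrix`. The names `SLICES`, `confusion_matrix` and `format_matrix` are illustrative, not code from the repository.

```python
# Time slices, in the order used by the matrices in this thread.
SLICES = [
    "pre-1839", "1840-1860", "1861-1876", "1877-1887",
    "1888-1895", "1896-1901", "1902-1906", "1907-1910",
    "1911-1914", "1915-1918", "1919-1922", "1923-present",
]


def confusion_matrix(true_labels, pred_labels, labels=SLICES):
    """Count (true, predicted) pairs: rows = true slice, columns = predicted slice."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def format_matrix(matrix, labels=SLICES, label_width=12, cell_width=13):
    """Render the matrix as right-aligned fixed-width text, like the tables here."""
    header = " " * label_width + "".join(name.rjust(cell_width) for name in labels)
    lines = [header]
    for label, row in zip(labels, matrix):
        cells = "".join(str(n).rjust(cell_width) for n in row)
        lines.append(label.rjust(label_width) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    y_true = ["pre-1839", "1840-1860", "1840-1860", "1923-present"]
    y_pred = ["pre-1839", "pre-1839", "1840-1860", "1923-present"]
    print(format_matrix(confusion_matrix(y_true, y_pred)))
```

Keeping the label order fixed (rather than deriving it from the data) guarantees that both models' matrices line up column for column even if some slice never gets predicted.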

zachguo commented 10 years ago

Here are the confusion matrices for the two midterm models: the naive model (which uses the first date found in the text as its prediction) and the logistic regression model.
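A minimal sketch of the naive baseline as described: find the first date in the text and map it to a time slice. The year pattern (a standalone four-digit number from 1500 to 2099), treating 1839 as part of pre-1839, and returning `None` when no year is found are all assumptions for illustration; the thread does not say how dates were detected or how texts without a date were handled.

```python
import re
from bisect import bisect_left

SLICES = [
    "pre-1839", "1840-1860", "1861-1876", "1877-1887",
    "1888-1895", "1896-1901", "1902-1906", "1907-1910",
    "1911-1914", "1915-1918", "1919-1922", "1923-present",
]
# Inclusive upper year of each slice except the open-ended "1923-present".
UPPER_BOUNDS = [1839, 1860, 1876, 1887, 1895, 1901, 1906, 1910, 1914, 1918, 1922]

# Hypothetical date pattern: a standalone year between 1500 and 2099.
YEAR_RE = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")


def year_to_slice(year):
    """Return the first slice whose upper bound is >= year (years after 1922 go last)."""
    return SLICES[bisect_left(UPPER_BOUNDS, year)]


def naive_predict(text):
    """Predict the slice of the first year mentioned in text, or None if there is none."""
    match = YEAR_RE.search(text)
    if match is None:
        return None
    return year_to_slice(int(match.group(1)))


if __name__ == "__main__":
    print(naive_predict("Boston: printed 1852. Second edition, 1901."))
```

Because only the first match counts, any early year quoted near the start of a text decides the prediction, regardless of later dates.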

(Test dataset size: 3653.)

Confusion matrix for the naive model:

                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          267            5            2            1            2            3            0            0            0            1            0            8
   1840-1860           54          228            5            1            3            3            0            0            0            0            0            7
   1861-1876           45           14          268            8            2            2            1            1            1            0            0            6
   1877-1887           43            7           18          253           10            4            1            1            1            0            0            4
   1888-1895           41            9           10           15          243            7            3            1            0            0            0            5
   1896-1901           41            8            7            7           15          228            7            4            1            2            1            9
   1902-1906           41            6            8            5            5           15          236            4            1            1            0            8
   1907-1910           33            7            8            4            5            8           13          208            5            1            2            6
   1911-1914           30            4            7            2            3            4            7           14          165            2            0            6
   1915-1918           20            3            7            3            3            4            4            6            7          241            2           16
   1919-1922           19            2            4            3            4            4            6            5            7           18          240           18
1923-present           10            6            3            1            3            5            2            3            8            3            0           70

Confusion matrix for the logistic regression model:

                 pre-1839    1840-1860    1861-1876    1877-1887    1888-1895    1896-1901    1902-1906    1907-1910    1911-1914    1915-1918    1919-1922 1923-present
    pre-1839          268            6            3            1            2            3            0            1            1            1            0            5
   1840-1860           38          247            6            2            3            2            0            1            0            0            0            3
   1861-1876           28           14          286            9            3            2            1            1            1            0            0            4
   1877-1887           24            6           23          266           11            4            2            2            1            0            0            3
   1888-1895           26            7           10           15          259            8            4            1            0            0            0            4
   1896-1901           26            6            8            8           17          240            8            4            2            2            2            8
   1902-1906           23            6            8            3            6           17          250            6            1            1            1            7
   1907-1910           16            4            7            3            4            8           15          225            6            2            3            5
   1911-1914           19            3            5            2            3            4            5           19          175            3            1            6
   1915-1918           11            2            5            2            1            3            3            6            9          256            7           11
   1919-1922           10            1            3            2            2            3            4            6            6           16          271            5
1923-present            8            4            3            2            3            5            2            4           11            5            1           65

At first glance, neither model distinguishes pre-1839 well from the other time slices: in both matrices, the pre-1839 column collects far more off-diagonal counts than any other column. What do you guys think?
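One way to put a number on the pre-1839 problem is per-slice precision and recall. The sketch below copies both matrices verbatim and assumes rows are true slices and columns are predictions; the thread does not say which axis is which, and under the opposite reading precision and recall simply swap. `precision_recall` and `report` are illustrative helper names.

```python
SLICES = [
    "pre-1839", "1840-1860", "1861-1876", "1877-1887",
    "1888-1895", "1896-1901", "1902-1906", "1907-1910",
    "1911-1914", "1915-1918", "1919-1922", "1923-present",
]

# Counts copied from the two matrices above (row order = column order = SLICES).
NAIVE = [
    [267, 5, 2, 1, 2, 3, 0, 0, 0, 1, 0, 8],
    [54, 228, 5, 1, 3, 3, 0, 0, 0, 0, 0, 7],
    [45, 14, 268, 8, 2, 2, 1, 1, 1, 0, 0, 6],
    [43, 7, 18, 253, 10, 4, 1, 1, 1, 0, 0, 4],
    [41, 9, 10, 15, 243, 7, 3, 1, 0, 0, 0, 5],
    [41, 8, 7, 7, 15, 228, 7, 4, 1, 2, 1, 9],
    [41, 6, 8, 5, 5, 15, 236, 4, 1, 1, 0, 8],
    [33, 7, 8, 4, 5, 8, 13, 208, 5, 1, 2, 6],
    [30, 4, 7, 2, 3, 4, 7, 14, 165, 2, 0, 6],
    [20, 3, 7, 3, 3, 4, 4, 6, 7, 241, 2, 16],
    [19, 2, 4, 3, 4, 4, 6, 5, 7, 18, 240, 18],
    [10, 6, 3, 1, 3, 5, 2, 3, 8, 3, 0, 70],
]
LOGREG = [
    [268, 6, 3, 1, 2, 3, 0, 1, 1, 1, 0, 5],
    [38, 247, 6, 2, 3, 2, 0, 1, 0, 0, 0, 3],
    [28, 14, 286, 9, 3, 2, 1, 1, 1, 0, 0, 4],
    [24, 6, 23, 266, 11, 4, 2, 2, 1, 0, 0, 3],
    [26, 7, 10, 15, 259, 8, 4, 1, 0, 0, 0, 4],
    [26, 6, 8, 8, 17, 240, 8, 4, 2, 2, 2, 8],
    [23, 6, 8, 3, 6, 17, 250, 6, 1, 1, 1, 7],
    [16, 4, 7, 3, 4, 8, 15, 225, 6, 2, 3, 5],
    [19, 3, 5, 2, 3, 4, 5, 19, 175, 3, 1, 6],
    [11, 2, 5, 2, 1, 3, 3, 6, 9, 256, 7, 11],
    [10, 1, 3, 2, 2, 3, 4, 6, 6, 16, 271, 5],
    [8, 4, 3, 2, 3, 5, 2, 4, 11, 5, 1, 65],
]


def precision_recall(matrix, k):
    """Precision and recall of slice k, assuming rows = true, columns = predicted."""
    tp = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column total
    actual = sum(matrix[k])                    # row total
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


def report(name, matrix):
    """Print precision and recall for every slice of one model."""
    print(name)
    for k, label in enumerate(SLICES):
        p, r = precision_recall(matrix, k)
        print(f"{label:>12}  precision={p:.3f}  recall={r:.3f}")


if __name__ == "__main__":
    report("naive", NAIVE)
    report("logistic regression", LOGREG)
```

Under the rows-as-truth assumption, pre-1839 gets recall of about 0.92 from both models but precision of only about 0.41 (naive: 267/644) and 0.54 (logistic regression: 268/497), meaning many texts from later slices end up labeled pre-1839.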

zhhuo commented 10 years ago

Hi Trevor, do you have some time tomorrow night in the lab? I'd like to discuss the map-reduce function. Sorry, I'm sick today and have lost my voice, so I'd better email you to ask about it.

Thank you.


From: Zach Guo <notifications@github.com>
Sent: Monday, March 31, 2014 3:58 PM
To: zachguo/Z604-Project
Subject: Re: [Z604-Project] Generate confusion matrixes of midterm models (#31)

Closed #31 (https://github.com/zachguo/Z604-Project/issues/31) via 4273e8f (https://github.com/zachguo/Z604-Project/commit/4273e8f724089d2029ff0f6a3810302f5c4c81af).


tedelblu commented 10 years ago

What time would you like to meet?

(In reply to zhhuo's message of Wed, Apr 2, 2014 at 12:04 PM.)