jaekeol / fasttextRA


Collected experiment results #4

Open jaekeol opened 5 years ago

jaekeol commented 5 years ago

Confirmed that preasonra improves on rw.txt.

(1) Version with idf (using a probability sigmoid): rw.txt : 43 (OOV: 0%) 365.912837537

(2) Version without idf: rw.txt : 42 (OOV: 0%) 365.130911178

jaekeol commented 5 years ago

Confirmed that for the query allurement, the weight on the root goes up.

In the naive model, similarity was high on prefixes and suffixes such as all and ment; now similarity is high on the root instead.

allurement 1
1 lur 0.9458
2 llu 0.9173
3 urem 1
4 allu 0.9825
5 ureme 1
6 uremen 1
7 eme 0.6278
8 rem 0.6903
9 reme 0.8614
10 ment 0.5194
11 men 0.4781
12 <allur 1
13 rement 0.9524
14 ure 0.3512
15 lure 0.9683
16 ment> 0.5932
17 allur 1
18 <allu 1
19 <all 0.4322
20 remen 0.9449
21 emen 0.7003
22 ent> 0.2604
23 all 0.2138
24 nt> 0.2067
25 ent 0.1614
26 <al 0.1812
27 ement> 0.7702
28 llur 1
29 ement 0.7099
30 llurem 1
31 llure 1
32 lureme 1
33 lurem 1
34 allure 1
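The 34 entries listed above are exactly the substrings of `<allurement>` with lengths 3 to 6, which matches fastText's default subword range (`minn=3`, `maxn=6`) with `<` and `>` as word boundary markers. A small sketch that enumerates them (the function name is illustrative; fastText itself additionally hashes each n-gram into a bucket, which is omitted here):

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return all character n-grams of '<word>' with minn <= n <= maxn."""
    marked = "<" + word + ">"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


grams = char_ngrams("allurement")
print(len(grams))  # 34, the same count as the list above
print(sorted(g for g in grams if "lur" in g))  # n-grams that cover the root
```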
jaekeol commented 5 years ago

Evaluation on ws353

The version with idf is better.

(1) With idf: combined.tab : 62 (OOV: 0%)

(2) Plain: combined.tab : 60 (OOV: 0%)

jaekeol commented 5 years ago

Results of the experiment with enwiki 2019.03.12

(1) The plain version scored 43 on rw.txt.

jaekeol commented 5 years ago

analogy test

The numbers differ a lot from the paper, but the idf version behaved more favorably on the semantic part.

(1) With idf (columns: category, correct, incorrect, total, accuracy)

gram3-comparative 669 663 1332 0.502252
gram8-plural 849 483 1332 0.637387
capital-common-countries 151 355 506 0.298419
city-in-state 90 2377 2467 0.036482
family 218 288 506 0.430830
gram9-plural-verbs 519 351 870 0.596552
gram2-opposite 458 354 812 0.564039
currency 5 861 866 0.005774
gram4-superlative 534 588 1122 0.475936
gram6-nationality-adjective 1082 517 1599 0.676673
gram7-past-tense 395 1165 1560 0.253205
gram5-present-participle 468 588 1056 0.443182
capital-world 316 4208 4524 0.069850
gram1-adjective-to-adverb 602 390 992 0.606855
semantic 780 1 8869 0.087947
syntictic 5576 1 10675 0.522342
total 6356 13188 19544 0.325215

(2) original

[irteam@csm0018 fastText]$ ./anal_eval2.py result/anal.predict ./data/wordanal/word-test.v1.txt
gram3-comparative 674 658 1332 0.506006
gram8-plural 874 458 1332 0.656156
capital-common-countries 136 370 506 0.268775
city-in-state 91 2376 2467 0.036887
family 235 271 506 0.464427
gram9-plural-verbs 580 290 870 0.666667
gram2-opposite 470 342 812 0.578818
currency 2 864 866 0.002309
gram4-superlative 623 499 1122 0.555258
gram6-nationality-adjective 1132 467 1599 0.707942
gram7-past-tense 429 1131 1560 0.275000
gram5-present-participle 601 455 1056 0.569129
capital-world 286 4238 4524 0.063218
gram1-adjective-to-adverb 653 339 992 0.658266
semantic 750 1 8869 0.084564
syntictic 6036 1 10675 0.565433
total 6786 12758 19544 0.347217
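Each row of the analogy output reads as category, correct, incorrect, total, accuracy, and the summary rows can be recomputed from the per-category rows. The sketch below does this for the idf run, assuming that every category whose name starts with `gram` counts as syntactic and the rest as semantic. That grouping is inferred from the totals, not taken from `anal_eval2.py`. It also shows that the `1` printed in the incorrect column of the two summary rows does not equal total minus correct.

```python
def summarize(rows):
    """rows: iterable of (category, correct, total) tuples."""
    sums = {"semantic": [0, 0], "syntactic": [0, 0]}
    for name, correct, total in rows:
        group = "syntactic" if name.startswith("gram") else "semantic"
        sums[group][0] += correct
        sums[group][1] += total
    sums["total"] = [
        sums["semantic"][0] + sums["syntactic"][0],
        sums["semantic"][1] + sums["syntactic"][1],
    ]
    return {k: (c, t, c / t if t else float("nan")) for k, (c, t) in sums.items()}


# (category, correct, total) copied from the idf run in this comment.
IDF_RUN = [
    ("gram3-comparative", 669, 1332),
    ("gram8-plural", 849, 1332),
    ("capital-common-countries", 151, 506),
    ("city-in-state", 90, 2467),
    ("family", 218, 506),
    ("gram9-plural-verbs", 519, 870),
    ("gram2-opposite", 458, 812),
    ("currency", 5, 866),
    ("gram4-superlative", 534, 1122),
    ("gram6-nationality-adjective", 1082, 1599),
    ("gram7-past-tense", 395, 1560),
    ("gram5-present-participle", 468, 1056),
    ("capital-world", 316, 4524),
    ("gram1-adjective-to-adverb", 602, 992),
]

for group, (correct, total, acc) in summarize(IDF_RUN).items():
    print(f"{group} {correct} {total - correct} {total} {acc:.6f}")
```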

jaekeol commented 5 years ago

Ran with the dimension raised to 200.

(1) original rw.txt : 43 (OOV: 0%)

(2) idf rw.txt : 44 (OOV: 0%)

jaekeol commented 5 years ago

wiki20M ( 400Mega)

(1) original - 100 dim

Read 439M words
Number of words: 644027
Number of labels: 0
Progress: 100.0% words/sec/thread: 80661 lr: 0.000000 loss: 0.471114 ETA: 0h 0m
rw.txt : 42 (OOV: 0%)

(2) original - 300 dim

Read 439M words
Number of words: 644027
Number of labels: 0
Progress: 100.0% words/sec/thread: 30996 lr: 0.000000 loss: 0.486801 ETA: 0h 0m
rw.txt : 45 (OOV: 0%)

(3) original - 600 dim

Read 439M words
Number of words: 644027
Number of labels: 0
Progress: 100.0% words/sec/thread: 18241 lr: 0.000000 loss: 0.478121 ETA: 0h 0m
rw.txt : 45 (OOV: 0%)

(1) idf - 300 dim

rw.txt : 45 (OOV: 0%)

The data was wrong. The experiment has to be rerun.

jaekeol commented 5 years ago

Reran the experiment with wiki.600K (746M), 300 dimensions.

(1) 300 dimension original

RW

Read 746M words
Number of words: 692087
Number of labels: 0
Progress: 100.0% words/sec/thread: 25382 lr: 0.000000 loss: 0.218279 ETA: 0h 0m
rw.txt : 45 (OOV: 0%)

WS353

combined.tab : 64 (OOV: 0%)

analogy

capital-common-countries 264 242 506 0.521739
capital-world 1030 3494 4524 0.227675
currency 16 850 866 0.018476
city-in-state 234 2233 2467 0.094852
family 285 221 506 0.563241
gram1-adjective-to-adverb 518 474 992 0.522177
gram2-opposite 423 389 812 0.520936
gram3-comparative 910 422 1332 0.683183
gram4-superlative 605 517 1122 0.539216
gram5-present-participle 552 504 1056 0.522727
gram6-nationality-adjective 1325 274 1599 0.828643
gram7-past-tense 478 1082 1560 0.306410
gram8-plural 962 370 1332 0.722222
gram9-plural-verbs 615 255 870 0.706897
semantic 1829 7040 8869 0.206224
syntictic 6388 4287 10675 0.598407
total 8217 11327 19544 0.420436

(2) 300 dimensional idf

RW

rw.txt : 46 (OOV: 0%)

WS353

combined.tab : 68 (OOV: 0%)

analogy

capital-common-countries 284 222 506 0.561265
capital-world 1035 3489 4524 0.228780
currency 22 844 866 0.025404
city-in-state 224 2243 2467 0.090799
family 269 237 506 0.531621
gram1-adjective-to-adverb 451 541 992 0.454637
gram2-opposite 427 385 812 0.525862
gram3-comparative 851 481 1332 0.638889
gram4-superlative 511 611 1122 0.455437
gram5-present-participle 514 542 1056 0.486742
gram6-nationality-adjective 1307 292 1599 0.817386
gram7-past-tense 465 1095 1560 0.298077
gram8-plural 841 491 1332 0.631381
gram9-plural-verbs 550 320 870 0.632184
semantic 1834 7035 8869 0.206788
syntictic 5917 4758 10675 0.554286
total 7751 11793 19544 0.396592

jaekeol commented 5 years ago

Reran the experiment with wiki.600K (746M), 600 dimensions.

(1) 600 dimension original

RW

rw.txt : 46 (OOV: 0%)

WS353

combined.tab : 62 (OOV: 0%)

analogy

capital-common-countries 124 382 506 0.245059
capital-world 338 4186 4524 0.074713
currency 5 861 866 0.005774
city-in-state 32 2435 2467 0.012971
family 228 278 506 0.450593
gram1-adjective-to-adverb 501 491 992 0.505040
gram2-opposite 443 369 812 0.545567
gram3-comparative 806 526 1332 0.605105
gram4-superlative 528 594 1122 0.470588
gram5-present-participle 477 579 1056 0.451705
gram6-nationality-adjective 1145 454 1599 0.716073
gram7-past-tense 319 1241 1560 0.204487
gram8-plural 905 427 1332 0.679429
gram9-plural-verbs 621 249 870 0.713793
semantic 727 8142 8869 0.081971
syntictic 5745 4930 10675 0.538173
total 6472 13072 19544 0.331150

(2) 600 dimension idf

RW

rw.txt : 47 (OOV: 0%)

WS353

combined.tab : 69 (OOV: 0%)

analogy

capital-common-countries 113 393 506 0.223320
capital-world 324 4200 4524 0.071618
currency 7 859 866 0.008083
city-in-state 31 2436 2467 0.012566
family 233 273 506 0.460474
gram1-adjective-to-adverb 406 586 992 0.409274
gram2-opposite 428 384 812 0.527094
gram3-comparative 799 533 1332 0.599850
gram4-superlative 476 646 1122 0.424242
gram5-present-participle 471 585 1056 0.446023
gram6-nationality-adjective 1104 495 1599 0.690432
gram7-past-tense 343 1217 1560 0.219872
gram8-plural 850 482 1332 0.638138
gram9-plural-verbs 593 277 870 0.681609
semantic 708 8161 8869 0.079829
syntictic 5470 5205 10675 0.512412
total 6178 13366 19544 0.316107

jaekeol commented 5 years ago

Reran the experiment with wiki.600K (746M), 400 dimensions.

(1) 400 dimension original

RW

Read 746M words
Number of words: 692087
Number of labels: 0
Progress: 100.0% words/sec/thread: 22244 lr: 0.000000 loss: 0.219103 ETA: 0h 0m
rw.txt : 46 (OOV: 0%)

WS353

combined.tab : 63 (OOV: 0%)

(2) 400 dimension idf

RW

rw.txt : 46 (OOV: 0%)

WS353

combined.tab : 68 (OOV: 0%)

jaekeol commented 5 years ago

wiki.600K (746M), 340 dimensions

(1) original

RW

rw.txt : 46 (OOV: 0%)

ws353

combined.tab : 63 (OOV: 0%)

analogy

(2) idf

rw

rw.txt : 46 (OOV: 0%)

ws353

combined.tab : 68 (OOV: 0%)

analogy

jaekeol commented 5 years ago

2-epoch, 127K, 300DIM

(1) original

RW

Read 124M words
Number of words: 218317
Number of labels: 0
Progress: 100.0% words/sec/thread: 30573 lr: 0.000000 loss: 0.607745 ETA: 0h 0m
rw.txt : 46 (OOV: 0%)

ws353

combined.tab : 66 (OOV: 0%)

(2) idf

rw

rw.txt : 46 (OOV: 0%)

ws353

combined.tab : 69 (OOV: 0%)

jaekeol commented 5 years ago

5-epoch, 127K, 300DIM

(1) original

RW

Read 124M words
Number of words: 218317
Number of labels: 0
Progress: 100.0% words/sec/thread: 30623 lr: 0.000000 loss: 0.282995 ETA: 0h 0m
rw.txt : 47 (OOV: 0%)

WS353

combined.tab : 70 (OOV: 0%)

analogy test

capital-common-countries 248 258 506 0.490119
capital-world 919 3605 4524 0.203139
currency 10 856 866 0.011547
city-in-state 190 2277 2467 0.077017
family 243 263 506 0.480237
gram1-adjective-to-adverb 534 458 992 0.538306
gram2-opposite 517 295 812 0.636700
gram3-comparative 1034 298 1332 0.776276
gram4-superlative 825 297 1122 0.735294
gram5-present-participle 741 315 1056 0.701705
gram6-nationality-adjective 1218 381 1599 0.761726
gram7-past-tense 426 1134 1560 0.273077
gram8-plural 1061 271 1332 0.796547
gram9-plural-verbs 706 164 870 0.811494
semantic 1610 7259 8869 0.181531
syntictic 7062 3613 10675 0.661546
total 8672 10872 19544 0.443717

(2) idf

RW

rw.txt : 48 (OOV: 0%)

WS353

combined.tab : 75 (OOV: 0%)

analogy

capital-common-countries 241 265 506 0.476285
capital-world 867 3657 4524 0.191645
currency 10 856 866 0.011547
city-in-state 178 2289 2467 0.072152
family 254 252 506 0.501976
gram1-adjective-to-adverb 449 543 992 0.452621
gram2-opposite 492 320 812 0.605911
gram3-comparative 1039 293 1332 0.780030
gram4-superlative 733 389 1122 0.653298
gram5-present-participle 637 419 1056 0.603220
gram6-nationality-adjective 1204 395 1599 0.752971
gram7-past-tense 420 1140 1560 0.269231
gram8-plural 1068 264 1332 0.801802
gram9-plural-verbs 700 170 870 0.804598
semantic 1550 7319 8869 0.174766
syntactic 6742 3933 10675 0.631569

jaekeol commented 5 years ago

3-epoch, 600K, 300D

(1) original

RW

rw.txt : 48 (OOV: 0%)

WS353

combined.tab : 70 (OOV: 0%)

analogy

capital-common-countries 384 122 506 0.758893
capital-world 2591 1933 4524 0.572723
currency 43 823 866 0.049654
city-in-state 763 1704 2467 0.309283
family 320 186 506 0.632411
gram1-adjective-to-adverb 491 501 992 0.494960
gram2-opposite 387 425 812 0.476601
gram3-comparative 1020 312 1332 0.765766
gram4-superlative 672 450 1122 0.598930
gram5-present-participle 601 455 1056 0.569129
gram6-nationality-adjective 1395 204 1599 0.872420
gram7-past-tense 633 927 1560 0.405769
gram8-plural 1085 247 1332 0.814565
gram9-plural-verbs 641 229 870 0.736782
semantic 4101 4768 8869 0.462397
syntactic 6925 3750 10675 0.648712
total 11026 8518 19544 0.564163

(2) idf

RW

rw.txt : 48 (OOV: 0%)

WS353

combined.tab : 71 (OOV: 0%)

analogy

capital-common-countries 398 108 506 0.786561
capital-world 2622 1902 4524 0.579576
currency 47 819 866 0.054273
city-in-state 758 1709 2467 0.307256
family 317 189 506 0.626482
gram1-adjective-to-adverb 443 549 992 0.446573
gram2-opposite 369 443 812 0.454433
gram3-comparative 1016 316 1332 0.762763
gram4-superlative 667 455 1122 0.594474
gram5-present-participle 601 455 1056 0.569129
gram6-nationality-adjective 1397 202 1599 0.873671
gram7-past-tense 710 850 1560 0.455128
gram8-plural 1035 297 1332 0.777027
gram9-plural-verbs 634 236 870 0.728736
semantic 4142 4727 8869 0.467020
syntactic 6872 3803 10675 0.643747
total 11014 8530 19544 0.563549

jaekeol commented 5 years ago

5-epoch, 600K, 300D

(1) original

RW

Read 746M words
Number of words: 692087
Number of labels: 0
Progress: 100.0% words/sec/thread: 27744 lr: 0.000000 loss: 0.079676 ETA: 0h 0m
rw.txt : 48 (OOV: 0%)

WS353

combined.tab : 71 (OOV: 0%)

analogy test

capital-common-countries 392 114 506 0.774704
capital-world 2636 1888 4524 0.582670
currency 41 825 866 0.047344
city-in-state 719 1748 2467 0.291447
family 321 185 506 0.634387
gram1-adjective-to-adverb 464 528 992 0.467742
gram2-opposite 385 427 812 0.474138
gram3-comparative 1032 300 1332 0.774775
gram4-superlative 648 474 1122 0.577540
gram5-present-participle 599 457 1056 0.567235
gram6-nationality-adjective 1401 198 1599 0.876173
gram7-past-tense 615 945 1560 0.394231
gram8-plural 1073 259 1332 0.805556
gram9-plural-verbs 638 232 870 0.733333
semantic 4109 4760 8869 0.463299
syntactic 6855 3820 10675 0.642155
total 10964 8580 19544 0.560991

capital-common-countries 474 32 506 0.936759
capital-world 3711 813 4524 0.820292
currency 21 845 866 0.024249
city-in-state 1511 956 2467 0.612485
family 347 159 506 0.685771
gram1-adjective-to-adverb 582 410 992 0.586694
gram2-opposite 291 521 812 0.358374
gram3-comparative 1089 243 1332 0.817568
gram4-superlative 550 572 1122 0.490196
gram5-present-participle 657 399 1056 0.622159
gram6-nationality-adjective 1443 156 1599 0.902439
gram7-past-tense 875 685 1560 0.560897
gram8-plural 1074 258 1332 0.806306
gram9-plural-verbs 573 297 870 0.658621
semantic 6064 2805 8869 0.683730
syntactic 7134 3541 10675 0.668290
total 13198 6346 19544 0.675297

(2) idf

RW

rw.txt : 48 (OOV: 0%)

WS353

combined.tab : 72 (OOV: 0%)

analogies

capital-common-countries 483 23 506 0.954545
capital-world 3742 782 4524 0.827144
currency 20 846 866 0.023095
city-in-state 1557 910 2467 0.631131
family 340 166 506 0.671937
gram1-adjective-to-adverb 555 437 992 0.559476
gram2-opposite 289 523 812 0.355911
gram3-comparative 1070 262 1332 0.803303
gram4-superlative 554 568 1122 0.493761
gram5-present-participle 648 408 1056 0.613636
gram6-nationality-adjective 1444 155 1599 0.903064
gram7-past-tense 909 651 1560 0.582692
gram8-plural 1047 285 1332 0.786036
gram9-plural-verbs 568 302 870 0.652874
semantic 6142 2727 8869 0.692525
syntactic 7084 3591 10675 0.663607
total 13226 6318 19544 0.676729

maygodwithu commented 5 years ago

prob-FT 127K, 300D, 5-epoch

0 SL 35.436715 -3.185108 34.707148
1 WS 65.856052 10.527625 66.295933
2 WS-S 71.503292 13.084014 72.033449
3 WS-R 61.930998 10.799741 62.624004
4 MEN 70.453948 14.965890 73.046392
5 MC 75.033380 16.622163 78.415667
6 RG 66.068560 10.918906 66.940587
7 YP 46.409053 2.458203 45.598771
8 MT-287 64.447636 24.872353 66.017415
9 MT-771 63.168677 4.908835 63.217692
10 RW 44.981086 -1.284775 44.077334

maygodwithu commented 5 years ago

prob-FT 127K, 300D, 10-epoch

Dataset sub sub2 sub-maxsim
0 SL 33.556303 15.353202 31.991434
1 WS 63.125912 40.110269 67.772385
2 WS-S 71.590585 43.392792 73.122145
3 WS-R 55.390777 39.109222 63.520950
4 MEN 71.239637 47.727900 76.243898
5 MC 69.692926 56.675569 71.984870
6 RG 71.844915 45.441532 76.526320
7 YP 45.374726 19.542526 42.995855
8 MT-287 67.854958 50.691588 66.752757
9 MT-771 63.110015 41.433156 66.020907
10 RW 40.784400 5.441490 39.294690

maygodwithu commented 5 years ago

prob-FT 127K 300D 15-epoch

Dataset sub sub2 sub-maxsim
0 SL 31.358466 24.091380 32.831386
1 WS 66.215226 48.762654 71.038251
2 WS-S 73.024429 57.235408 76.216087
3 WS-R 60.309807 45.810146 67.503253
4 MEN 72.558225 52.784504 76.275563
5 MC 73.943037 67.400981 78.971965
6 RG 71.818689 63.747528 75.079499
7 YP 57.500240 28.705667 57.024520
8 MT-287 67.967176 56.352351 70.493838
9 MT-771 61.980245 49.637925 66.835458
10 RW 40.315294 10.050497 38.544922

maygodwithu commented 5 years ago

prob-FT text8 300D 15-epoch

Result DataFrame
Dataset sub sub2 sub-maxsim
0 SL 27.365994 10.608136 27.917852
1 WS 56.281286 39.428914 59.845036
2 WS-S 61.763048 39.304333 64.510194
3 WS-R 50.410482 42.855016 54.478300
4 MEN 65.531179 28.457075 68.079149
5 MC 48.575880 44.236761 58.767247
6 RG 51.838577 30.791927 56.659855
7 YP 39.134292 3.075357 37.375033
8 MT-287 65.587708 35.768994 66.335564
9 MT-771 53.522820 27.699246 57.821399
10 RW 36.149861 1.766194 34.798910

RW improved from 33 to 34, but WS dropped from 66.

jaekeol commented 5 years ago

prob-FT 127K, 300D, 5-epoch

Dataset sub sub2 sub-maxsim
0 SL 31.415074 2.495994 31.259190
1 WS 51.063920 16.929660 54.027046
2 WS-S 57.413600 17.768538 59.857231
3 WS-R 46.909236 21.585441 50.404286
4 MEN 65.924690 23.079368 69.020333
5 MC 49.621719 24.877615 49.176681
6 RG 53.779327 29.620483 54.907061
7 YP 39.427548 8.549547 39.943754
8 MT-287 59.942252 35.811358 61.514569
9 MT-771 51.505170 21.642433 53.407299
10 RW 38.588679 -1.396556 37.779715

Something seems to be wrong. Please check.

maygodwithu commented 5 years ago

pft_ra, 127K 300D 5EP

1. idf correction

SCWS_2003.txt : 64 (OOV: 0%)
[irteam@csm0018 prob-fastText_ra]$ !v
vi eval_ws.sh
[irteam@csm0018 prob-fastText_ra]$ ./eval_ws.sh
30000 ngram read EN-WS-353-ALL.txt : 71 (OOV: 0%)
[irteam@csm0018 prob-fastText_ra]$ vi eval_ws.sh
[irteam@csm0018 prob-fastText_ra]$ ./eval_ws.sh
30000 ngram read EN-RW-STANFORD.txt : 42 (OOV: 0%)
SCWS_2003.txt : 67 (OOV: 0%)

2. original

[irteam@csm0018 prob-fastText]$ ./eval_ws.sh
SCWS_2003.txt : 63 (OOV: 0%)
[irteam@csm0018 prob-fastText]$ !v
vi eval_ws.sh
[irteam@csm0018 prob-fastText]$ ./eval_ws.sh
EN-WS-353-ALL.txt : 70 (OOV: 0%)
[irteam@csm0018 prob-fastText]$ !v
vi eval_ws.sh
[irteam@csm0018 prob-fastText]$ ./eval_ws.sh
EN-RW-STANFORD.txt : 40 (OOV: 0%)
SCWS_2003.txt : 67 (OOV: 0%)

maygodwithu commented 5 years ago

600K, 300D

1. ft original

EN-RW-STANFORD.txt : 48 (OOV: 0%)
EN-WS-353-ALL.txt : 71 (OOV: 0%)
SCWS_2003.txt : 67 (OOV: 0%)
simlex999.txt : 35 (OOV: 0%)
EN-MC-30.txt : 76 (OOV: 0%)
EN-RG-65.txt : 75 (OOV: 0%)
EN-MEN-TR-3k.txt : 76 (OOV: 0%)
EN-YP-130.txt : 55 (OOV: 0%)

2. ft idf

100000 ngram read EN-RW-STANFORD.txt : 48 (OOV: 0%)
100000 ngram read EN-WS-353-ALL.txt : 72 (OOV: 0%)
100000 ngram read SCWS_2003.txt : 67 (OOV: 0%)
100000 ngram read simlex999.txt : 36 (OOV: 0%)
100000 ngram read EN-MC-30.txt : 78 (OOV: 0%)
100000 ngram read EN-RG-65.txt : 78 (OOV: 0%)
100000 ngram read EN-MEN-TR-3k.txt : 76 (OOV: 0%)
100000 ngram read EN-YP-130.txt : 55 (OOV: 0%)

3. cbow

EN-RW-STANFORD.txt : 40 (OOV: 4%)
EN-WS-353-ALL.txt : 69 (OOV: 0%)
SCWS_2003.txt : 67 (OOV: 1%)
simlex999.txt : 35 (OOV: 43%)
EN-MC-30.txt : 76 (OOV: 0%)
EN-RG-65.txt : 80 (OOV: 0%)
EN-MEN-TR-3k.txt : 75 (OOV: 0%)
EN-YP-130.txt : 42 (OOV: 0%)

4. skipgram

EN-RW-STANFORD.txt : 42 (OOV: 4%)
EN-WS-353-ALL.txt : 72 (OOV: 0%)
SCWS_2003.txt : 65 (OOV: 1%)
simlex999.txt : 32 (OOV: 43%)
EN-MC-30.txt : 78 (OOV: 0%)
EN-RG-65.txt : 81 (OOV: 0%)
EN-MEN-TR-3k.txt : 75 (OOV: 0%)
EN-YP-130.txt : 50 (OOV: 0%)

maygodwithu commented 5 years ago

127K, 300D

1. cbow

EN-RW-STANFORD.txt : 38 (OOV: 23%)
EN-WS-353-ALL.txt : 71 (OOV: 0%)
SCWS_2003.txt : 66 (OOV: 1%)
simlex999.txt : 35 (OOV: 43%)
EN-MC-30.txt : 75 (OOV: 0%)
EN-RG-65.txt : 76 (OOV: 0%)
EN-MEN-TR-3k.txt : 73 (OOV: 0%)
EN-YP-130.txt : 40 (OOV: 0%)

2. skipgram

EN-RW-STANFORD.txt : 35 (OOV: 23%)
EN-WS-353-ALL.txt : 72 (OOV: 0%)
SCWS_2003.txt : 65 (OOV: 1%)
simlex999.txt : 35 (OOV: 43%)
EN-MC-30.txt : 79 (OOV: 0%)
EN-RG-65.txt : 76 (OOV: 0%)
EN-MEN-TR-3k.txt : 75 (OOV: 0%)
EN-YP-130.txt : 52 (OOV: 0%)

3. ft

EN-RW-STANFORD.txt : 47 (OOV: 0%)
EN-WS-353-ALL.txt : 70 (OOV: 0%)
SCWS_2003.txt : 67 (OOV: 0%)
simlex999.txt : 38 (OOV: 0%)
EN-MC-30.txt : 75 (OOV: 0%)
EN-RG-65.txt : 74 (OOV: 0%)
EN-MEN-TR-3k.txt : 75 (OOV: 0%)
EN-YP-130.txt : 50 (OOV: 0%)

EN-RW-STANFORD.txt : 47.01 (OOV: 0.00%)
EN-WS-353-ALL.txt : 70.11 (OOV: 0.00%)
SCWS_2003.txt : 67.10 (OOV: 0.00%)
simlex999.txt : 37.54 (OOV: 0.00%)
EN-MC-30.txt : 75.10 (OOV: 0.00%)
EN-RG-65.txt : 74.26 (OOV: 0.00%)
EN-MEN-TR-3k.txt : 75.07 (OOV: 0.00%)
EN-YP-130.txt : 50.00 (OOV: 0.00%)

1. idft

30000 ngram read EN-RW-STANFORD.txt : 48 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 75 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67 (OOV: 0%)
30000 ngram read simlex999.txt : 38 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 80 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 75 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 76 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 56 (OOV: 0%)

maygodwithu commented 5 years ago

prob-ft 600K,300D

(1) original

EN-RW-STANFORD.txt : 42.50 (OOV: 0.00%)
EN-WS-353-ALL.txt : 62.23 (OOV: 0.00%)
SCWS_2003.txt : 62.08 (OOV: 0.00%)
simlex999.txt : 31.73 (OOV: 0.00%)
EN-MC-30.txt : 76.55 (OOV: 0.00%)
EN-RG-65.txt : 76.29 (OOV: 0.00%)
EN-MEN-TR-3k.txt : 70.71 (OOV: 0.00%)
EN-YP-130.txt : 47.14 (OOV: 0.00%)

(2) idf

30000 ngram read EN-RW-STANFORD.txt : 41.57 (OOV: 0.00%)
30000 ngram read EN-WS-353-ALL.txt : 62.11 (OOV: 0.00%)
30000 ngram read SCWS_2003.txt : 62.17 (OOV: 0.00%)
30000 ngram read simlex999.txt : 31.31 (OOV: 0.00%)
30000 ngram read EN-MC-30.txt : 59.95 (OOV: 0.00%)
30000 ngram read EN-RG-65.txt : 72.77 (OOV: 0.00%)
30000 ngram read EN-MEN-TR-3k.txt : 71.36 (OOV: 0.00%)
30000 ngram read EN-YP-130.txt : 48.40 (OOV: 0.00%)

maygodwithu commented 5 years ago

idf-variance (experiment varying the ratio at which the idf value is applied)

ratio 0.1

30000 ngram read EN-RW-STANFORD.txt : 47 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 68 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67 (OOV: 0%)
30000 ngram read simlex999.txt : 36 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 73 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49 (OOV: 0%)

ratio 0.2

30000 ngram read EN-RW-STANFORD.txt : 47 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 69 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67 (OOV: 0%)
30000 ngram read simlex999.txt : 37 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 73 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 47 (OOV: 0%)

ratio 0.3

30000 ngram read EN-RW-STANFORD.txt : 46.83 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 69.62 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.52 (OOV: 0%)
30000 ngram read simlex999.txt : 36.21 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77.37 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 73.40 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.83 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 47.56 (OOV: 0%)

ratio 0.4

30000 ngram read EN-RW-STANFORD.txt : 46.92 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 70.08 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.57 (OOV: 0%)
30000 ngram read simlex999.txt : 36.86 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 76.90 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 71.99 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.73 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 47.53 (OOV: 0%)

ratio 0.5

30000 ngram read EN-RW-STANFORD.txt : 46.94 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 70.00 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.46 (OOV: 0%)
30000 ngram read simlex999.txt : 36.67 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77.21 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.40 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.67 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 48.48 (OOV: 0%)

ratio 0.6

30000 ngram read EN-RW-STANFORD.txt : 47.10 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 69.79 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.65 (OOV: 0%)
30000 ngram read simlex999.txt : 36.58 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77.93 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 71.72 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.51 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 48.20 (OOV: 0%)

ratio 0.7

30000 ngram read EN-RW-STANFORD.txt : 46.96 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 70.93 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.77 (OOV: 0%)
30000 ngram read simlex999.txt : 36.57 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 78.04 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 73.30 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.66 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.45 (OOV: 0%)

ratio 0.8

30000 ngram read EN-RW-STANFORD.txt : 47.13 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 70.74 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.86 (OOV: 0%)
30000 ngram read simlex999.txt : 37.62 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 78.33 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 73.54 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.52 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.16 (OOV: 0%)

ratio 1.0

30000 ngram read EN-RW-STANFORD.txt : 47.30 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 72.16 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.61 (OOV: 0%)
30000 ngram read simlex999.txt : 37.66 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77.70 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 74.01 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.65 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 50.04 (OOV: 0%)
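The thread does not show how the ratio in this sweep enters the model. One plausible reading, sketched here purely as an assumption, is a linear blend between a uniform n-gram weight and an idf-based weight, where idf is computed over vocabulary words that contain the n-gram. All names (`ngram_idf`, `blended_weight`) and the toy vocabulary are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of '<word>' in the fastText style."""
    marked = "<" + word + ">"
    return [marked[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(marked) - n + 1)]


def ngram_idf(vocab, minn=3, maxn=6):
    """idf(g) = log(N / df(g)), with df = number of vocabulary words containing g."""
    df = Counter()
    for word in vocab:
        df.update(set(char_ngrams(word, minn, maxn)))
    n_words = len(vocab)
    return {g: math.log(n_words / count) for g, count in df.items()}


def blended_weight(idf_value, max_idf, ratio):
    """Assumed blend: ratio=0 gives uniform weight 1.0, ratio=1 gives idf scaled to [0, 1]."""
    scaled = idf_value / max_idf if max_idf > 0 else 0.0
    return (1.0 - ratio) + ratio * scaled


# Toy vocabulary, invented for illustration only.
vocab = ["allure", "allurement", "movement", "payment", "all", "alley"]
idf = ngram_idf(vocab)
max_idf = max(idf.values())
for gram in ("<al", "ment>", "lur"):
    weights = [round(blended_weight(idf[gram], max_idf, r), 3) for r in (0.0, 0.5, 1.0)]
    print(gram, weights)
```

Under this reading, a common prefix such as `<al` or suffix such as `ment>` is down-weighted relative to a rarer root n-gram such as `lur` as the ratio grows, which is the direction of the effect reported for allurement earlier in the thread.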

jaekeol commented 5 years ago

400K 300D 3EP

FT

EN-RW-STANFORD.txt : 46.10 (OOV: 0.00%)
EN-WS-353-ALL.txt : 68.43 (OOV: 0.00%)
SCWS_2003.txt : 65.32 (OOV: 0.00%)
simlex999.txt : 32.86 (OOV: 0.00%)
EN-MC-30.txt : 72.25 (OOV: 0.00%)
EN-RG-65.txt : 75.98 (OOV: 0.00%)
EN-MEN-TR-3k.txt : 73.50 (OOV: 0.00%)
EN-YP-130.txt : 51.18 (OOV: 0.00%)

jaekeol commented 5 years ago

idf variance power

ratio power 1.2

30000 ngram read EN-RW-STANFORD.txt : 47.19 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 72.19 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.79 (OOV: 0%)
30000 ngram read simlex999.txt : 37.21 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 78.10 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 73.74 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.61 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.91 (OOV: 0%)

ratio 1.5

30000 ngram read EN-RW-STANFORD.txt : 47.16 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 71.77 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.62 (OOV: 0%)
30000 ngram read simlex999.txt : 37.29 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 76.68 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.77 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.63 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 50.53 (OOV: 0%)

ratio power 2.0

30000 ngram read EN-RW-STANFORD.txt : 47.20 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 72.10 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.78 (OOV: 0%)
30000 ngram read simlex999.txt : 37.55 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 76.86 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.75 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.25 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.33 (OOV: 0%)

ratio : 2.5

30000 ngram read EN-RW-STANFORD.txt : 47.02 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 72.20 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.60 (OOV: 0%)
30000 ngram read simlex999.txt : 37.56 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77.99 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.53 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.43 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.87 (OOV: 0%)

ratio : 3

30000 ngram read EN-RW-STANFORD.txt : 46.93 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 72.46 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.85 (OOV: 0%)
30000 ngram read simlex999.txt : 38.19 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 75.86 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.45 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.57 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.09 (OOV: 0%)

ratio : 0.8

30000 ngram read EN-RW-STANFORD.txt : 47.31 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 71.79 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.82 (OOV: 0%)
30000 ngram read simlex999.txt : 36.98 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 76.86 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.45 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.76 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 49.34 (OOV: 0%)

ratio 0.6

30000 ngram read EN-RW-STANFORD.txt : 47.01 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 71.15 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.73 (OOV: 0%)
30000 ngram read simlex999.txt : 36.49 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 77.13 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 73.06 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.70 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 48.76 (OOV: 0%)

ratio 0.4

30000 ngram read EN-RW-STANFORD.txt : 46.81 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 70.85 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.42 (OOV: 0%)
30000 ngram read simlex999.txt : 36.82 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 78.33 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.57 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.75 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 48.79 (OOV: 0%)

ratio 0.1

30000 ngram read EN-RW-STANFORD.txt : 46.83 (OOV: 0%)
30000 ngram read EN-WS-353-ALL.txt : 68.83 (OOV: 0%)
30000 ngram read SCWS_2003.txt : 67.50 (OOV: 0%)
30000 ngram read simlex999.txt : 36.30 (OOV: 0%)
30000 ngram read EN-MC-30.txt : 76.72 (OOV: 0%)
30000 ngram read EN-RG-65.txt : 72.85 (OOV: 0%)
30000 ngram read EN-MEN-TR-3k.txt : 72.70 (OOV: 0%)
30000 ngram read EN-YP-130.txt : 47.06 (OOV: 0%)

maygodwithu commented 5 years ago

170K 300D 5EP, idf weight learning approach

word-sim-learn.sh

29795 ngram read EN-RW-STANFORD.txt : 47.62 (OOV: 0%)
29795 ngram read EN-WS-353-ALL.txt : 74.33 (OOV: 0%)
29795 ngram read SCWS_2003.txt : 67.57 (OOV: 0%)
29795 ngram read simlex999.txt : 38.35 (OOV: 0%)
29795 ngram read EN-MC-30.txt : 79.57 (OOV: 0%)
29795 ngram read EN-RG-65.txt : 74.90 (OOV: 0%)
29795 ngram read EN-MEN-TR-3k.txt : 75.84 (OOV: 0%)
29795 ngram read EN-YP-130.txt : 54.42 (OOV: 0%)

maygodwithu commented 5 years ago

600K 300D 5EP, idf learning approach

word-sim-learn.sh

97536 ngram read EN-RW-STANFORD.txt : 47.71 (OOV: 0%)
97536 ngram read EN-WS-353-ALL.txt : 72.04 (OOV: 0%)
97536 ngram read SCWS_2003.txt : 67.09 (OOV: 0%)
97536 ngram read simlex999.txt : 37.43 (OOV: 0%)
97536 ngram read EN-MC-30.txt : 80.71 (OOV: 0%)
97536 ngram read EN-RG-65.txt : 82.34 (OOV: 0%)
97536 ngram read EN-MEN-TR-3k.txt : 77.25 (OOV: 0%)
97536 ngram read EN-YP-130.txt : 53.04 (OOV: 0%)

maygodwithu commented 5 years ago

Rerun: skipgram 600K.300D.5epoch

sim test

EN-RW-STANFORD.txt : 42.12 (OOV: 4%)
EN-WS-353-ALL.txt : 73.45 (OOV: 0%)
SCWS_2003.txt : 67.14 (OOV: 1%)
simlex999.txt : 35.91 (OOV: 43%)
EN-MC-30.txt : 78.44 (OOV: 0%)
EN-RG-65.txt : 81.62 (OOV: 0%)
EN-MEN-TR-3k.txt : 75.82 (OOV: 0%)
EN-YP-130.txt : 47.50 (OOV: 0%)

analogy test

ACCURACY TOP1: 96.05 % (486 / 506) Total accuracy: 96.05 % Semantic accuracy: 96.05 % Syntactic accuracy: -nan %
capital-world:
ACCURACY TOP1: 88.96 % (3955 / 4446) Total accuracy: 89.68 % Semantic accuracy: 89.68 % Syntactic accuracy: -nan %
currency:
ACCURACY TOP1: 16.61 % (99 / 596) Total accuracy: 81.83 % Semantic accuracy: 81.83 % Syntactic accuracy: -nan %
city-in-state:
ACCURACY TOP1: 70.81 % (1747 / 2467) Total accuracy: 78.44 % Semantic accuracy: 78.44 % Syntactic accuracy: -nan %
family:
ACCURACY TOP1: 80.09 % (370 / 462) Total accuracy: 78.53 % Semantic accuracy: 78.53 % Syntactic accuracy: -nan %
gram1-adjective-to-adverb:
ACCURACY TOP1: 33.17 % (329 / 992) Total accuracy: 73.78 % Semantic accuracy: 78.53 % Syntactic accuracy: 33.17 %
gram2-opposite:
ACCURACY TOP1: 35.84 % (291 / 812) Total accuracy: 70.78 % Semantic accuracy: 78.53 % Syntactic accuracy: 34.37 %
gram3-comparative:
ACCURACY TOP1: 83.18 % (1108 / 1332) Total accuracy: 72.20 % Semantic accuracy: 78.53 % Syntactic accuracy: 55.10 %
gram4-superlative:
ACCURACY TOP1: 58.06 % (540 / 930) Total accuracy: 71.16 % Semantic accuracy: 78.53 % Syntactic accuracy: 55.78 %
gram5-present-participle:
ACCURACY TOP1: 56.25 % (594 / 1056) Total accuracy: 70.00 % Semantic accuracy: 78.53 % Syntactic accuracy: 55.88 %
gram6-nationality-adjective:
ACCURACY TOP1: 90.68 % (1450 / 1599) Total accuracy: 72.17 % Semantic accuracy: 78.53 % Syntactic accuracy: 64.16 %
gram7-past-tense:
ACCURACY TOP1: 60.64 % (946 / 1560) Total accuracy: 71.10 % Semantic accuracy: 78.53 % Syntactic accuracy: 63.49 %
gram8-plural:
ACCURACY TOP1: 80.11 % (1067 / 1332) Total accuracy: 71.76 % Semantic accuracy: 78.53 % Syntactic accuracy: 65.80 %
gram9-plural-verbs:
ACCURACY TOP1: 64.60 % (562 / 870) Total accuracy: 71.43 % Semantic accuracy: 78.53 % Syntactic accuracy: 65.70 %
Questions seen / total: 18960 19544 97.01 %
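In this output only `ACCURACY TOP1` is per category; the total, semantic and syntactic figures are cumulative over all sections seen so far, and `-nan` appears while no syntactic question has been counted yet (a 0/0 division). The sketch below reproduces the first few cumulative values. The `gram` prefix rule for the syntactic bucket and the name of the first section, which is missing from the log, are assumptions.

```python
def running_accuracy(sections):
    """sections: list of (name, correct, seen). Returns per-section and cumulative percentages."""
    total, semantic, syntactic = [0, 0], [0, 0], [0, 0]

    def pct(correct, seen):
        return 100.0 * correct / seen if seen else float("nan")

    rows = []
    for name, correct, seen in sections:
        bucket = syntactic if name.startswith("gram") else semantic
        for counter in (total, bucket):
            counter[0] += correct
            counter[1] += seen
        rows.append((name, pct(correct, seen), pct(*total), pct(*semantic), pct(*syntactic)))
    return rows


# (correct, seen) copied from the log; the first name is assumed, it is not in the log.
SECTIONS = [
    ("capital-common-countries", 486, 506),
    ("capital-world", 3955, 4446),
    ("currency", 99, 596),
    ("city-in-state", 1747, 2467),
    ("family", 370, 462),
    ("gram1-adjective-to-adverb", 329, 992),
]

for name, top1, tot, sem, syn in running_accuracy(SECTIONS):
    print(f"{name}: TOP1 {top1:.2f} % Total {tot:.2f} % Semantic {sem:.2f} % Syntactic {syn:.2f} %")
```

Because questions containing out-of-vocabulary words are skipped (18960 of 19544 seen), these percentages are not directly comparable with the earlier tables, which count all 19544 questions.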