wejradford closed this 10 years ago
I've implemented this in prepare in branch norm. The branch also adds reference analysis and eval files (I think we should be including these). This didn't affect the results.
I've noticed a couple of changes in the results since 6th March (not counting strong_linked_mention_match), but I haven't checked what caused them.
$ diff tagme.eval ../old_outputs/tagme.eval
2,4c2,3
< 2894 709 2722 0.803 0.515 0.628 strong_mention_match
< 2277 1326 2208 0.632 0.508 0.563 strong_link_match
---
> 2888 715 2728 0.802 0.514 0.627 strong_mention_match
> 2276 1327 2209 0.632 0.507 0.563 strong_link_match
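As a sanity check on these rows, the three leading columns appear to be true positives, false positives and false negatives. That labelling is an assumption inferred from the numbers, not from the file header, but recomputing precision, recall and F1 from them reproduces the reported values (here "<" is the new tagme.eval and ">" is the old one):

```python
def prf(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Rows copied from the diff: (tp, fp, fn, reported P, R, F, measure).
# Assumption: the first three columns are TP, FP, FN.
ROWS = [
    (2894, 709, 2722, 0.803, 0.515, 0.628, "strong_mention_match (new)"),
    (2277, 1326, 2208, 0.632, 0.508, 0.563, "strong_link_match (new)"),
    (2888, 715, 2728, 0.802, 0.514, 0.627, "strong_mention_match (old)"),
    (2276, 1327, 2209, 0.632, 0.507, 0.563, "strong_link_match (old)"),
]

for tp, fp, fn, *reported, name in ROWS:
    computed = prf(tp, fp, fn)
    # Each recomputed value, rounded to 3 places, should equal the reported one.
    assert all(round(c, 3) == r for c, r in zip(computed, reported)), name
    print(name, " ".join(f"{c:.3f}" for c in computed))
```

In every row TP + FP = 3603, and TP + FN stays at 5616 for strong_mention_match and 4485 for strong_link_match, so the system and gold totals are identical in both runs; the new output simply matches six more mentions and one more link.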
@benhachey / @jnothman: we may want to report the updated results.
I had a quick look and I'm not sure what's changed. Feel free to merge this change into master.
Hmm, I think the old file had more noise in it (left = old, right = new):
1102,1104c1150,1154
< Friday B Friday:<s/>Men Friday_Men 0.5
< : I Friday:<s/>Men Friday_Men 0.5
< Men I Friday:<s/>Men Friday_Men 0.5
---
> Friday B Friday : Friday_Men 0.5
> : I Friday : Friday_Men 0.5
>
> Men
>
1738,1739c1869,1870
< ENGLISH B ENGLISH F.A The_Football_Association 0.3785
< F.A. I ENGLISH F.A The_Football_Association 0.3785
---
> ENGLISH B ENGLISH F.A. The_Football_Association 0.3785
> F.A. I ENGLISH F.A. The_Football_Association 0.3785
2086,2087c2234,2236
< EuroLeague B EuroLeague<s/>basketball Euroleague_Basketball 0.4031
< basketball I EuroLeague<s/>basketball Euroleague_Basketball 0.4031
---
> EuroLeague B EuroLeague Euroleague_Basketball 0.4031
>
> basketball
4597,4599c4897,4901
< singles B singles<s/>Group B 2010_ATP_World_Tour_Finals_–_Singles 0.5657
< Group I singles<s/>Group B 2010_ATP_World_Tour_Finals_–_Singles 0.5657
< B I singles<s/>Group B 2010_ATP_World_Tour_Finals_–_Singles 0.5657
---
> singles B singles 2010_ATP_World_Tour_Finals_–_Singles 0.5657
>
> Group
> B
>
5906,5907c6355,6357
< PORTLAND B PORTLAND<s/>INDIANA Portland,_Indiana 0.5845
< INDIANA I PORTLAND<s/>INDIANA Portland,_Indiana 0.5845
---
> PORTLAND B PORTLAND Portland,_Indiana 0.5845
>
> INDIANA
6247,6249c6760,6763
< SAN B SAN FRANCISCO<s/>MINNESOTA San_Francisco,_Minnesota 0.5
< FRANCISCO I SAN FRANCISCO<s/>MINNESOTA San_Francisco,_Minnesota 0.5
< MINNESOTA I SAN FRANCISCO<s/>MINNESOTA San_Francisco,_Minnesota 0.5
---
> SAN B SAN FRANCISCO San_Francisco,_Minnesota 0.5
> FRANCISCO I SAN FRANCISCO San_Francisco,_Minnesota 0.5
>
> MINNESOTA
7040,7042c7637,7640
< French B French first<s/>division Ligue_1 0.4083
< first I French first<s/>division Ligue_1 0.4083
< division I French first<s/>division Ligue_1 0.4083
---
> French B French first Ligue_1 0.4083
> first I French first Ligue_1 0.4083
>
> division
7830,7831c8515,8517
< Barcelona B Barcelona<s/>9-2 1992_Summer_Olympics 0.3649
< 9-2 I Barcelona<s/>9-2 1992_Summer_Olympics 0.3649
---
> Barcelona B Barcelona 1992_Summer_Olympics 0.3649
>
> 9-2
8586,8588c9337,9340
< AUSTRIA B AUSTRIA.<s/>VIENNA FK_Austria_Wien 0.3787
< . I AUSTRIA.<s/>VIENNA FK_Austria_Wien 0.3787
< VIENNA I AUSTRIA.<s/>VIENNA FK_Austria_Wien 0.3787
---
> AUSTRIA B AUSTRIA . FK_Austria_Wien 0.3787
> . I AUSTRIA . FK_Austria_Wien 0.3787
>
> VIENNA
12639,12640c13581,13582
< AMR B AMR Corp AMR_Corporation 0.6375
< Corp. I AMR Corp AMR_Corporation 0.6375
---
> AMR B AMR Corp. AMR_Corporation 0.6375
> Corp. I AMR Corp. AMR_Corporation 0.6375
15319,15321c16406,16409
< OAKLAND B OAKLAND, N.J Oakland,_New_Jersey 0.5205
< , I OAKLAND, N.J Oakland,_New_Jersey 0.5205
< N.J. I OAKLAND, N.J Oakland,_New_Jersey 0.5205
---
>
> OAKLAND B OAKLAND , N.J. Oakland,_New_Jersey 0.5205
> , I OAKLAND , N.J. Oakland,_New_Jersey 0.5205
> N.J. I OAKLAND , N.J. Oakland,_New_Jersey 0.5205
22952c24387
< 1.233 B 233 Oregon_Route_233 0.3125
---
> 1.233 B 1.233 Oregon_Route_233 0.3125
23262,23263c24708,24709
< in B in U.S .us 0.51
< U.S. I in U.S .us 0.51
---
> in B in U.S. .us 0.51
> U.S. I in U.S. .us 0.51
24605,24608c26133,26137
< NJ B NJ-- U.S .us 0.5
< -- I NJ-- U.S .us 0.5
< U.S. I NJ-- U.S .us 0.5
< Municipal
---
> NJ B NJ .us 0.5
>
> --
> U.S.
> Municipal
26297c27941
< 217,092 B 217 Oregon_Route_217 0.3323
---
> 217,092 B 217,092 Oregon_Route_217 0.3323
26300c27945
< 285,505 B 505 U.S._Route_30_in_Oregon 0.4492
---
> 285,505 B 285,505 U.S._Route_30_in_Oregon 0.4492
26316c27967,27968
< 223,172 B 223 Oregon_Route_223 0.2978
---
> 223,172 B 223,172 Oregon_Route_223 0.2978
>
28378c30131
< http://www.pt.lu/infoweb/kreschtmaart B http Hypertext_Transfer_Protocol 0.3338
---
> http://www.pt.lu/infoweb/kreschtmaart B http://www.pt.lu/infoweb/kreschtmaart Hypertext_Transfer_Protocol 0.3338
30423,30424c32293,32295
< 6.10 B 10 6 10_from_6 0.5
< 6.05 I 10 6 10_from_6 0.5
---
> 6.10 B 6.10 6.05 10_from_6 0.5
> 6.05 I 6.10 6.05 10_from_6 0.5
>
31595,31597c33519,33522
< Internet B Internet.<s/>Mongolia Telecommunications_in_Mongolia 0.5783
< . I Internet.<s/>Mongolia Telecommunications_in_Mongolia 0.5783
< Mongolia I Internet.<s/>Mongolia Telecommunications_in_Mongolia 0.5783
---
> Internet B Internet . Telecommunications_in_Mongolia 0.5783
> . I Internet . Telecommunications_in_Mongolia 0.5783
>
> Mongolia
32978c34964
< 198,226 B 226 Oregon_Route_226 0.3209
---
> 198,226 B 198,226 Oregon_Route_226 0.3209
36094,36096c38236,38239
< target B target.<s/>TOKYO Target_Tokyo 0.375
< . I target.<s/>TOKYO Target_Tokyo 0.375
< TOKYO I target.<s/>TOKYO Target_Tokyo 0.375
---
> target B target . Target_Tokyo 0.375
> . I target . Target_Tokyo 0.375
>
> TOKYO
36338c38491
< then-U.S. B then-U.S Thenus 0.5055
---
> then-U.S. B then-U.S. Thenus 0.5055
38129c40414,40415
< Austria)118 B Austria Austria 0.4978
---
> Austria)118 B Austria)118 Austria 0.4978
>
39855,39856c42313,42315
< 157.10 B 10<s/>5 10_from_5 0.5
< 5. I 10<s/>5 10_from_5 0.5
---
> 157.10 B 157.10 10_from_5 0.5
>
> 5.
39981c42453
< 124.4 B 124 Ontario_Highway_124 0.2963
---
> 124.4 B 124.4 Ontario_Highway_124 0.2963
40894,40896c43459,43462
< English B English<s/>premier league Premier_League 0.4084
< premier I English<s/>premier league Premier_League 0.4084
< league I English<s/>premier league Premier_League 0.4084
---
> English B English Premier_League 0.4084
>
> premier
> league
45143c48153
< .647 B 647 List_of_Air_Ministry_specifications 0.5113
---
> .647 B .647 List_of_Air_Ministry_specifications 0.5113
45193c48213
< .647 B 647 List_of_Air_Ministry_specifications 0.5
---
> .647 B .647 List_of_Air_Ministry_specifications 0.5
45268c48303
< .158 B 158 Antonov_An-148 0.308
---
> .158 B .158 Antonov_An-148 0.308
46017,46018c49150,49151
< 's B s Jose San_José,_Costa_Rica 0.3
< Jose I s Jose San_José,_Costa_Rica 0.3
---
> 's B 's Jose San_José,_Costa_Rica 0.3
> Jose I 's Jose San_José,_Costa_Rica 0.3
46085,46087c49223,49226
< REAL B REAL.<s/>MADRID Real_Madrid_C.F. 0.549
< . I REAL.<s/>MADRID Real_Madrid_C.F. 0.549
< MADRID I REAL.<s/>MADRID Real_Madrid_C.F. 0.549
---
> REAL B REAL . Real_Madrid_C.F. 0.549
> . I REAL . Real_Madrid_C.F. 0.549
>
> MADRID
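Judging from the diff, the new output no longer lets a mention run across a sentence break (the old <s/> marker): the mention stops at the blank line, and its text looks like the tokens joined with spaces, which would also explain "F.A" becoming "F.A." and "233" becoming "1.233". A hypothetical sketch of that grouping, assuming tab-separated columns (token, B/I tag, mention text, entity, score), a bare token on untagged lines and a blank line between sentences; read_mentions and Mention are illustrative names, not part of the codebase:

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple

class Mention(NamedTuple):
    text: str      # mention text rebuilt from its tokens
    entity: str    # linked entity id (assumed 4th column)
    score: float   # linker score (assumed 5th column)

def read_mentions(lines: Iterable[str]) -> Iterator[Mention]:
    """Group B/I token lines into mentions; a blank line or untagged token ends a mention."""
    current: Optional[Tuple[List[str], str, float]] = None
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        tag = cols[1] if len(cols) > 1 else ""
        if current is not None and tag != "I":
            # Sentence break, untagged token or a new B: close the open mention.
            tokens, entity, score = current
            yield Mention(" ".join(tokens), entity, score)
            current = None
        if tag == "B":
            current = ([cols[0]], cols[3], float(cols[4]))
        elif tag == "I" and current is not None:
            current[0].append(cols[0])
    if current is not None:
        tokens, entity, score = current
        yield Mention(" ".join(tokens), entity, score)

sample = [
    "Friday\tB\tFriday :\tFriday_Men\t0.5",
    ":\tI\tFriday :\tFriday_Men\t0.5",
    "",      # sentence break: the mention stops here
    "Men",   # untagged token in the next sentence
]
print(list(read_mentions(sample)))
# [Mention(text='Friday :', entity='Friday_Men', score=0.5)]
```

Rebuilding the text from the tokens, rather than trusting a precomputed mention string, keeps the mention column consistent with the token column by construction.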
Initially, we were only normalising entity_ids when reading the gold standard, but this is a nice feature for system output too.
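The thread doesn't spell out which entity_id normalisation is applied. A minimal sketch, assuming Wikipedia-title ids and the usual title conventions (percent-decoding, underscores for spaces, upper-case first letter); normalise_entity_id is a hypothetical name. Running the same function over gold and system output means a system id such as "premier League" still matches a gold "Premier_League":

```python
from urllib.parse import unquote

def normalise_entity_id(entity_id: str) -> str:
    """Hypothetical normaliser for Wikipedia-title entity ids."""
    title = unquote(entity_id)           # decode %-escapes: "Jos%C3%A9" -> "José"
    title = title.replace(" ", "_")      # assume ids use underscores for spaces
    title = "_".join(part for part in title.split("_") if part)  # trim/collapse underscores
    if title:
        title = title[0].upper() + title[1:]  # Wikipedia titles start with a capital
    return title

print(normalise_entity_id("premier League"))            # Premier_League
print(normalise_entity_id("San_Jos%C3%A9,_Costa_Rica"))  # San_José,_Costa_Rica
```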
There are two choices:

1. prepare: this is consistent with other changes we make to the data (filtering, mapping).
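One way to picture the prepare option: normalisation becomes one more step applied identically to gold and system annotations, alongside filtering and mapping. A hypothetical sketch; the prepare signature, the (mention, entity_id) pairs and the step order are assumptions, not the project's code:

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical annotation shape: (mention text, entity id).
Annotation = Tuple[str, str]

def prepare(
    annotations: Iterable[Annotation],
    keep: Callable[[Annotation], bool] = lambda a: True,
    mapping: Optional[Dict[str, str]] = None,
    normalise: Callable[[str], str] = lambda e: e,
) -> Iterator[Annotation]:
    """Normalise, then map, then filter each annotation (hypothetical order)."""
    mapping = mapping or {}
    for mention, entity_id in annotations:
        entity_id = normalise(entity_id)               # e.g. tidy Wikipedia titles
        entity_id = mapping.get(entity_id, entity_id)  # e.g. redirect or KB mapping
        if keep((mention, entity_id)):                 # e.g. drop NIL links
            yield mention, entity_id

# Apply the same steps to both sides so they are compared like for like.
steps = dict(normalise=lambda e: e.replace(" ", "_"), keep=lambda a: a[1] != "NIL")
gold = list(prepare([("ENGLISH F.A.", "The Football Association")], **steps))
system = list(prepare([("ENGLISH F.A.", "The_Football_Association"), ("Men", "NIL")], **steps))
print(gold == system)  # True
```

Normalising before mapping means the mapping table only needs keys in normalised form.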