wikilinks / conll03_nel_eval

Python evaluation scripts for AIDA-formatted CoNLL data
Apache License 2.0

entity_id normalisation should be consistent #33

wejradford closed this issue 10 years ago

wejradford commented 10 years ago

Initially, we were only normalising entity_ids when reading the gold standard, but this would be a nice feature for system output too.
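To make the idea concrete, here is a minimal sketch of a single normalisation helper that both the gold-standard reader and the system-output reader could call. The function name, the prefix list, and the specific rules (strip an English Wikipedia URL prefix, decode percent-escapes, replace spaces with underscores) are assumptions for illustration, not the rules this repository actually applies.

```python
from urllib.parse import unquote

# Hypothetical prefixes; the real set of accepted forms may differ.
WIKI_PREFIXES = (
    "http://en.wikipedia.org/wiki/",
    "https://en.wikipedia.org/wiki/",
)


def normalise_entity_id(raw):
    """Map different spellings of the same entity id to one canonical form."""
    eid = raw.strip()
    # Drop a leading Wikipedia URL prefix, if any.
    for prefix in WIKI_PREFIXES:
        if eid.startswith(prefix):
            eid = eid[len(prefix):]
            break
    # Decode percent-escapes such as %C3%A9, then use underscores for spaces.
    eid = unquote(eid)
    return eid.replace(" ", "_")


# The same helper is applied to gold and system ids before comparing them.
gold_id = normalise_entity_id("http://en.wikipedia.org/wiki/San_Jos%C3%A9,_Costa_Rica")
system_id = normalise_entity_id("San José, Costa Rica")
print(gold_id == system_id)
```

Routing both inputs through one function is what keeps the comparison consistent: whichever rules are chosen, they cannot drift apart between the two readers.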

There are two choices:

wejradford commented 10 years ago

I've implemented this in prepare in branch:norm. The branch also contains reference analysis and eval files (which I think we should be doing). This didn't affect the results.

I've noticed a couple of changes in the results since 6th March (not counting strong_linked_mention_match), but I haven't checked what caused them.

$ diff tagme.eval ../old_outputs/tagme.eval
2,4c2,3
< 2894  709 2722    0.803   0.515   0.628   strong_mention_match
< 2277  1326    2208    0.632   0.508   0.563   strong_link_match
---
> 2888  715 2728    0.802   0.514   0.627   strong_mention_match
> 2276  1327    2209    0.632   0.507   0.563   strong_link_match
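As a sanity check on the rows above, the scores can be recomputed from the counts. This sketch assumes the first three columns are true positives, false positives, and false negatives, followed by precision, recall, and F1. That reading is an assumption, though the arithmetic is consistent with it.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) rounded to three decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return round(precision, 3), round(recall, 3), round(f1, 3)


# Counts copied from the diff: (tp, fp, fn) per measure, new vs. old.
rows = {
    "strong_mention_match (new)": (2894, 709, 2722),
    "strong_mention_match (old)": (2888, 715, 2728),
    "strong_link_match (new)": (2277, 1326, 2208),
    "strong_link_match (old)": (2276, 1327, 2209),
}

for name, counts in rows.items():
    print(name, prf(*counts))
```

Under that reading, tp + fp and tp + fn are the same in both files (3603 system mentions and 5616 gold mentions for strong_mention_match), so the differences come from a handful of mentions switching between matched and unmatched, not from a change in the number of mentions.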

@benhachey / @jnothman: we may want to report the updated results.

wejradford commented 10 years ago

I had a quick look and I'm not sure what's changed. Feel free to merge this change into master.

wejradford commented 10 years ago

Hmm, the old file had some more noise in it, I think... (left=old, right=new):

1102,1104c1150,1154
< Friday    B   Friday:<s/>Men  Friday_Men  0.5
< : I   Friday:<s/>Men  Friday_Men  0.5
< Men   I   Friday:<s/>Men  Friday_Men  0.5
---
> Friday    B   Friday :    Friday_Men  0.5
> : I   Friday :    Friday_Men  0.5
> 
> Men               
> 
1738,1739c1869,1870
< ENGLISH   B   ENGLISH F.A The_Football_Association    0.3785
< F.A.  I   ENGLISH F.A The_Football_Association    0.3785
---
> ENGLISH   B   ENGLISH F.A.    The_Football_Association    0.3785
> F.A.  I   ENGLISH F.A.    The_Football_Association    0.3785
2086,2087c2234,2236
< EuroLeague    B   EuroLeague<s/>basketball    Euroleague_Basketball   0.4031
< basketball    I   EuroLeague<s/>basketball    Euroleague_Basketball   0.4031
---
> EuroLeague    B   EuroLeague  Euroleague_Basketball   0.4031
> 
> basketball                
4597,4599c4897,4901
< singles   B   singles<s/>Group B  2010_ATP_World_Tour_Finals_–_Singles  0.5657
< Group I   singles<s/>Group B  2010_ATP_World_Tour_Finals_–_Singles  0.5657
< B I   singles<s/>Group B  2010_ATP_World_Tour_Finals_–_Singles  0.5657
---
> singles   B   singles 2010_ATP_World_Tour_Finals_–_Singles  0.5657
> 
> Group             
> B             
> 
5906,5907c6355,6357
< PORTLAND  B   PORTLAND<s/>INDIANA Portland,_Indiana   0.5845
< INDIANA   I   PORTLAND<s/>INDIANA Portland,_Indiana   0.5845
---
> PORTLAND  B   PORTLAND    Portland,_Indiana   0.5845
> 
> INDIANA               
6247,6249c6760,6763
< SAN   B   SAN FRANCISCO<s/>MINNESOTA  San_Francisco,_Minnesota    0.5
< FRANCISCO I   SAN FRANCISCO<s/>MINNESOTA  San_Francisco,_Minnesota    0.5
< MINNESOTA I   SAN FRANCISCO<s/>MINNESOTA  San_Francisco,_Minnesota    0.5
---
> SAN   B   SAN FRANCISCO   San_Francisco,_Minnesota    0.5
> FRANCISCO I   SAN FRANCISCO   San_Francisco,_Minnesota    0.5
> 
> MINNESOTA             
7040,7042c7637,7640
< French    B   French first<s/>division    Ligue_1 0.4083
< first I   French first<s/>division    Ligue_1 0.4083
< division  I   French first<s/>division    Ligue_1 0.4083
---
> French    B   French first    Ligue_1 0.4083
> first I   French first    Ligue_1 0.4083
> 
> division              
7830,7831c8515,8517
< Barcelona B   Barcelona<s/>9-2    1992_Summer_Olympics    0.3649
< 9-2   I   Barcelona<s/>9-2    1992_Summer_Olympics    0.3649
---
> Barcelona B   Barcelona   1992_Summer_Olympics    0.3649
> 
> 9-2               
8586,8588c9337,9340
< AUSTRIA   B   AUSTRIA.<s/>VIENNA  FK_Austria_Wien 0.3787
< . I   AUSTRIA.<s/>VIENNA  FK_Austria_Wien 0.3787
< VIENNA    I   AUSTRIA.<s/>VIENNA  FK_Austria_Wien 0.3787
---
> AUSTRIA   B   AUSTRIA .   FK_Austria_Wien 0.3787
> . I   AUSTRIA .   FK_Austria_Wien 0.3787
> 
> VIENNA                
12639,12640c13581,13582
< AMR   B   AMR Corp    AMR_Corporation 0.6375
< Corp. I   AMR Corp    AMR_Corporation 0.6375
---
> AMR   B   AMR Corp.   AMR_Corporation 0.6375
> Corp. I   AMR Corp.   AMR_Corporation 0.6375
15319,15321c16406,16409
< OAKLAND   B   OAKLAND, N.J    Oakland,_New_Jersey 0.5205
< , I   OAKLAND, N.J    Oakland,_New_Jersey 0.5205
< N.J.  I   OAKLAND, N.J    Oakland,_New_Jersey 0.5205
---
> 
> OAKLAND   B   OAKLAND , N.J.  Oakland,_New_Jersey 0.5205
> , I   OAKLAND , N.J.  Oakland,_New_Jersey 0.5205
> N.J.  I   OAKLAND , N.J.  Oakland,_New_Jersey 0.5205
22952c24387
< 1.233 B   233 Oregon_Route_233    0.3125
---
> 1.233 B   1.233   Oregon_Route_233    0.3125
23262,23263c24708,24709
< in    B   in U.S  .us 0.51
< U.S.  I   in U.S  .us 0.51
---
> in    B   in U.S. .us 0.51
> U.S.  I   in U.S. .us 0.51
24605,24608c26133,26137
< NJ    B   NJ-- U.S    .us 0.5
< --    I   NJ-- U.S    .us 0.5
< U.S.  I   NJ-- U.S    .us 0.5
< Municipal             
---
> NJ    B   NJ  .us 0.5
> 
> --                
> U.S.              
> Municipal             
26297c27941
< 217,092   B   217 Oregon_Route_217    0.3323
---
> 217,092   B   217,092 Oregon_Route_217    0.3323
26300c27945
< 285,505   B   505 U.S._Route_30_in_Oregon 0.4492
---
> 285,505   B   285,505 U.S._Route_30_in_Oregon 0.4492
26316c27967,27968
< 223,172   B   223 Oregon_Route_223    0.2978
---
> 223,172   B   223,172 Oregon_Route_223    0.2978
> 
28378c30131
< http://www.pt.lu/infoweb/kreschtmaart B   http    Hypertext_Transfer_Protocol 0.3338
---
> http://www.pt.lu/infoweb/kreschtmaart B   http://www.pt.lu/infoweb/kreschtmaart   Hypertext_Transfer_Protocol 0.3338
30423,30424c32293,32295
< 6.10  B   10 6    10_from_6   0.5
< 6.05  I   10 6    10_from_6   0.5
---
> 6.10  B   6.10 6.05   10_from_6   0.5
> 6.05  I   6.10 6.05   10_from_6   0.5
> 
31595,31597c33519,33522
< Internet  B   Internet.<s/>Mongolia   Telecommunications_in_Mongolia  0.5783
< . I   Internet.<s/>Mongolia   Telecommunications_in_Mongolia  0.5783
< Mongolia  I   Internet.<s/>Mongolia   Telecommunications_in_Mongolia  0.5783
---
> Internet  B   Internet .  Telecommunications_in_Mongolia  0.5783
> . I   Internet .  Telecommunications_in_Mongolia  0.5783
> 
> Mongolia              
32978c34964
< 198,226   B   226 Oregon_Route_226    0.3209
---
> 198,226   B   198,226 Oregon_Route_226    0.3209
36094,36096c38236,38239
< target    B   target.<s/>TOKYO    Target_Tokyo    0.375
< . I   target.<s/>TOKYO    Target_Tokyo    0.375
< TOKYO I   target.<s/>TOKYO    Target_Tokyo    0.375
---
> target    B   target .    Target_Tokyo    0.375
> . I   target .    Target_Tokyo    0.375
> 
> TOKYO             
36338c38491
< then-U.S. B   then-U.S    Thenus  0.5055
---
> then-U.S. B   then-U.S.   Thenus  0.5055
38129c40414,40415
< Austria)118   B   Austria Austria 0.4978
---
> Austria)118   B   Austria)118 Austria 0.4978
> 
39855,39856c42313,42315
< 157.10    B   10<s/>5 10_from_5   0.5
< 5.    I   10<s/>5 10_from_5   0.5
---
> 157.10    B   157.10  10_from_5   0.5
> 
> 5.                
39981c42453
< 124.4 B   124 Ontario_Highway_124 0.2963
---
> 124.4 B   124.4   Ontario_Highway_124 0.2963
40894,40896c43459,43462
< English   B   English<s/>premier league   Premier_League  0.4084
< premier   I   English<s/>premier league   Premier_League  0.4084
< league    I   English<s/>premier league   Premier_League  0.4084
---
> English   B   English Premier_League  0.4084
> 
> premier               
> league                
45143c48153
< .647  B   647 List_of_Air_Ministry_specifications 0.5113
---
> .647  B   .647    List_of_Air_Ministry_specifications 0.5113
45193c48213
< .647  B   647 List_of_Air_Ministry_specifications 0.5
---
> .647  B   .647    List_of_Air_Ministry_specifications 0.5
45268c48303
< .158  B   158 Antonov_An-148  0.308
---
> .158  B   .158    Antonov_An-148  0.308
46017,46018c49150,49151
< 's    B   s Jose  San_José,_Costa_Rica   0.3
< Jose  I   s Jose  San_José,_Costa_Rica   0.3
---
> 's    B   's Jose San_José,_Costa_Rica   0.3
> Jose  I   's Jose San_José,_Costa_Rica   0.3
46085,46087c49223,49226
< REAL  B   REAL.<s/>MADRID Real_Madrid_C.F.    0.549
< . I   REAL.<s/>MADRID Real_Madrid_C.F.    0.549
< MADRID    I   REAL.<s/>MADRID Real_Madrid_C.F.    0.549
---
> REAL  B   REAL .  Real_Madrid_C.F.    0.549
> . I   REAL .  Real_Madrid_C.F.    0.549
> 
> MADRID