stephenroller / naacl2016

Models for the Lexical Substitution in Context task. If you need this, please email me.
MIT License

GAP is calculated incorrectly, resulting in incorrect values in Table 1 #1

Open alexey-pismenny opened 5 years ago

alexey-pismenny commented 5 years ago

In LexsubData.init, the following line: candidates[target] = scrub_candidates(right.split(";"), target) splits the candidates on ';' but doesn't strip whitespace. In scrub_substitutes(), the line remove_mwe = ((k, v) for k, v in before_iter if ' ' not in k) then removes most of the candidates, not only the MWEs, because even single-word candidates start with a space.

For instance, the candidates for album.n are: "album.n:: music release; miscellany; register; music; vocal; set of records; musical compilation; musical recording; recording; platter; complilation; troop; vinyl recording; memory book; depository; musical portfolio; musical collection; compact disc; music compilation; piece; memento; portfolio; vinyl disc; cd; cds; musical work; disc; musical offering; disk; melody; music album; work; muscial offering; index; set of songs; anthology; group; release; phonograph; record album; musical endeavor; production; set of cds; tape; treasury; lp; collection of songs; scrapbook; folder; collection of work; composition; hit; song; chorus; session; track; compilation; picture; collection; band;record; notebook; recordings; vinyl; note book; records"

After scrub_candidates(), only 'record' survives, because it happens to be the one candidate that follows a ';' with no space after it.
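To make the effect concrete, here is a minimal sketch that reproduces the behavior on a shortened version of the album.n line. The helpers split_candidates and drop_mwe are hypothetical names written for this illustration; they are not the repository's functions, and the strip() call is one possible fix, not necessarily how the code should be patched.

```python
# Hypothetical helpers for illustration only; not the repository's code.

def split_candidates(raw, strip):
    """Split a ';'-separated candidate string, optionally stripping whitespace."""
    parts = raw.split(";")
    if strip:
        parts = [p.strip() for p in parts]
    return [p for p in parts if p]


def drop_mwe(candidates):
    """Drop multi-word expressions, i.e. candidates that contain a space."""
    return [c for c in candidates if " " not in c]


# Shortened excerpt of the album.n candidate list (text after "album.n::").
raw = " music release; miscellany; register; band;record; notebook; note book"

# Without stripping, every candidate that follows "; " starts with a space
# and is therefore treated as a multi-word expression.
buggy = drop_mwe(split_candidates(raw, strip=False))

# With stripping, only the real multi-word candidates are removed.
fixed = drop_mwe(split_candidates(raw, strip=True))

print(buggy)  # ['record']
print(fixed)  # ['miscellany', 'register', 'band', 'record', 'notebook']
```

With stripping applied, only "music release" and "note book" are filtered out, which is presumably what the MWE filter was meant to do.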

alexey-pismenny commented 5 years ago

This holds only for TWSI2, where spaces appear after ';'.

stephenroller commented 5 years ago

Thanks for filing. You should probably fork and fix for yourself, reporting the corrected results in your next paper as a corrected baseline.

Given the age of this paper, it's probably more reasonable to keep this code frozen as the ground truth for what I did in the paper.