wikilinks / conll03_nel_eval

Python evaluation scripts for AIDA-formatted CoNLL data
Apache License 2.0

References directories to compare apples and apples #53

Open jnothman opened 10 years ago

jnothman commented 10 years ago

I propose that under references/ we divide the system outputs into directories representing the different task settings. I propose that we split references into:

There's still the potential for the entries in the directories not to be altogether comparable with one another. For example, we could subdivide system-mentions into those that generate NEs only (schwa), and those that include other wikilinks (tagme); we could subdivide gold-mentions according to whether the system had access to CoNLL 2003 type annotations (although this may be harder to infer).

There is also the question of whether the directory structure should similarly be utilised to label (a) the corpus being evaluated (e.g. CoNLL vs ?IITB; testa vs testb), and (b) the ID mapping.
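As a rough illustration of the proposal above, the sketch below groups reference outputs by the task-setting directory directly under references/, so that only systems run under the same setting are compared. The directory names (system-mentions, gold-linked-mentions) and system names (schwa, tagme) come from this thread; the file names and the `group_by_setting` helper are hypothetical and not part of the repository's scripts.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_setting(paths, root="references"):
    """Group system-output paths by the task-setting directory under root.

    Hypothetical helper: assumes a layout of root/<setting>/<output file>.
    """
    groups = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        # Skip anything that is not nested inside a task-setting directory.
        if len(parts) >= 3 and parts[0] == root:
            groups[parts[1]].append(parts[-1])
    return {setting: sorted(names) for setting, names in groups.items()}


# Hypothetical file names; only the directory names appear in this thread.
example_paths = [
    "references/system-mentions/schwa.testb.txt",
    "references/system-mentions/tagme.testb.txt",
    "references/gold-linked-mentions/some_system.testb.txt",
    "references/README.md",
]
grouped = group_by_setting(example_paths)
for setting, outputs in sorted(grouped.items()):
    print(setting, outputs)
```

Under such a layout, a comparison table would be built per key of `grouped`, never across keys. Further subdivision (NE-only vs. other wikilinks, corpus, ID mapping) could be modelled as additional path components in the same way.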

benhachey commented 10 years ago

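Also:

  • references/gold-linked-aidacandidates: Same as references/gold-linked-mentions, uses YAGO means/label relationships for candidate generation. I.e., the precise Hoffart et al. (2011) task setting.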

jnothman commented 10 years ago

I still don't see the difference between that and the setting where a system's input is those mentions in the gold that are linked... assuming this version of the gold, which for now is all we have.

On 23 June 2014 21:27, Ben Hachey notifications@github.com wrote:

Also:

  • references/gold-linked-aidacandidates: Same as references/gold-linked-mentions, uses YAGO means/label relationships for candidate generation. I.e., the precise Hoffart et al. (2011) task setting.


wejradford commented 10 years ago

I agree with the first structure points.

I think we should keep the means dataset, as the goal is to demystify the evaluation (and its knobs and levers).

> There is also the question of whether the directory structure should similarly be utilised to label (a) the corpus being evaluated (e.g. CoNLL vs ?IITB; testa vs testb), and (b) the ID mapping.

I favour putting in conll or similar, but am not sure about ID mappings. They're nice regression test fodder, but we shouldn't really need them, as a user can run the appropriate commands to generate them.

benhachey commented 10 years ago

@jnothman - The difference is in the candidates (not the mentions).

On Tue, Jun 24, 2014 at 2:34 PM, jnothman notifications@github.com wrote:

I still don't see the difference between that and the setting where a system's input is those mentions in the gold that are linked... assuming this version of the gold, which for now is all we have.