jnothman opened 10 years ago
Also:
- references/gold-linked-aidacandidates: Same as references/gold-linked-mentions, but uses aida_means.tsv.bz2 for candidate generation. I.e., the precise Hoffart et al. (2011) task setting.
I still don't see the difference between that and the setting where a system's input is those mentions in the gold that are linked... assuming this version of the gold, which for now is all we have.
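A minimal sketch of what "candidate generation from a means file" could look like, to make the distinction concrete: both settings take the same gold linked mentions as input, and only the candidate sets differ. The line format assumed here (one tab-separated `mention<TAB>entity` pair per line, mention optionally wrapped in double quotes) is an assumption about aida_means.tsv.bz2, not a documented spec, and the function names `load_means`, `read_means_file` and `candidates_for` are hypothetical.

```python
import bz2
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_means(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a mention -> candidate-entity index from means-style lines.

    Hypothetical format: one tab-separated ``mention<TAB>entity`` pair per
    line, with the mention optionally wrapped in double quotes.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        mention, _, entity = line.partition("\t")
        if not entity:
            continue  # skip malformed lines with no tab-separated entity
        if len(mention) >= 2 and mention.startswith('"') and mention.endswith('"'):
            mention = mention[1:-1]  # strip the (assumed) surrounding quotes
        index[mention].add(entity)
    return dict(index)


def read_means_file(path: str) -> Dict[str, Set[str]]:
    # bz2.open in text mode decompresses transparently and yields lines.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return load_means(f)


def candidates_for(mentions: Iterable[str], index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    # The input mentions are the same as in gold-linked-mentions; only the
    # candidate set is restricted to what the means index contains.
    return {m: index.get(m, set()) for m in mentions}
```

For example, `candidates_for(["Germany", "Bonn"], load_means(['"Germany"\tGermany\n']))` gives `{"Germany": {"Germany"}, "Bonn": set()}`: a mention absent from the means index simply gets no candidates, which is where the two task settings can diverge.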
On 23 June 2014 21:27, Ben Hachey notifications@github.com wrote:
Also:
- references/gold-linked-aidacandidates: Same as references/gold-linked-mentions, uses YAGO means/label relationships for candidate generation. I.e., the precise Hoffart et al. (2011) task setting.
— Reply to this email directly or view it on GitHub https://github.com/wikilinks/conll03_nel_eval/issues/53#issuecomment-46922515 .
I agree with the first structure points.
I think we should keep the means dataset, as the goal is to demystify the evaluation (and its knobs and levers).
There is also the question of whether the directory structure should similarly be utilised to label (a) the corpus being evaluated (e.g. CoNLL vs ?IITB; testa vs testb), and (b) the ID mapping.
I favour putting in a conll directory (or similar), but am not sure about ID mappings. They're nice regression-test fodder, but we shouldn't really need them, as a user can run the appropriate commands to generate them.
@jnothman - The difference is in the candidates (not the mentions).
On Tue, Jun 24, 2014 at 2:34 PM, jnothman notifications@github.com wrote:
I still don't see the difference between that and the setting where a system's input is those mentions in the gold that are linked... assuming this version of the gold, which for now is all we have.
I propose that under references/ we divide the system outputs into directories representing the different task settings, i.e. that we split references into:
- references/gold-mentions: the system attempted to link all gold mentions, including NILs (?schwa-linkable)
- references/gold-linked-mentions: the system attempted to link only the gold mentions that are linked (aida, houlsby)
There's still the potential for the entries within a directory not to be altogether comparable with one another. For example, we could subdivide system-mentions into those that generate NEs only (schwa) and those that include other wikilinks (tagme); we could subdivide gold-mentions according to whether the system had access to CoNLL 2003 type annotations (although this may be harder to infer).
There is also the question of whether the directory structure should similarly be utilised to label (a) the corpus being evaluated (e.g. CoNLL vs ?IITB; testa vs testb), and (b) the ID mapping.
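A rough sketch of how tooling might consume the proposed layout: group the system output files under references/ by their task-setting directory, so that only outputs in the same setting are compared directly. The setting names come from this thread; treating every file in a setting directory as one system output, and the helper names `outputs_by_setting` and `unknown_settings`, are hypothetical assumptions, not part of the repository.

```python
from pathlib import Path
from typing import Dict, List

# Task-setting directory names mentioned in this thread (assumed, not final).
KNOWN_SETTINGS = (
    "system-mentions",
    "gold-mentions",
    "gold-linked-mentions",
    "gold-linked-aidacandidates",
)


def outputs_by_setting(references: Path) -> Dict[str, List[str]]:
    """Map each task-setting directory under references/ to its output files.

    Assumes (hypothetically) one file per system output inside each setting
    directory; nested subdivisions are not handled in this sketch.
    """
    grouped: Dict[str, List[str]] = {}
    for setting_dir in sorted(p for p in references.iterdir() if p.is_dir()):
        grouped[setting_dir.name] = sorted(
            f.name for f in setting_dir.iterdir() if f.is_file()
        )
    return grouped


def unknown_settings(grouped: Dict[str, List[str]]) -> List[str]:
    # Flag directories that do not correspond to an agreed task setting.
    return [name for name in grouped if name not in KNOWN_SETTINGS]
```

Keeping the task setting in the path, rather than in file metadata, means a plain directory listing already tells a reader which outputs are comparable; the same idea would extend to a corpus level (e.g. a conll directory) if that is adopted.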