wikilinks / conll03_nel_eval

Python evaluation scripts for AIDA-formatted CoNLL data
Apache License 2.0

Integrate TAC NIL clustering measures #39

Open benhachey opened 10 years ago

benhachey commented 10 years ago

First cut at 3497f5e1fc58f6ea76368de12b2f0dbaca00725a

I will circulate after tidying and adding tests.

jnothman commented 10 years ago

Reviewing your changes to the output of the coref metrics: you assume that all metrics can be calculated as a function of tp, fp and fn. I don't think this is true of the Cai and Strube variants, which account for differences in extracted mentions (i.e. "twinless mentions"); I'm not sure whether it holds for entity CEAF (I haven't thought about it enough); and it's certainly untrue of BLANC and its family, since BLANC takes the macro average of the coreferent and non-coreferent precisions, recalls and F1s. We could treat BLANC, at least, as a special case, but for the others it might be better to allow a separate numerator for each of precision and recall.
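To illustrate why a single (tp, fp, fn) triple is not enough for BLANC: it scores coreference links and non-coreference links separately and then macro-averages the two sets of scores. The sketch below is not from this repository; the function name `blanc` and the six link-count parameters are hypothetical, and it assumes the standard BLANC definition over gold and system link sets.

```python
def blanc(tp_c, fp_c, fn_c, tp_n, fp_n, fn_n):
    """Hypothetical BLANC sketch: macro-average of coreferent and
    non-coreferent precision, recall and F1, each computed from its
    own link counts (c = coreference links, n = non-coreference links)."""
    def prf(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    p_c, r_c, f_c = prf(tp_c, fp_c, fn_c)  # coreferent scores
    p_n, r_n, f_n = prf(tp_n, fp_n, fn_n)  # non-coreferent scores
    # BLANC macro-averages the two classes, so no single tp/fp/fn
    # triple (or p_num/p_den/r_num/r_den quadruple) can reproduce it.
    return (p_c + p_n) / 2, (r_c + r_n) / 2, (f_c + f_n) / 2
```

Because the final F1 is an average of two per-class F1s rather than the harmonic mean of an overall precision and recall, BLANC cannot be recovered from any single numerator/denominator pair per measure.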

benhachey commented 10 years ago

I have updated the matrix class in evaluate as discussed. Coreference metrics now return p_num, p_den, r_num, r_den. Call _prf on this output to get precision, recall and f-score.

jnothman commented 10 years ago

Won't work for BLANC, but is a reasonable generalisation of the others.
