Should be able to evaluate (the micro-average over documents of) within-document coreference resolution performance. With the current implementation the following approaches exist:
- append document ID to entity ID manually (or using `prepare-conll-coref`)
- score each document individually by splitting the input, then aggregate
Note that the former approach breaks for the `pairwise_negative` aggregate, as true negatives from across the corpus will be counted.
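A rough sketch of the first approach, assuming mentions are held as (doc ID, span, entity ID) tuples; the tuple layout and function name are just illustrative assumptions, not neleval's actual annotation format:

```python
# Minimal sketch of the doc-ID-prefixing workaround. The mention tuples
# and field names here are assumptions for illustration, not neleval's
# actual input format.

def prefix_entity_ids(mentions):
    """Rewrite entity IDs so they are unique across documents.

    `mentions` is an iterable of (doc_id, mention_span, entity_id) tuples.
    """
    for doc_id, span, entity_id in mentions:
        # e.g. entity "E7" in document "doc3" becomes "doc3/E7"
        yield doc_id, span, "{}/{}".format(doc_id, entity_id)

# Caveat from above: even after prefixing, pair-based measures such as
# pairwise_negative still see every cross-document mention pair as a
# true negative once the documents are pooled, inflating that count.
```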
My currently preferred solution is to add an option to `evaluate` specifying which fields to break the calculation down by: ordinarily 'doc', though 'type' might also be of interest. `evaluate` would then calculate all measures over each group, and add micro-averaged and macro-averaged results. This would also mean we could rename the `sets-micro` aggregate to `sets`.
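A minimal sketch of what the group-by evaluation could look like, assuming per-group true-positive/false-positive/false-negative counts are already computed; the function names and count structure are assumptions for illustration, not the existing `evaluate` implementation:

```python
# Sketch of grouping results by a field (e.g. 'doc') and reporting both
# micro- and macro-averages. The count inputs are assumed; this is not
# the existing `evaluate` code.
from collections import namedtuple

Counts = namedtuple("Counts", "tp fp fn")

def prf(c):
    """Precision, recall and F1 from raw counts."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate_by_group(counts_by_group):
    """counts_by_group maps a group key (e.g. doc ID) to Counts."""
    per_group = {key: prf(c) for key, c in counts_by_group.items()}
    # Micro-average: pool the raw counts across groups, then score once.
    pooled = Counts(*(sum(getattr(c, f) for c in counts_by_group.values())
                      for f in Counts._fields))
    micro = prf(pooled)
    # Macro-average: score each group separately, then average the scores.
    n = len(per_group)
    macro = tuple(sum(s[i] for s in per_group.values()) / n
                  for i in range(3))
    return per_group, micro, macro

# Example: two documents with different counts
scores, micro, macro = evaluate_by_group({
    "doc1": Counts(tp=8, fp=2, fn=1),
    "doc2": Counts(tp=3, fp=3, fn=4),
})
```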
Thanks for expressing the need for this, @shyamupa