hipe-eval / HIPE-scorer

A Python module for evaluating NERC and NEL system performance as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).
https://hipe-eval.github.io
MIT License

handling of invalid tags vs out-of-GT tags #9

Closed by mromanello 4 years ago

mromanello commented 4 years ago

Proposed change to how predicted tags are handled by the scorer.

Current behaviour:

Given a certain TSV column (e.g. NE-COARSE-LIT), a predicted tag is ignored by the scorer (i.e. considered as if it were an O tag) if it's not in the set of tags contained in the ground-truth (GT) for that specific column.

For example, if a system returns the tag B-PERS for the column NE-FINE-COMP and the ground truth does not contain any B-PERS tag for that column, the prediction is currently treated as an O tag, so it is not counted as a false positive error.
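To make the current behaviour concrete, here is a minimal sketch in Python. It is not the scorer's actual code: the function name `mask_out_of_gt_tags` and the sample tag sequences are hypothetical, and a column is simplified to a flat list of tags.

```python
def mask_out_of_gt_tags(gold_tags, pred_tags):
    """Replace every predicted tag that never occurs in the gold column with "O".

    Hypothetical helper illustrating the current behaviour described above.
    """
    seen_in_gold = set(gold_tags)
    return [tag if tag in seen_in_gold else "O" for tag in pred_tags]


# Made-up tags for a single column; B-PERS never occurs in the gold column.
gold = ["O", "B-LOC", "I-LOC", "O"]
pred = ["B-PERS", "B-LOC", "I-LOC", "O"]

print(mask_out_of_gt_tags(gold, pred))  # ['O', 'B-LOC', 'I-LOC', 'O']
```

Note that the set of accepted tags depends on the particular GT file being scored, not on the annotation schema.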

New behaviour:

For each column, the scorer will accept as valid any tag present in a predefined tagset known to the scorer. The default tagsets will correspond to the tags that occur in the HIPE train/dev/test corpora, and can be overridden.

For example, if a system returns the tag PERS for the column NE-COARSE-METO, it will be considered a valid tag, since PERS is present in the tagset (all tags defined in the annotation schema).
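The proposed behaviour could look roughly like the sketch below. Everything in it is an assumption made for illustration: the names `DEFAULT_TAGSETS`, `entity_type` and `mask_invalid_tags`, the `tagsets` override parameter, and the tagset contents, which are placeholders rather than the real HIPE tag lists.

```python
# Placeholder tagsets for illustration only; the real defaults would be
# derived from the HIPE train/dev/test corpora.
DEFAULT_TAGSETS = {
    "NE-COARSE-LIT": {"PERS", "LOC", "ORG"},
    "NE-COARSE-METO": {"PERS", "LOC", "ORG"},
}


def entity_type(tag):
    """Strip an IOB prefix ("B-PERS" -> "PERS"); other tags are returned unchanged."""
    if tag[:2] in ("B-", "I-"):
        return tag[2:]
    return tag


def mask_invalid_tags(column, pred_tags, tagsets=None):
    """Keep predicted tags found in the column's tagset; map all others to "O"."""
    tagset = (DEFAULT_TAGSETS if tagsets is None else tagsets)[column]
    return [
        tag if tag == "O" or entity_type(tag) in tagset else "O"
        for tag in pred_tags
    ]


pred = ["B-PERS", "O", "B-FOO"]

# Default tagset: B-PERS is valid regardless of what the GT column contains.
print(mask_invalid_tags("NE-COARSE-METO", pred))  # ['B-PERS', 'O', 'O']

# User-supplied tagset overriding the default.
custom = {"NE-COARSE-METO": {"LOC"}}
print(mask_invalid_tags("NE-COARSE-METO", pred, tagsets=custom))  # ['O', 'O', 'O']
```

Only tags outside the tagset (here the made-up `B-FOO`) are still treated as O.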

NB: this change is likely to have some impact on the evaluation of systems (i.e. slightly worse precision scores), since predicted tags that were previously ignored will now be counted as predictions.
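The effect on precision can be illustrated with a toy calculation. This is a simplified token-level sketch with made-up data, not the scorer's entity-level metrics; the function `token_precision` and its `valid_tags` parameter are hypothetical.

```python
def token_precision(gold_tags, pred_tags, valid_tags):
    """Token-level precision after mapping predictions outside valid_tags to "O"."""
    kept = [p if p in valid_tags else "O" for p in pred_tags]
    predicted = [(g, p) for g, p in zip(gold_tags, kept) if p != "O"]
    if not predicted:
        return 0.0
    correct = sum(1 for g, p in predicted if g == p)
    return correct / len(predicted)


gold = ["B-LOC", "O", "O", "B-LOC"]
pred = ["B-LOC", "B-PERS", "O", "B-LOC"]

# Current behaviour: valid tags are those seen in the GT column, so B-PERS is dropped.
old = token_precision(gold, pred, valid_tags=set(gold))

# New behaviour: B-PERS belongs to the predefined tagset, so it counts as a prediction.
new = token_precision(gold, pred, valid_tags={"B-LOC", "B-PERS"})

print(old)            # 1.0
print(round(new, 3))  # 0.667
```

In this example the single out-of-GT prediction lowers precision from 2/2 to 2/3 once it is no longer ignored.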