wikilinks / neleval

Entity disambiguation evaluation and error analysis tool
Apache License 2.0

Add data validation #6

Closed by benhachey 9 years ago

benhachey commented 10 years ago

The main goal is to prevent input that will break the evaluation measure implementations here. We could also provide warnings as a convenience to help users ensure annotation/output meets their expectations.

Duplicate mentions

Duplicate mentions cause problems. The tool should print an error and exit if two or more mention/query IDs in the same document have the same span.
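A minimal sketch of what this check might look like. It assumes each mention is a hypothetical `(mention_id, doc_id, start, end)` tuple with an exclusive `end` offset; this is not the tool's actual data model, and the function names are made up for illustration.

```python
import sys
from collections import defaultdict


def find_duplicate_spans(mentions):
    """Return {(doc_id, start, end): [mention_id, ...]} for spans used by more than one ID.

    `mentions` is an iterable of (mention_id, doc_id, start, end) tuples
    (an assumed layout, not the tool's real one).
    """
    by_span = defaultdict(list)
    for mention_id, doc_id, start, end in mentions:
        by_span[(doc_id, start, end)].append(mention_id)
    return {span: ids for span, ids in by_span.items() if len(ids) > 1}


def validate_no_duplicates(mentions):
    """Print one error per duplicated span to stderr and exit with status 1."""
    duplicates = find_duplicate_spans(mentions)
    if duplicates:
        for (doc_id, start, end), ids in sorted(duplicates.items()):
            print(
                f"ERROR: span {doc_id}[{start}:{end}] is shared by: {', '.join(ids)}",
                file=sys.stderr,
            )
        sys.exit(1)
```

Reporting every duplicated span before exiting, rather than stopping at the first, lets the user fix the input in one pass.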

Nested mentions

Whether nested mentions are desirable depends on the task definition. For instance, TAC14 allows them but CoNLL/AIDA does not. The tool could print a warning as a convenience. Optionally, we could provide a flag to tidy nesting, e.g., by removing inner mentions.
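One way the warning and the optional tidy step could be sketched, again assuming hypothetical `(mention_id, doc_id, start, end)` tuples with an exclusive `end`. Here "nested" is taken to mean that one span lies within another, different span in the same document; identical spans are treated as duplicates rather than nesting.

```python
import sys


def find_nested(mentions):
    """Return (inner, outer) pairs where inner lies within outer in the same document."""
    pairs = []
    for inner in mentions:
        for outer in mentions:
            _, i_doc, i_start, i_end = inner
            _, o_doc, o_start, o_end = outer
            # Identical spans are duplicates, not nesting; this also skips self-pairs.
            same_span = (i_start, i_end) == (o_start, o_end)
            contained = o_start <= i_start and i_end <= o_end
            if i_doc == o_doc and contained and not same_span:
                pairs.append((inner, outer))
    return pairs


def warn_nested(mentions):
    """Print one warning per nested pair to stderr; the input is left unchanged."""
    for inner, outer in find_nested(mentions):
        print(f"WARNING: {inner[0]} is nested inside {outer[0]}", file=sys.stderr)


def remove_inner_mentions(mentions):
    """Tidy nesting by dropping every mention that lies inside another one."""
    inner_ids = {inner[0] for inner, _ in find_nested(mentions)}
    return [m for m in mentions if m[0] not in inner_ids]
```

The pairwise loop is quadratic in the number of mentions, which is likely acceptable per document; a sort-based sweep would be the obvious alternative for very large inputs.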

Crossing mentions

Crossing mentions are mentions that partially overlap without one containing the other. Again, the tool could print a warning as a convenience.
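A rough sketch of how crossing spans might be detected, under the same assumed `(mention_id, doc_id, start, end)` layout with an exclusive `end`. Two mentions are taken to cross when they overlap but neither contains the other.

```python
import sys
from itertools import combinations


def find_crossing(mentions):
    """Return (first, second) pairs of same-document mentions whose spans cross."""
    pairs = []
    for a, b in combinations(mentions, 2):
        if a[1] != b[1]:
            continue  # different documents never cross
        # Order the pair by (start, end) so that first starts no later than second.
        first, second = sorted((a, b), key=lambda m: (m[2], m[3]))
        # Crossing: second starts strictly inside first and ends strictly after it.
        if first[2] < second[2] < first[3] < second[3]:
            pairs.append((first, second))
    return pairs


def warn_crossing(mentions):
    """Print one warning per crossing pair to stderr."""
    for first, second in find_crossing(mentions):
        print(f"WARNING: {first[0]} crosses {second[0]}", file=sys.stderr)
```

With exclusive end offsets, spans that merely touch (one ends where the next begins) are not reported as crossing.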