wikilinks / neleval

Entity disambiguation evaluation and error analysis tool
Apache License 2.0

Add data validation #6

Closed by benhachey 10 years ago

benhachey commented 10 years ago

The main goal is to prevent input that will break the evaluation measure implementations here. We could also provide warnings as a convenience to help users ensure annotation/output meets their expectations.

Duplicate mentions

Duplicate mentions cause problems. The tool should print an error and exit if two or more mention/query IDs share the same span.
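
A minimal sketch of what such a check might look like. The `Mention` record and both function names are hypothetical and are not part of neleval's existing code; the sketch also assumes a span is identified by document ID plus start and end offsets.

```python
import sys
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    # Hypothetical record for illustration; not neleval's own data model.
    mention_id: str
    doc_id: str
    start: int
    end: int


def find_duplicate_spans(mentions):
    """Map each (doc_id, start, end) used by more than one mention to its IDs."""
    by_span = defaultdict(list)
    for m in mentions:
        by_span[(m.doc_id, m.start, m.end)].append(m.mention_id)
    return {span: ids for span, ids in by_span.items() if len(ids) > 1}


def validate_no_duplicates(mentions):
    """Print one error per duplicated span, then exit with a non-zero status."""
    duplicates = find_duplicate_spans(mentions)
    if duplicates:
        for (doc_id, start, end), ids in sorted(duplicates.items()):
            print(
                f"ERROR: duplicate span {doc_id}:{start}-{end} "
                f"in mentions {', '.join(ids)}",
                file=sys.stderr,
            )
        sys.exit(1)
```

Reporting every duplicated span before exiting lets a user fix all of them in one pass rather than rerunning after each error.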

Nested mentions

Whether nested mentions are desirable depends on the task definition. For instance, they are allowable in TAC14 but not in CoNLL/AIDA. The tool could print a warning as a convenience. Optionally, we could provide a flag to tidy nesting, e.g., by removing inner mentions.
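
As a rough illustration of the warning and the optional tidy step, here is a sketch. The `Mention` record and function names are hypothetical, not neleval's API; it assumes mention IDs are unique and that a mention is nested when its span lies inside another, different span in the same document.

```python
import warnings
from typing import NamedTuple


class Mention(NamedTuple):
    # Hypothetical record for illustration; not neleval's own data model.
    mention_id: str
    doc_id: str
    start: int
    end: int


def is_nested_in(inner, outer):
    """True if inner lies within outer in the same document and the spans differ."""
    return (
        inner.doc_id == outer.doc_id
        and outer.start <= inner.start
        and inner.end <= outer.end
        and (inner.start, inner.end) != (outer.start, outer.end)
    )


def find_nested(mentions):
    """Return (inner, outer) pairs. Quadratic, which is acceptable for a sketch."""
    return [(a, b) for a in mentions for b in mentions if is_nested_in(a, b)]


def tidy_nesting(mentions, warn=True):
    """Drop every mention nested inside another one, keeping the outer mentions."""
    pairs = find_nested(mentions)
    if warn:
        for inner, outer in pairs:
            warnings.warn(f"{inner.mention_id} is nested in {outer.mention_id}")
    inner_ids = {inner.mention_id for inner, _ in pairs}
    return [m for m in mentions if m.mention_id not in inner_ids]
```

Identical spans are deliberately excluded here, since those are duplicates rather than nesting.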

Crossing mentions

Crossing mentions are those whose spans partially overlap without one containing the other. Again, the tool could print a warning as a convenience.
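
A sketch of how crossing spans could be detected. The `Mention` record and function names are hypothetical, not neleval's API, and the comparison assumes half-open `[start, end)` offsets; with inclusive end offsets the comparisons would need adjusting.

```python
import warnings
from itertools import combinations
from typing import NamedTuple


class Mention(NamedTuple):
    # Hypothetical record for illustration; not neleval's own data model.
    mention_id: str
    doc_id: str
    start: int  # assumed inclusive
    end: int    # assumed exclusive


def spans_cross(a, b):
    """True if the spans partially overlap and neither contains the other."""
    if a.doc_id != b.doc_id:
        return False
    first, second = sorted((a, b), key=lambda m: (m.start, m.end))
    return first.start < second.start < first.end < second.end


def warn_on_crossing(mentions):
    """Warn about each crossing pair and return the pairs found."""
    crossing = [(a, b) for a, b in combinations(mentions, 2) if spans_cross(a, b)]
    for a, b in crossing:
        warnings.warn(f"{a.mention_id} crosses {b.mention_id}")
    return crossing
```

Adjacent spans, nested spans and identical spans all fail the strict inequalities, so only true partial overlaps are reported.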