nilsreiter / CorefAnnotator

Annotation tool for coreference
Apache License 2.0

Cross-document annotation #128

Open nilsreiter opened 6 years ago

nilsreiter commented 6 years ago

Supporting cross-document annotation would require bundling documents into a document group and storing entities separately from the documents. The documents could be put as separate views into one JCas, but this would make the file sizes even larger (which may not be a problem, given the compression).
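To make the proposed layout concrete, here is a minimal sketch in plain Python, not the CorefAnnotator or UIMA data model. All class and method names (`DocumentGroup`, `Entity`, `Mention`, `annotate`, `mentions_of`) are hypothetical. The point it illustrates is that entities live on the group, while each document only stores mentions that refer to an entity by id.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    label: str


@dataclass
class Mention:
    begin: int       # character offset, inclusive
    end: int         # character offset, exclusive
    entity_id: str   # reference into the group's entity store


@dataclass
class Document:
    doc_id: str
    text: str
    mentions: list = field(default_factory=list)


@dataclass
class DocumentGroup:
    # Entities are stored once per group, not once per document.
    entities: dict = field(default_factory=dict)
    documents: dict = field(default_factory=dict)

    def add_entity(self, entity_id, label):
        self.entities[entity_id] = Entity(entity_id, label)

    def add_document(self, doc_id, text):
        self.documents[doc_id] = Document(doc_id, text)

    def annotate(self, doc_id, begin, end, entity_id):
        # Refuse mentions that point at an entity the group does not know.
        if entity_id not in self.entities:
            raise KeyError(f"unknown entity: {entity_id}")
        self.documents[doc_id].mentions.append(Mention(begin, end, entity_id))

    def mentions_of(self, entity_id):
        # Collect the surface forms of one entity across all documents.
        return [
            (doc.doc_id, doc.text[m.begin:m.end])
            for doc in self.documents.values()
            for m in doc.mentions
            if m.entity_id == entity_id
        ]


group = DocumentGroup()
group.add_entity("e1", "the chancellor")
group.add_document("session-1", "The chancellor spoke first.")
group.add_document("session-2", "Later she answered questions.")
group.annotate("session-1", 0, 14, "e1")
group.annotate("session-2", 6, 9, "e1")
print(group.mentions_of("e1"))
```

Under this assumption, a coreference chain that spans sessions is just the set of mentions sharing one entity id, regardless of which document holds them.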

melandresen commented 6 years ago

This is becoming relevant for us right now. Our project partners want to annotate "Bundestagsdebatten" (debates of the German Bundestag), and of course many entities are referenced across sessions. I guess this feature will not be available shortly. Do you have any experience with entity management across documents? Our best idea so far is to standardize the names of entities so that a mapping between documents is possible. But as we have many documents and several annotators, I expect this to be rather error-prone.
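The name-standardization idea could look roughly like the following sketch. The normalization rules and the `TITLES` set are assumptions chosen for illustration, not an agreed convention, and the function names are hypothetical. The example also shows why the approach is fragile: a short form such as "Merkel" does not map to the same key as the full name.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of titles to drop before comparing names.
TITLES = {"dr", "prof", "frau", "herr"}


def normalize_name(name):
    """Reduce an entity label to a comparison key."""
    text = unicodedata.normalize("NFKC", name).casefold()
    tokens = re.findall(r"\w+", text)
    return " ".join(t for t in tokens if t not in TITLES)


def map_entities(labels_by_document):
    """Group per-document entity labels by their normalized key."""
    mapping = defaultdict(list)
    for doc_id, labels in labels_by_document.items():
        for label in labels:
            mapping[normalize_name(label)].append((doc_id, label))
    return dict(mapping)


labels = {
    "session-1": ["Dr. Angela Merkel", "Bundestag"],
    "session-2": ["angela  merkel", "Merkel"],
}
mapping = map_entities(labels)
for key, members in mapping.items():
    print(key, members)
```

Here the two full-name variants end up under one key, while "Merkel" stays separate and would need manual review or an additional alias table.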

nilsreiter commented 6 years ago

We haven't really started doing this, so we have no experience yet. For real-world entities, it might make sense to use a Wikipedia URL as the entity name (or at least include it).

melandresen commented 6 years ago

We thought about similar possibilities. We had a talk about Wikidata some weeks ago. It also provides unique identifiers, and not only for real-world entities. I will discuss this with our social scientists. We can consider putting this on the schedule for our workshop in September.
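As a rough sketch of how such identifiers could serve as the cross-document key: if annotators attach a Wikidata item id to an entity where one exists, merging across documents reduces to grouping by that id. The tuple layout and the function names below are hypothetical. The id `Q42` is the Wikidata item for Douglas Adams and is used only as a well-known example.

```python
import re
from collections import defaultdict

# Wikidata item ids have the form "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_url(qid):
    """Return the item page URL for a syntactically valid item id."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


def merge_by_qid(annotations):
    """Group (doc_id, label, qid) triples by qid; keep unlinked ones apart."""
    merged = defaultdict(list)
    unlinked = []
    for doc_id, label, qid in annotations:
        if qid is None:
            unlinked.append((doc_id, label))
        else:
            wikidata_url(qid)  # validates the id, raises on malformed input
            merged[qid].append((doc_id, label))
    return dict(merged), unlinked


annotations = [
    ("session-1", "Douglas Adams", "Q42"),
    ("session-2", "Adams", "Q42"),
    ("session-2", "the speaker", None),
]
merged, unlinked = merge_by_qid(annotations)
print(merged)
print(unlinked)
```

Entities without an identifier (fictional or document-specific ones, for example) stay in the unlinked list and would still need name-based or manual mapping.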