stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
GNU General Public License v3.0

Coreference in CoNLL output #869

Open andreasvc opened 5 years ago

andreasvc commented 5 years ago

I'm trying to run the dcoref system on a plain text file and want to get the output in CoNLL 2012 format.

I've tried several things:

$ ./ -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
    -file /tmp/example.txt \
    -coref.conllOutputPath /tmp/example.conll

However, this option is ignored, and I get XML output.

$ ./ -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
    -file /tmp/example.txt -outputFormat conll \
    -output.columns doctitle,section,idx,word,lemma,pos,ner,headidx,deprel,link

This option is honored, but "link" does not give coreference information, and I don't see what other column I should use.

There are instructions for running and evaluating the system on CoNLL 2011 data, but for this use case I don't have annotated data.
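For context, the coreference column in CoNLL 2012 files marks where each mention starts and ends with bracketed chain ids: `(0` opens a mention of chain 0, `0)` closes it, `(0)` is a single-token mention, `|` joins several markers on one token, and `-` means no mention boundary. The sketch below shows only that encoding step. It is not CoreNLP code; the function name `coref_column` and the `(start, end, chain_id)` input layout are assumptions made for illustration.

```python
from collections import defaultdict


def coref_column(num_tokens, mentions):
    """Build a CoNLL 2012 style coreference column for one sentence.

    `mentions` is a hypothetical input format: an iterable of
    (start, end, chain_id) tuples with 0-based token indices and an
    inclusive end index.
    """
    opens = defaultdict(list)    # markers like "(3" keyed by token index
    singles = defaultdict(list)  # markers like "(3)" for one-token mentions
    closes = defaultdict(list)   # markers like "3)" keyed by token index

    for start, end, chain in mentions:
        if start == end:
            singles[start].append(f"({chain})")
        else:
            opens[start].append(f"({chain}")
            closes[end].append(f"{chain})")

    column = []
    for i in range(num_tokens):
        parts = opens[i] + singles[i] + closes[i]
        # Tokens without any mention boundary get a dash.
        column.append("|".join(parts) if parts else "-")
    return column


tokens = ["The", "old", "man", "saw", "himself", "."]
# "The old man" (tokens 0-2) and "himself" (token 4) share chain 0.
mentions = [(0, 2, 0), (4, 4, 0)]
for token, coref in zip(tokens, coref_column(len(tokens), mentions)):
    print(f"{token}\t{coref}")
```

Tokens strictly inside a multi-token mention get `-`, so "old" carries no marker here. A full converter would still need to read the mention spans from whatever the pipeline emits and write the remaining CoNLL columns.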

andreasvc commented 5 years ago

I wrote a conversion script from XML to CoNLL 2012: