stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Coreference in CoNLL output #869

Open andreasvc opened 5 years ago

andreasvc commented 5 years ago

I'm trying to run the dcoref system on a plain text file and want to get the output in CoNLL 2012 format.

I've tried two approaches:

$ ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
    -file /tmp/example.txt \
    -coref.conllOutputPath /tmp/example.conll

However, the -coref.conllOutputPath option is ignored, and I get XML output.

$ ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
    -file /tmp/example.txt -outputFormat conll \
    -output.columns doctitle,section,idx,word,lemma,pos,ner,headidx,deprel,link

This option is honored, but the "link" column does not contain coreference information, and I don't see which other column I should use.

There are instructions on running the system on CoNLL 2011 data and evaluating on it, but for this use case, I don't have annotated data.

andreasvc commented 5 years ago

I wrote a conversion script from XML to CoNLL 2012: https://gist.github.com/andreasvc/6bf9e10b2e6956ce32fb777e7efe99cb
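
For readers who want the general idea without opening the gist, here is a minimal sketch of such a conversion. It is not the linked script. The function names `mentions_from_xml` and `coref_column` are hypothetical. The XML layout is an assumption: an outer `<coreference>` element containing one inner `<coreference>` per chain, whose `<mention>` children carry a 1-based `<sentence>`, a 1-based `<start>`, and an exclusive `<end>`. The sketch only builds the final coreference column in CoNLL 2012 bracket notation and does not produce the other columns.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def mentions_from_xml(xml_text):
    """Group mentions by sentence number as (chain_id, start, end) tuples.

    ASSUMPTION: chains are inner <coreference> elements nested in an outer
    <coreference>; <sentence> and <start> are 1-based, <end> is exclusive.
    The returned start/end are 0-based and inclusive.
    """
    root = ET.fromstring(xml_text)
    by_sentence = defaultdict(list)
    chains = root.findall(".//coreference/coreference")
    for chain_id, chain in enumerate(chains):
        for mention in chain.findall("mention"):
            sent = int(mention.findtext("sentence"))
            start = int(mention.findtext("start")) - 1  # 0-based, inclusive
            end = int(mention.findtext("end")) - 2      # 0-based, inclusive
            by_sentence[sent].append((chain_id, start, end))
    return by_sentence


def coref_column(n_tokens, mentions):
    """Build the bracketed coreference column for one sentence.

    A multi-token mention opens with "(id" on its first token and closes
    with "id)" on its last; a single-token mention is "(id)". Tokens outside
    any mention get "-". Several entries on one token are joined with "|".
    The order of entries within a cell is not normalized here.
    """
    cells = [[] for _ in range(n_tokens)]
    # Process longer mentions first when several start on the same token.
    for chain_id, start, end in sorted(mentions, key=lambda m: (m[1], -m[2])):
        if start == end:
            cells[start].append(f"({chain_id})")
        else:
            cells[start].append(f"({chain_id}")
            cells[end].append(f"{chain_id})")
    return ["|".join(cell) if cell else "-" for cell in cells]
```

To use it, call `mentions_from_xml` on the XML that CoreNLP writes, then call `coref_column` once per sentence with that sentence's token count, and append the resulting values as the last column of each token row.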