stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

How to run coreference resolution on custom CoNLL data #939

Closed XDcsy closed 5 years ago

XDcsy commented 5 years ago

I have checked the instructions on replicating the CoNLL 2012 results. However, I'm not trying to replicate the results.
Instead, I have some .conll files, and I want to perform coreference resolution on them. Because the POS, NER, Speaker, etc. tags in these files have been manually revised, I would like the system to make use of this information.

I tried this command:
java -Xmx6g -cp stanford-corenlp-3.9.2.jar;stanford-corenlp-3.9.2-models.jar;* edu.stanford.nlp.coref.CorefSystem -props edu/stanford/nlp/coref/properties/neural-english-conll.properties -coref.data ./conll-files -coref.conllOutputPath ./results -coref.scorer ./scorers/scorer.bat

Here, ./conll-files is the directory where the .conll files are stored.

However, I received this output:

[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref model edu/stanford/nlp/models/coref/neural/english-model-conll.ser.gz ... done [0.4 sec].
[main] INFO edu.stanford.nlp.coref.neural.NeuralCorefAlgorithm - Loading coref embeddings edu/stanford/nlp/models/coref/neural/english-embeddings.ser.gz ... done [0.4 sec].
[main] INFO CoreNLP - Identification of Mentions: Recall: (0 / 0) 0%    Precision: (0 / 0) 0%   F1: 0%
Coreference: Recall: (0 / 0) 0% Precision: (0 / 0) 0%   F1: 0%
Coreference: Recall: (0 / 0) 0% Precision: (0 / 0) 0%   F1: 0%
Coreference: Recall: (0 / 0) 0% Precision: (0 / 0) 0%   F1: 0%
Coreference: Recall: (0 / 0) 0% Precision: (0 / 0) 0%   F1: 0%
[main] INFO CoreNLP - Final conll score ((muc+bcub+ceafe)/3) = 0

It seems the system was not able to load the files. Is there a way to make it work?

J38 commented 5 years ago

You need to match the directory structure of the CoNLL 2012 data set.

You can see what that directory structure is like by going here and downloading the official data:

http://conll.cemantix.org/2012/data.html

The v4 and v9 directories need to sit inside the conll-2012 directory (the top-level directory itself can have any name).

For instance: /path/to/conll-2012/v4 and /path/to/conll-2012/v9

You'll have to study the data they provide to see how to convert your data into something similar.
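
As a rough illustration of that conversion step, here is a minimal Python sketch that copies custom .conll files into a CoNLL-2012-style directory tree. Everything specific in it is an assumption rather than something confirmed in this thread: the sub-path under v4, the genre/source/part folder names, and the `.v4_auto_conll` file suffix are modeled on how the official download appears to be laid out, and the function name `mirror_into_conll_layout` is hypothetical. Compare it against the official data before relying on it, and note that it only handles the v4 side.

```python
import shutil
from pathlib import Path

# ASSUMPTION: sub-path and suffix modeled on the official CoNLL-2012 download.
# Check both against the real data; neither is confirmed by this thread.
ASSUMED_SUBDIR = Path("v4/data/development/data/english/annotations")
ASSUMED_SUFFIX = ".v4_auto_conll"


def mirror_into_conll_layout(src_dir, root, genre="xx", source="custom",
                             part="00", subdir=ASSUMED_SUBDIR,
                             suffix=ASSUMED_SUFFIX):
    """Copy every *.conll file in src_dir to root/subdir/genre/source/part/.

    genre, source and part are placeholder folder names (hypothetical).
    Returns the list of destination paths, sorted by source file name.
    """
    target = Path(root) / subdir / genre / source / part
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(Path(src_dir).glob("*.conll")):
        # Keep the base name, swap the extension for the assumed suffix.
        dest = target / (src.stem + suffix)
        shutil.copyfile(src, dest)
        copied.append(dest)
    return copied
```

Calling `mirror_into_conll_layout("./conll-files", "./conll-2012")` would build the tree under ./conll-2012, which could then be passed to -coref.data in place of ./conll-files. Whether the files are actually picked up still depends on the layout and file naming matching what the reader expects, so the defaults above may need adjusting.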