stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

Questions about Chinese Coreference resolution #862

Open Hacken-L opened 5 years ago

Hacken-L commented 5 years ago
  1. In src/edu/stanford/nlp there are two folders, /coref and /dcoref. What is the difference between them?
  2. In src/edu/stanford/nlp/coref there are three subfolders: src/edu/stanford/nlp/coref/hybrid, src/edu/stanford/nlp/coref/statistical and src/edu/stanford/nlp/coref/neural. Do these correspond to the three approaches to coreference resolution described on the Coreference Resolution homepage? And what is the difference between src/edu/stanford/nlp/coref/hybrid and src/edu/stanford/nlp/dcoref?
  3. In src/edu/stanford/nlp/coref/properties I did not find a statistical properties file for Chinese. There are only deterministic and neural properties files for Chinese. Does this mean the statistical algorithm has not been applied to Chinese?
  4. I put my own properties file into src/edu/nlp/pipeline and then rebuilt with ant. When I then run the following Java code:
    String[] args = new String[] {"-props", "myself.properties" };
    Properties props = StringUtils.argsToProperties(args);

I get an error: argsToProperties could not read properties file: myself.properties
But if I change the arguments to
    String[] args = new String[] {"-props", "edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties" };
it works fine. Why does this happen? StanfordCoreNLP-chinese.properties and myself.properties are both properties files in src/edu/nlp/pipeline.

Hacken-L commented 5 years ago

Also, when I use the properties files in src/edu/stanford/nlp/coref/properties, for example String[] args = new String[] {"-props", "edu/stanford/nlp/coref/properties/deterministic-chinese.properties" }; or neural-chinese.properties, no coreference chains seem to be produced. Only StanfordCoreNLP-chinese.properties in edu/stanford/nlp/pipeline works well. To find the reason, I printed the tokenizer output, and it looks like the text is already tokenized incorrectly at that first step. Why does this happen?

J38 commented 5 years ago

There is a general overview of coreference here: https://stanfordnlp.github.io/CoreNLP/coref.html

  1. The coref folder contains code for the more recent coref algorithms. The dcoref folder contains code for the old deterministic coref system.

  2. hybrid is deprecated for English. The hybrid code for Chinese is actually the deterministic (rule-based) system.

  3. There is no statistical system for Chinese, just neural and deterministic (it would probably make sense to change the name at some point).

  4. If you want to use your own custom properties file, I would just provide an absolute path to the file. If you put it in src and it's not working, I think that means it's not being copied anywhere on the CLASSPATH after you build. If you put custom properties files anywhere you like and then provide the absolute path, it will work.
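
To illustrate point 4, here is a minimal standalone sketch of the lookup idea: try the name as a file system path first, then as a CLASSPATH resource. It deliberately uses only java.util.Properties and is not CoreNLP's own loader; the class name PropsLookup and the method load are hypothetical names made up for this example, and the temporary file stands in for a custom file such as myself.properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of CoreNLP.
public class PropsLookup {

    /** Loads a properties file from the file system if it exists there, otherwise from the CLASSPATH. */
    public static Properties load(String name) throws IOException {
        Properties props = new Properties();

        // 1. An absolute (or working-directory-relative) path on the file system.
        Path path = Paths.get(name);
        if (Files.isRegularFile(path)) {
            try (InputStream in = Files.newInputStream(path)) {
                props.load(in);
            }
            return props;
        }

        // 2. A resource name such as "edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties".
        //    This only succeeds if the build actually copied the file onto the CLASSPATH.
        try (InputStream in = PropsLookup.class.getClassLoader().getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("Not found on the file system or the CLASSPATH: " + name);
            }
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a custom properties file kept outside the source tree.
        Path custom = Files.createTempFile("myself", ".properties");
        Files.write(custom, "annotators = tokenize, ssplit\n".getBytes(StandardCharsets.UTF_8));

        // Passing the absolute path avoids any dependence on what the build put on the CLASSPATH.
        Properties props = load(custom.toAbsolutePath().toString());
        System.out.println(props.getProperty("annotators")); // prints "tokenize, ssplit"
    }
}
```

With this kind of lookup, a bare name like myself.properties is only found if it sits in the working directory or at the root of the CLASSPATH, while an absolute path is found regardless of how the project was built.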