GeorgeS2019 closed this issue 6 months ago.
https://github.com/stanfordnlp/CoreNLP/issues/1227
I know this is out of scope, but it would be great if you could get German working, e.g. with the following example.
import java.util.Properties;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;

public class TestSatzErkennung
{
    // one English sentence and one German sentence
    public static String text = "Marie was born in Paris. Marie wurde in Paris geboren.";

    public static void main(String[] args)
    {
        // set up pipeline properties
        Properties props = new Properties();
        // earlier attempt, kept for reference:
        // props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");//"tokenize,ssplit,pos,lemma");
        // props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/german-ud.tagger");
        // props.setProperty("tokenize.language", "German");
        // props.setProperty("ner.model", "edu/stanford/nlp/models/ner/german.distsim.crf.ser.gz");

        // set the list of annotators to run and the German models they use
        props.setProperty("annotators", "tokenize,ssplit,mwt,pos,ner,depparse");
        props.setProperty("tokenize.language", "de");
        props.setProperty("tokenize.postProcessor", "edu.stanford.nlp.international.german.process.GermanTokenizerPostProcessor");
        props.setProperty("mwt.mappingFile", "edu/stanford/nlp/models/mwt/german/german-mwt.tsv");
        props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/german-ud.tagger");
        props.setProperty("ner.model", "edu/stanford/nlp/models/ner/german.distsim.crf.ser.gz");
        props.setProperty("ner.applyNumericClassifiers", "false");
        props.setProperty("ner.applyFineGrained", "false");
        props.setProperty("ner.useSUTime", "false");
        // parse.model only takes effect if "parse" is added to the annotator list
        props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/germanSR.beam.ser.gz");
        props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/UD_German.gz");

        // build pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // create a document object and annotate the text
        CoreDocument document = pipeline.processToCoreDocument(text);
        for (CoreSentence sentence : document.sentences())
        {
            System.out.println(sentence);
            // display tokens: word, lemma, POS tag, NER label, and whether the
            // token was produced by splitting a multi-word token
            for (CoreLabel tok : sentence.tokens())
            {
                System.out.println(String.format("%s\t%s\t%s\t%s\t%b",
                        tok.word(), tok.lemma(), tok.tag(), tok.ner(), tok.isMWT()));
            }
            // display the dependency edges of the sentence
            for (SemanticGraphEdge edge : sentence.dependencyParse().edgeIterable())
            {
                System.out.println(edge);
            }
        }
    }
}
I am happy to merge tests that check that German works as expected, especially if you have a working sample.
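For such tests, one option is to keep the German configuration from the example in a single place so that the sample program and a regression test build the same pipeline. The sketch below assumes that approach; `GermanPipelineConfig`, `germanProperties` and `annotators` are hypothetical names, not part of CoreNLP, and the class uses only `java.util`, so it runs without the models on the classpath. The property keys and model paths are copied from the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper (not part of CoreNLP): holds the German pipeline
// settings from the example so an example program and a test can share them.
public final class GermanPipelineConfig {

    private GermanPipelineConfig() {}

    // Returns the same properties the example sets before building the pipeline.
    public static Properties germanProperties() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,mwt,pos,ner,depparse");
        props.setProperty("tokenize.language", "de");
        props.setProperty("tokenize.postProcessor",
                "edu.stanford.nlp.international.german.process.GermanTokenizerPostProcessor");
        props.setProperty("mwt.mappingFile", "edu/stanford/nlp/models/mwt/german/german-mwt.tsv");
        props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/german-ud.tagger");
        props.setProperty("ner.model", "edu/stanford/nlp/models/ner/german.distsim.crf.ser.gz");
        props.setProperty("ner.applyNumericClassifiers", "false");
        props.setProperty("ner.applyFineGrained", "false");
        props.setProperty("ner.useSUTime", "false");
        props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/germanSR.beam.ser.gz");
        props.setProperty("depparse.model", "edu/stanford/nlp/models/parser/nndep/UD_German.gz");
        return props;
    }

    // Splits the comma-separated "annotators" value and trims whitespace,
    // so " tokenize, ssplit" and "tokenize,ssplit" yield the same list.
    public static List<String> annotators(Properties props) {
        return Arrays.stream(props.getProperty("annotators", "").split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [tokenize, ssplit, mwt, pos, ner, depparse]
        System.out.println(annotators(germanProperties()));
    }
}
```

With the CoreNLP jars and German models available, `new StanfordCoreNLP(GermanPipelineConfig.germanProperties())` would then build the same pipeline as the example, and a test could assert on the tokens and dependency edges it produces.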
@sergey-tihon Good to hear that. From searching online, many users have reported problems with German, especially with dependency parsing, which matters most because OpenNLP has no such feature. German is probably the least-tested language, so it is good to have you take a second look :-)
This is solved now.
Currently working through the parser using the German models provided with version 4.5.1.
Potentially relevant issue: "No head rule defined for IP using class edu.stanford.nlp.trees.SemanticHeadFinder"