Closed by david-ratcliffe 7 years ago
Can you post the time break down for each annotator? Is most of the time being spent on parsing? You might try using a faster parser:
props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
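If a profiler is not at hand, a coarse time breakdown can be collected by wrapping each call you care about in a small timer. The sketch below is only an illustration: `StageTimer`, `time` and `millisByStage` are hypothetical names, not part of CoreNLP, and the workload in `main` is a stand-in for a real call such as `pipeline.process(text)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper (not a CoreNLP class): accumulates wall-clock time per labelled stage.
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the stage's total, and returns the result.
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Totals in whole milliseconds, in the order the stages were first seen.
    Map<String, Long> millisByStage() {
        Map<String, Long> out = new LinkedHashMap<>();
        nanosByStage.forEach((stage, nanos) -> out.put(stage, nanos / 1_000_000));
        return out;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        // Stand-in workload; with CoreNLP on the classpath this could instead be
        // timer.time("process", () -> pipeline.process(text)).
        int length = timer.time("stand-in", () -> "some text".length());
        System.out.println("result=" + length + " timings(ms)=" + timer.millisByStage());
    }
}
```

Timing the same text once with annotators up to `ssplit` and once with `parse` included would be one rough way to see how much of the total the parser accounts for.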
Thank you J38, I can confirm that using this parser significantly reduces processing time, to about 250 ms from 3-10 seconds. The line you've provided only worked at run-time after I included the following in my POM file:
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-corenlp</artifactId>
<version>3.8.0</version>
<classifier>models-english</classifier>
</dependency>
Also, I needed to modify the annotators property, as follows:
props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
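Putting the two changes from this thread together, the configuration might look like the sketch below. The property keys and values are the ones quoted above; the class name `SentimentPipelineConfig` and the `build` method are hypothetical, and the block uses only `java.util.Properties` so it runs without CoreNLP on the classpath. The `pos` annotator is added here on the assumption that the shift-reduce parser needs part-of-speech tags supplied to it, which matches the fix reported above.

```java
import java.util.Properties;

// Hypothetical holder for the settings discussed in this thread.
final class SentimentPipelineConfig {
    // Shift-reduce parser model path, as given earlier in the thread.
    static final String SR_PARSER_MODEL = "edu/stanford/nlp/models/srparser/englishSR.ser.gz";

    static Properties build() {
        Properties props = new Properties();
        // "pos" is listed before "parse", as in the modified annotators line above.
        props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
        props.setProperty("parse.model", SR_PARSER_MODEL);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With stanford-corenlp and its models-english classifier on the classpath,
        // these properties would be passed to: new StanfordCoreNLP(props)
        props.stringPropertyNames().stream()
             .sorted()
             .forEach(key -> System.out.println(key + " = " + props.getProperty(key)));
    }
}
```

Building the pipeline once from these properties and reusing it for every string avoids reloading the models on each call.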
Thank you so much J38, this worked for me as well. It has considerably reduced processing time.
I am seeing very slow processing times (~3 to 10 seconds or more per string) when using StanfordCoreNLP for sentiment annotation, as was discussed here in 2013.
I am using Maven to retrieve stanford-corenlp version 3.8.0 (with models), and the code I'm using is:
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse,sentiment");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "... very long UTF-8 encoded string containing 1000-2000 chars.....";
Annotation annotation = pipeline.process(text);
List<String> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class).stream()
    .map(sentence -> sentence.toString() + " ["
        + Sentiment.parse(sentence.get(SentimentCoreAnnotations.SentimentClass.class)).getRepresentation()
        + "]")
    .collect(Collectors.toList());
Re-using pipeline.process(text) to process similarly long strings produces similar results. JVisualVM reports that most of the time is spent inside these methods:
edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.doInsideScores()
edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.doInsideChartCell()
Am I using the code properly to perform sentiment annotation?