stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.67k stars 2.7k forks

Very slow processing time for sentiment annotation #497

Closed: david-ratcliffe closed this issue 7 years ago

david-ratcliffe commented 7 years ago

I am seeing very slow processing times (~3 to 10 seconds or more per string) when using StanfordCoreNLP for sentiment annotation, as was discussed here in 2013.

I am using Maven to retrieve stanford-corenlp version 3.8.0 (with models), and the code I'm using is:

Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,parse,sentiment");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "... very long UTF-8 encoded string containing 1000-2000 chars.....";
Annotation annotation = pipeline.process(text);
List<String> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class).stream()
    .map(sentence -> sentence.toString() + " ["
        + Sentiment.parse(sentence.get(SentimentCoreAnnotations.SentimentClass.class)).getRepresentation()
        + "]")
    .collect(Collectors.toList());

Re-using the same pipeline and calling pipeline.process(text) on similarly long strings produces similar timings. JVisualVM reports that most of the time is spent inside these methods:

edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.doInsideScores()
edu.stanford.nlp.parser.lexparser.ExhaustivePCFGParser.doInsideChartCell()

Am I using the code properly to perform sentiment annotation?

J38 commented 7 years ago

Can you post the time breakdown for each annotator? Is most of the time being spent on parsing? You might try using a faster parser:

props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
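
One way to get the per-annotator breakdown asked for above is to time each stage separately. The following is a minimal sketch using only the JDK. StageTimer and its methods are hypothetical names, not part of CoreNLP, and the lambdas are stand-ins for the real annotator calls. If I recall correctly, the pipeline object also exposes timingInformation(), which reports accumulated time per annotator without any extra code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Accumulates wall-clock time per named stage (hypothetical helper, not part of CoreNLP). */
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs the stage, adds its elapsed time to the total for that name, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Elapsed nanoseconds per stage, in the order the stages were first timed. */
    Map<String, Long> totals() {
        return nanosByStage;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        // Stand-ins for annotator calls; in a real run each lambda would invoke one annotator.
        String[] tokens = timer.time("tokenize", () -> "a short example sentence".split(" "));
        int count = timer.time("count", () -> tokens.length);
        System.out.println("tokens: " + count);
        timer.totals().forEach((stage, nanos) ->
            System.out.printf("%s: %.3f ms%n", stage, nanos / 1e6));
    }
}
```

If the parse stage dominates the totals, as the profiler output earlier in the thread suggests, switching the parser model is the change most likely to help.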

david-ratcliffe commented 7 years ago

Thank you J38, I can confirm that using this parser significantly reduces processing time, from 3-10 seconds down to 250ms. The line you've provided only worked at run-time after I included the following in my POM file:

        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-corenlp</artifactId>
            <version>3.8.0</version>
            <classifier>models-english</classifier>
        </dependency>

Also, I needed to modify the annotators property, as follows:

props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
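
Putting the two changes from this thread together, the configuration might look like the sketch below. It uses only java.util.Properties so that it runs without CoreNLP on the classpath. SentimentPipelineConfig is a hypothetical class name, and the pipeline construction is shown only as a comment. As far as I know, the pos annotator is required because the shift-reduce parser expects POS tags as input, whereas the default PCFG parser tags words itself.

```java
import java.util.Properties;

/** Collects the settings discussed in this thread (hypothetical helper class). */
final class SentimentPipelineConfig {

    /** Annotator list with POS tagging, plus the shift-reduce parser model. */
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
        props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With CoreNLP and the models-english jar on the classpath, these would be passed to:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + " = " + props.getProperty(k)));
    }
}
```

Building the pipeline once and re-using it for every string avoids reloading the models on each call.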

snehabagal commented 6 years ago

Thank you so much J38, this worked for me as well. It has considerably reduced processing time.