Closed denzilc closed 7 years ago
@denzilc I have asked the same question. It seems that the default sentiment annotator does not return this kind of info, only the sentiment and sentimentValue keys for each sentence, like here:
{
sentimentValue: "2",
sentiment: "Neutral"
}
whereas you want the score distribution values, as asked here, which should be available via edu.stanford.nlp.pipeline.JSONOutputter and edu.stanford.nlp.neural.rnn.RNNCoreAnnotations, but I haven't tried that yet.
Are there any plans to include this information in the JSON output of the CoreNLP server at some point?
I'd love to have the PROBABILITIES tree from SentimentPipeline encoded as JSON and returned by the server for the sentiment annotator.
So currently you can get the distribution of label scores for the whole sentence. I am going to add the sentiment tree to the json output (it is available in the "text" output already), and I'll add the probability of the prediction at each node. The tree is just going to be a string representation though.
String representation is fine for now, thank you. Are you aware of any parsers for these sentiment tree strings?
This is an example of the output (now available with current GitHub code):
(ROOT|sentiment=1|prob=0.715 (NP|sentiment=2|prob=0.988 (DT|sentiment=2|prob=0.998 This) (NN|sentiment=2|prob=0.998 movie)) (@S|sentiment=1|prob=0.797 (VP|sentiment=1|prob=0.730 (@VP|sentiment=1|prob=0.932 (VBZ|sentiment=2|prob=0.997 does) (RB|sentiment=2|prob=0.994 n't)) (VP|sentiment=3|prob=0.504 (VB|sentiment=3|prob=0.962 care) (PP|sentiment=3|prob=0.727 (IN|sentiment=2|prob=0.991 about) (NP|sentiment=3|prob=0.750 (@NP|sentiment=3|prob=0.700 (@NP|sentiment=3|prob=0.798 (@NP|sentiment=3|prob=0.602 (NP|sentiment=3|prob=0.805 cleverness) (,|sentiment=2|prob=0.997 ,)) (NP|sentiment=2|prob=0.986 wit)) (CC|sentiment=2|prob=0.991 or)) (NP|sentiment=3|prob=0.616 (NP|sentiment=2|prob=0.963 (DT|sentiment=2|prob=0.995 any) (@NP|sentiment=2|prob=0.980 (JJ|sentiment=2|prob=0.998 other) (NN|sentiment=3|prob=0.983 kind))) (PP|sentiment=3|prob=0.541 (IN|sentiment=2|prob=0.993 of) (NP|sentiment=3|prob=0.744 (JJ|sentiment=3|prob=0.943 intelligent) (NN|sentiment=4|prob=0.845 humor)))))))) (.|sentiment=2|prob=0.997 .)))
That should be available in the json output under the sentimentTree key.
You can build Stanford CoreNLP Tree objects from string representations. If you take that string and make it into a Tree, I believe the labels of the nodes will be of the form "label|sentiment=Sentiment|prob=Sentiment Probability", except for the leaves, which will just have the word value as the label.
Consider this code example (based on main() in https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/trees/PennTreeReader.java):
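import java.io.Reader;
import java.io.StringReader;
import edu.stanford.nlp.trees.*;

// Build Tree objects from a bracketed string (e.g. the sentimentTree value).
TreeFactory tf = new LabeledScoredTreeFactory();
Reader r = new StringReader("string representation of tree");
TreeReader tr = new PennTreeReader(r, tf);
Tree t = tr.readTree();
while (t != null) {
    System.out.println(t);
    System.out.println();
    t = tr.readTree();  // null once no more trees are left in the input
}
r.close();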
Here is a snippet going through the children of the root node and printing out their labels. There is a lot of code in Stanford CoreNLP for iterating through trees:
for (Tree subTree : t.children()) {
    System.err.println(subTree.label());
}
Note the label will be of this form: NP|sentiment=2|prob=0.988
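If you only need the three parts of such a label, something like the following might do. This is a minimal sketch in plain Java with no CoreNLP dependency; the class name SentimentLabel and its field names are made up for illustration, and it assumes the label has exactly the form shown above.

```java
/** Hypothetical helper (not part of CoreNLP): splits "NP|sentiment=2|prob=0.988". */
final class SentimentLabel {
    private static final String SENTIMENT_KEY = "|sentiment=";
    private static final String PROB_KEY = "|prob=";

    final String category;  // e.g. "NP"
    final int sentiment;    // e.g. 2
    final double prob;      // e.g. 0.988

    private SentimentLabel(String category, int sentiment, double prob) {
        this.category = category;
        this.sentiment = sentiment;
        this.prob = prob;
    }

    static SentimentLabel parse(String label) {
        // Search from the right so punctuation categories such as "," or "." still work.
        int s = label.lastIndexOf(SENTIMENT_KEY);
        int p = label.lastIndexOf(PROB_KEY);
        if (s < 0 || p < s) {
            throw new IllegalArgumentException("Unexpected label format: " + label);
        }
        String category = label.substring(0, s);
        int sentiment = Integer.parseInt(label.substring(s + SENTIMENT_KEY.length(), p));
        double prob = Double.parseDouble(label.substring(p + PROB_KEY.length()));
        return new SentimentLabel(category, sentiment, prob);
    }

    public static void main(String[] args) {
        SentimentLabel l = parse("NP|sentiment=2|prob=0.988");
        System.out.println(l.category + " " + l.sentiment + " " + l.prob);
    }
}
```

Leaf labels are plain words without the sentiment and prob parts, so parse() would throw on them; check for leaves before calling it.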
If anyone thinks I should alter this output, please let me know; I am open to changing it.
This is currently the full sentiment output in the json:
"sentimentValue": "1",
"sentiment": "Negative",
"sentimentDistribution": [
0.16713578785867,
0.71513114699161,
0.09327640121561,
0.0167989726291,
0.00765769130501
],
"sentimentTree": "(ROOT|sentiment=1|prob=0.715\n (NP|sentiment=2|prob=0.988 (DT|sentiment=2|prob=0.998 This) (NN|sentiment=2|prob=0.998 movie))\n (@S|sentiment=1|prob=0.797\n (VP|sentiment=1|prob=0.730\n (@VP|sentiment=1|prob=0.932 (VBZ|sentiment=2|prob=0.997 does) (RB|sentiment=2|prob=0.994 n't))\n (VP|sentiment=3|prob=0.504 (VB|sentiment=3|prob=0.962 care)\n (PP|sentiment=3|prob=0.727 (IN|sentiment=2|prob=0.991 about)\n (NP|sentiment=3|prob=0.750\n (@NP|sentiment=3|prob=0.700\n (@NP|sentiment=3|prob=0.798\n (@NP|sentiment=3|prob=0.602 (NP|sentiment=3|prob=0.805 cleverness) (,|sentiment=2|prob=0.997 ,))\n (NP|sentiment=2|prob=0.986 wit))\n (CC|sentiment=2|prob=0.991 or))\n (NP|sentiment=3|prob=0.616\n (NP|sentiment=2|prob=0.963 (DT|sentiment=2|prob=0.995 any)\n (@NP|sentiment=2|prob=0.980 (JJ|sentiment=2|prob=0.998 other) (NN|sentiment=3|prob=0.983 kind)))\n (PP|sentiment=3|prob=0.541 (IN|sentiment=2|prob=0.993 of)\n (NP|sentiment=3|prob=0.744 (JJ|sentiment=3|prob=0.943 intelligent) (NN|sentiment=4|prob=0.845 humor))))))))\n (.|sentiment=2|prob=0.997 .)))\n"
Wow, that was fast! Thanks man ☺️
@J38 that was great 💯
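For what it's worth, the fields in the JSON output posted above look consistent with each other: the largest entry of sentimentDistribution is at index 1, which matches sentimentValue "1", and it rounds to the prob=0.715 shown at ROOT. A small sketch of that check in plain Java follows. The class name is hypothetical, and the label names for indices 0, 3 and 4 are my assumption; the thread itself only shows 1 = Negative and 2 = Neutral.

```java
import java.util.Locale;

/** Hypothetical demo class: picks the predicted class from a sentimentDistribution array. */
class SentimentDistributionDemo {
    // Assumed label order; only "Negative" (1) and "Neutral" (2) are confirmed by the thread.
    static final String[] LABELS =
        {"Very negative", "Negative", "Neutral", "Positive", "Very positive"};

    /** Returns the index of the largest value (the first one on ties). */
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the JSON output in this thread.
        double[] dist = {0.16713578785867, 0.71513114699161, 0.09327640121561,
                         0.0167989726291, 0.00765769130501};
        int cls = argMax(dist);
        // Locale.ROOT keeps the decimal point regardless of the machine's locale.
        System.out.println(cls + " " + LABELS[cls] + " "
            + String.format(Locale.ROOT, "%.3f", dist[cls]));  // prints "1 Negative 0.715"
    }
}
```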
@J38 A question. Would it be possible to return the sentimentTree as a JSON structure, like the one previously shown here, instead of a string? This would avoid additional (and possibly error-prone) parsing, and the API output would be more "standard".
Thanks.
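One client-side option is to convert the sentimentTree string into a nested structure yourself. Below is a rough sketch of a small recursive-descent parser in plain Java (no CoreNLP dependency). The class name and the JSON field names (label, sentiment, prob, word, children) are invented for illustration and are not the server's format. It assumes every node looks like "(label|sentiment=N|prob=P ...)", that tokens contain no parentheses or whitespace, and that probabilities use a decimal point.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side converter; not part of CoreNLP. */
class SentimentTreeToJson {
    /** One tree node; "word" is set only on nodes that directly cover a token. */
    static final class Node {
        String label;
        int sentiment;
        double prob;
        String word;
        final List<Node> children = new ArrayList<>();
    }

    private final String s;
    private int pos = 0;

    private SentimentTreeToJson(String s) { this.s = s; }

    static Node parse(String tree) {
        SentimentTreeToJson p = new SentimentTreeToJson(tree);
        p.skipWhitespace();
        return p.node();
    }

    private Node node() {
        expect('(');
        String[] parts = token().split("\\|");  // label | sentiment=N | prob=P
        if (parts.length != 3 || !parts[1].startsWith("sentiment=")
                || !parts[2].startsWith("prob=")) {
            throw new IllegalArgumentException("Unexpected node label before position " + pos);
        }
        Node n = new Node();
        n.label = parts[0];
        n.sentiment = Integer.parseInt(parts[1].substring("sentiment=".length()));
        n.prob = Double.parseDouble(parts[2].substring("prob=".length()));
        skipWhitespace();
        while (pos < s.length() && s.charAt(pos) != ')') {
            if (s.charAt(pos) == '(') {
                n.children.add(node());  // nested subtree
            } else {
                n.word = token();        // bare token under a preterminal
            }
            skipWhitespace();
        }
        expect(')');
        return n;
    }

    /** Reads characters up to the next whitespace or parenthesis. */
    private String token() {
        int start = pos;
        while (pos < s.length() && !Character.isWhitespace(s.charAt(pos))
                && s.charAt(pos) != '(' && s.charAt(pos) != ')') {
            pos++;
        }
        return s.substring(start, pos);
    }

    private void skipWhitespace() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private void expect(char c) {
        if (pos >= s.length() || s.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    static String toJson(Node n) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"label\":").append(quote(n.label))
          .append(",\"sentiment\":").append(n.sentiment)
          .append(",\"prob\":").append(n.prob);
        if (n.word != null) sb.append(",\"word\":").append(quote(n.word));
        if (!n.children.isEmpty()) {
            sb.append(",\"children\":[");
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(toJson(n.children.get(i)));
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }

    private static String quote(String v) {
        return "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String tree = "(NP|sentiment=2|prob=0.988 (DT|sentiment=2|prob=0.998 This)"
                    + " (NN|sentiment=2|prob=0.998 movie))";
        System.out.println(toJson(parse(tree)));
    }
}
```

If you already have CoreNLP on the classpath, reading the string with PennTreeReader as shown earlier in this thread is probably the safer route; this sketch is only meant for clients that just receive the JSON from the server.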
Sorry to be a bother. I was just wondering whether this output is available when running CoreNLP from the command line, or whether it requires the standard Java API.
Edit: Sorry, I didn't realize my version of CoreNLP was out of date. I've upgraded and now this works very well.
@J38 I have just updated CoreNLP to version 3.9.1 and I noticed that the numbers in the sentiment distribution and in the sentiment tree are formatted incorrectly.
"sentimentDistribution": [
0,10110301471115,
0,62425688378713,
0,22854585072979,
0,03641152898384,
0,00968272178809
]
As you can see decimal numbers are represented with a comma instead of a point. The same is happening inside the probs of the sentiment tree.
This could be caused by the locale settings on my laptop (Italian). Do you know how I could avoid this problem? Where in the code could the StringWriter pick up this option?
Thank you
I am having the same problem. How did you finally solve this?
@balbinavr this is due to the locale settings when you start up Java, not to CoreNLP. The easiest way to solve it is to pass JVM options such as "-Duser.language=en -Duser.country=US Default"
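To illustrate the general Java behaviour behind this, here is a small self-contained sketch. I have not checked where exactly CoreNLP formats these numbers, so the class and method names below are hypothetical and this only shows how the locale affects String.format.

```java
import java.util.Locale;

/** Hypothetical demo: how the locale changes the decimal separator in String.format. */
class LocaleDecimalDemo {
    static String format(Locale locale, double value) {
        return String.format(locale, "%.3f", value);
    }

    public static void main(String[] args) {
        double p = 0.71513114699161;
        System.out.println(format(Locale.ITALY, p));  // 0,715 (decimal comma)
        System.out.println(format(Locale.US, p));     // 0.715 (decimal point)

        // Changing the JVM-wide default in code affects later calls that do not
        // name a locale, much like the -Duser.language / -Duser.country options.
        Locale.setDefault(Locale.US);
        System.out.println(String.format("%.3f", p)); // 0.715
    }
}
```

Passing the -D options on the command line avoids touching any code, which is probably why it is the easier fix when you only start the server.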
The Stanford CoreNLP online demo gives a very nice visualization of phrase-based sentiment. Every phrase in the parse tree has a sentiment. For their standard test example, you can see that the phrase
doesn't care about cleverness, wit or any other kind of intelligent humor
is marked as negative (77%), while the phrase
cleverness, wit or any other kind of intelligent humor
is marked as positive (76%). You can also get this information in JSON format from the website. I haven't found a way to get these from the API. Could I get such fine-grained phrase sentiments via Stanford CoreNLP?
Currently, I am using the Stanford CoreNLP server with the following properties