stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0
9.67k stars 2.7k forks

Infinite recursion issue ... :( #1348

Open alanlit opened 1 year ago

alanlit commented 1 year ago

Running 4.5.2 and the default English model 4.5.2. Every so often I get the following stack-blowing recursion:

    at edu.stanford.nlp.parser.lexparser.TreeBinarizer.outsideBinarizeLocalTree(TreeBinarizer.java:479)
    at edu.stanford.nlp.parser.lexparser.TreeBinarizer.outsideBinarizeLocalTree(TreeBinarizer.java:479)
    at edu.stanford.nlp.parser.lexparser.TreeBinarizer.outsideBinarizeLocalTree(TreeBinarizer.java:479)
    at edu.stanford.nlp.parser.lexparser.TreeBinarizer.outsideBinarizeLocalTree(TreeBinarizer.java:479)
    at edu.stanford.nlp.parser.lexparser.TreeBinarizer.outsideBinarizeLocalTree(TreeBinarizer.java:479)
    ......

On either an NER pipe:

    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner")
    props.setProperty("ner.statisticalOnly", "true")
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz")

Or a sentiment pipe:

    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz")
    props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment")

Sorry I don't know which yet (or have a sample of the text that seems to trigger it .. working on it).

Using openjdk 17.0.2.

Any thoughts as to what might be going on ?

Tnx Alan

AngledLuffa commented 1 year ago

I'm sorry, but this isn't enough information to diagnose the problem. If/when you isolate a passage which is causing difficulties, even an entire document, let us know and we'll take a look.
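
One way to isolate such a passage is to feed documents through one at a time and record any input that ends in a `StackOverflowError`. The following is only a minimal sketch under stated assumptions: `OverflowCatcher`, `findOffenders`, and `fakeAnnotate` are hypothetical names, not part of CoreNLP, and `fakeAnnotate` merely stands in for the real pipeline call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of CoreNLP: runs an annotation step over
// documents one at a time and collects every input that overflows the stack.
class OverflowCatcher {

    static List<String> findOffenders(List<String> docs, Consumer<String> annotate) {
        List<String> offenders = new ArrayList<>();
        for (String doc : docs) {
            try {
                annotate.accept(doc);
            } catch (StackOverflowError e) {
                // The stack has unwound by the time we get here, so it is
                // safe to record the offending text and carry on.
                offenders.add(doc);
            }
        }
        return offenders;
    }

    // Stand-in for the real pipeline call (an assumption for illustration):
    // it recurses without bound whenever the text contains "deep".
    static void fakeAnnotate(String text) {
        if (text.contains("deep")) {
            recurse(0);
        }
    }

    private static int recurse(int n) {
        return recurse(n + 1) + 1;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("fine text", "deep text");
        System.out.println(findOffenders(docs, OverflowCatcher::fakeAnnotate));
    }
}
```

In a real run, the `Consumer` would presumably wrap something like `pipeline.annotate(new Annotation(text))`, and the collected texts could be written to a file and attached here.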

alanlit commented 1 year ago

Quite understand - the problem is it can run for a day or so before blowing -- I'm still trying to isolate a test case, but in the meantime it decided to stack overflow via a different route. No idea if this gives you a clue or not:

    java.lang.StackOverflowError
    at org.ejml.simple.AutomaticSimpleMatrixConvert.specify(AutomaticSimpleMatrixConvert.java:46)
    at org.ejml.simple.SimpleBase.insertIntoThis(SimpleBase.java:960)
    at edu.stanford.nlp.neural.NeuralUtils.concatenateWithBias(NeuralUtils.java:282)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:543)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:511)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    at edu.stanford.nlp.sentiment.SentimentCostAndGradient.forwardPropagateTree(SentimentCostAndGradient.java:512)
    .....

and so on .. :)

Tnx Alan

AngledLuffa commented 1 year ago

It certainly looks like an issue with a very deep parse tree, either because the parse was degenerate or because the text was very long and led to a huge parse tree. Are you giving it any text that is unrealistically long?
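
A quick way to check for unrealistically long text is to estimate the longest sentence in each input before it reaches the pipeline. This is only a rough sketch: `LongSentenceCheck` and its methods are hypothetical names, not CoreNLP API, the sentence splitting and whitespace tokenization are deliberately crude and will not match CoreNLP's own tokenizer, and the 200-token threshold in `main` is an arbitrary assumption.

```java
// Hypothetical diagnostic, not part of CoreNLP: estimates the longest
// sentence in a text so that suspicious inputs can be logged or skipped.
class LongSentenceCheck {

    // Crude estimate: split after sentence-final punctuation followed by
    // whitespace, then count whitespace-separated tokens in each piece.
    static int longestSentenceTokens(String text) {
        int longest = 0;
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            longest = Math.max(longest, trimmed.split("\\s+").length);
        }
        return longest;
    }

    // True when the estimated longest sentence exceeds the given limit.
    static boolean looksSuspicious(String text, int maxTokens) {
        return longestSentenceTokens(text) > maxTokens;
    }

    public static void main(String[] args) {
        String text = "Short one. This sentence has five tokens.";
        System.out.println(longestSentenceTokens(text));
        System.out.println(looksSuspicious(text, 200)); // 200 is an assumed limit
    }
}
```

Text with no sentence-final punctuation at all (tables, logs, scraped lists) counts as one long sentence under this estimate, which is the kind of input that could plausibly lead to a huge parse tree.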