I have seen something similar before. Is this in a multithreaded environment? What semgrex operation did you run?
On Fri, Aug 12, 2022, 2:23 AM, Miguel Carmona wrote:
CoreNLP version 4.5.0, using pos, lemma, and depparse. I run the pipeline within Spark (Scala). I lazily initialise the CoreNLP pipeline and broadcast it to each executor, using lazy instantiation wrapped in a case object. I also force the text fragment not to be split, as it is already intended to be a single sentence. The objective is to do dependency analysis on the sentence and run some semgraph rules against it. We got a case where it throws an exception like this:
Caused by: edu.stanford.nlp.semgraph.UnknownVertexException: Operation attempted on unknown vertex happens/VBZ'''' in graph -> observed/VBD (root)
-> 24/CD (nsubj)
-> response/NN (nmod:in) -> In/IN (case) -> CoV/NNP (nmod:to) -> to/IN (case) -> SARS/NNP (compound) -> ‐/SYM (dep) -> ‐/SYM (dep) -> peptides/NNS (dep) -> 2/CD (nummod)
-> ,/, (punct)
-> we/PRP (nsubj)
-> unexpectedly/RB (advmod)
-> associated/VBN (ccomp)
-> that/IN (mark) -> sirolimus/NN (nsubj:pass) -> was/VBD (aux:pass) -> significantly/RB (advmod) -> release/NN (obl:with) -> with/IN (case) -> a/DT (det) -> proinflammatory/JJ (amod) -> cytokine/NN (compound) -> levels/NNS (nmod:including) -> including/VBG (case) -> higher/JJR (amod) -> α/NN (nmod:of) -> of/IN (case) -> TNF/NN (compound) -> ‐/SYM (dep) -> IL/NN (conj:and) -> and/CC (cc) -> IL/NN (nmod:of) -> 1β/NN (nmod) -> ‐/SYM (dep)
-> ./. (punct)
at edu.stanford.nlp.semgraph.SemanticGraph.parentPairs(SemanticGraph.java:730)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT$1.advance(GraphRelation.java:325)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.initialize(GraphRelation.java:1103)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.<init>(GraphRelation.java:1084)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT$1.<init>(GraphRelation.java:310)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT.searchNodeIterator(GraphRelation.java:310)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChildIter(NodePattern.java:339)
at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.resetChildIter(SemgrexMatcher.java:80)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.resetChildIter(CoordinationPattern.java:168)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.resetChildIter(CoordinationPattern.java:168)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.resetChildIter(CoordinationPattern.java:168)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChild(NodePattern.java:363)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.goToNextNodeMatch(NodePattern.java:457)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.matches(NodePattern.java:574)
at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.find(SemgrexMatcher.java:193)
at az.bikg.nlp.etl.common.nlp.Pattern.go$3(Pattern.scala:200)
at az.bikg.nlp.etl.common.nlp.Pattern.$anonfun$findCauseEffectMatches$6(Pattern.scala:268)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at az.bikg.nlp.etl.common.nlp.Pattern.findCauseEffectMatches(Pattern.scala:266)
at az.bikg.nlp.etl.steps.ERs$.findRelations(ERs.scala:107)
at az.bikg.nlp.etl.steps.ERs$.findRelationsSpark(ERs.scala:229)
at az.bikg.nlp.etl.steps.ERs$.$anonfun$extractERs$1(ERs.scala:242)
... 28 more
Am I doing anything wrong that would cause this exception? It didn't happen with version 4.4.0.
Thanks for coming back to me so quickly. I create a lazily constructed pipeline instance per executor in the Spark environment: val pipeline = new StanfordCoreNLP(props). That basically means the object containing the creation instruction is serialised to each executor, and the instance is then created and maintained there; this is done because creating a CoreNLP pipeline is not cheap and it needs the models in memory, as you already know.
Each executor then uses that long-lived instance across the cores of its node. What do I do with that pipeline instance? I process a sentence this way:
val res = Try {
  // the fragment is already a single sentence, so take the first (only) CoreSentence
  val doc = pipeline.processToCoreDocument(sen)
  val sentence = doc.sentences().get(0)
  val semanticGraph = sentence.dependencyParse()
  val pattern = Pattern(semanticGraph)
  ...
My Pattern class then runs some precompiled patterns against the analysed dependency graph (a rough sketch follows below). In my mind, this type of concurrency error might come from a pipeline that is not fully immutable or thread-safe.
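For illustration only (the rule and node names here are made up, not my actual patterns), matching one precompiled semgrex pattern against such a graph looks roughly like this:

import edu.stanford.nlp.semgraph.SemanticGraph
import edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

// compiled once, reused for many graphs
val rule: SemgrexPattern = SemgrexPattern.compile("{pos:/VB.*/}=trigger >nsubj {}=subject")

def matchTriggers(graph: SemanticGraph): List[String] = {
  val matcher = rule.matcher(graph)   // a fresh matcher per graph
  val hits = scala.collection.mutable.ListBuffer[String]()
  while (matcher.find())
    hits += matcher.getNode("trigger").word()
  hits.toList
}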
I absolutely agree, this is some kind of concurrency bug.
Am I understanding correctly that each executor has its own Pipeline, or are they sharing the Pipelines?
How about the Semgrex operations? Are those patterns precompiled per executor or shared between items?
I'm trying to figure out where to look for the error. Based on the stack trace, I expect it's either in the depparse or in the semgrex somewhere.
This is the full case object I use to serialise the pipeline creation to each executor. I assume each pipeline is immutable, as it is created in each executor and shared across the cores of that node.
import java.util.Properties

import edu.stanford.nlp.pipeline.StanfordCoreNLP

case object CoreNLPWrapper {

  def make: StanfordCoreNLP = {
    val defaultAnnotators: List[String] =
      List("tokenize", "ssplit", "pos", "lemma", "depparse", "natlog")

    val props = new Properties()
    props.setProperty("annotators", defaultAnnotators.mkString(","))
    // the input is always a single sentence, so disable sentence splitting
    props.setProperty("ssplit.isOneSentence", "true")

    new StanfordCoreNLP(props)
  }

  // created lazily, once per JVM (i.e. once per executor)
  lazy val parser: StanfordCoreNLP = make
}
So when I wrap this object within a broadcast and ask for parser for the first time, the pipeline is created in each executor. Then that variable lives for the whole life of the executor's job.
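A minimal sketch of how that is consumed (assuming a SparkSession in scope as spark and a Dataset[String] of sentences named sentences; both names are illustrative, and the broadcast wrapper is omitted here): referencing CoreNLPWrapper.parser inside the closure means each executor JVM builds its pipeline lazily on first use and reuses it for every partition it processes.

import spark.implicits._

val parsed = sentences.mapPartitions { iter =>
  val pipeline = CoreNLPWrapper.parser   // created once per executor JVM, then reused
  iter.map { sen =>
    val doc = pipeline.processToCoreDocument(sen)
    doc.sentences().get(0).dependencyParse().toString   // dependency graph of the single sentence
  }
}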
For the semgrex, I reuse the output produced by SemgrexBatchParser across all the searches in each of the Pattern objects I pasted before. Can I do that, or should I generate the batch parser output for each semgraph?
I don't know enough about how broadcast works to know whether that is a new object in each executor or the same one. Honestly, I don't know that system at all. If there's some way to get something I can run which will cause this issue, that would be great.
It's really weird that it didn't happen with 4.4.0 - nothing changed in the dependency parser or in the semgrex which would change that behavior. The only thing I can think of which would affect things downstream would be the tokenizer or lemmatizer changes causing differences in the annotations you're getting back.
One possibility that might help find the error would be to send you a version with more logging to show when & where it crashes, although to be honest right now I don't even know what's causing the problem. Is it possible for you to use a new jar file if we send you one?
Another thing we could do to try to isolate it is turn off natlog for now. If you can recreate the exception without that annotator, that would be one less place to search.
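That would just mean dropping natlog from the annotator list in the CoreNLPWrapper.make above, e.g.:

// same pipeline setup as above, with natlog removed from the annotator list
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse")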
I wonder if this has something to do with topologicalSortCache in SemgrexMatcher. I don't remember that feature from way back when I first made these things thread-safe, and it looks like it could easily result in an inconsistent state between graphs.
An easy way to test would be to remove it and send you a jar. Is that something you can try?
(Note: that previous one apparently only applies if you are using aligned graphs. I don't know if any of the graphs used in the system do, though)
The other instance where this happened recently was on 4.4.0, and used the annotators tokenize,cleanxml,ssplit,pos,lemma,parse, so I do not believe the 4.4.0 -> 4.5.0 differences or the natlog annotator are to blame. Although, weirdly, they were not using Semgrex themselves, afaik.
@d0ngw
If I understand the stack trace in this version of the problem, it is here where you are calling our stuff:
at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.find(SemgrexMatcher.java:193)
at az.bikg.nlp.etl.common.nlp.Pattern.go$3(Pattern.scala:200)
So this appears to be after CoreNLP has processed the sentence, at the time of using semgrex. Does that sound correct?
https://nlp.stanford.edu/software/stanford-corenlp-4.5.0b.zip
I think I fixed this bug... would you give it a try if possible?
Thank you very much. I will also review my side of the code.
@mkarmona have you had a chance to try out the updated package? If it works, we'll go ahead with a bugfix release sometime in the near future.
I am on holiday and moving slowly. I will give it a go when I have the chance. Sorry.
No worries. I also thought of a test to verify that this was the problem, but it'd be a little annoying to implement, so I was hoping you'd just do it for us :)
@AngledLuffa, I have done some work on my side. Both versions, 4.4 and 4.5, suffer the same concurrency problem. Moving the SemgrexPattern compilation into the lazy object instantiation in each executor did the trick on my side, so I can safely go through tens of millions of documents (still running) with Spark and CoreNLP happily again. It was also partly my fault; I shouldn't have been sharing the compiled patterns that way and assuming they were thread-safe.
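A rough sketch of what I mean (illustrative rule strings, not my real ones): the patterns are now compiled inside a per-executor lazy singleton, just like the pipeline, instead of being compiled once and shared.

import edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

case object SemgrexWrapper {
  // compiled lazily, once per executor JVM, on first access
  lazy val patterns: List[SemgrexPattern] = List(
    "{pos:VBD}=trigger >nsubj {}=cause",   // hypothetical rule
    "{pos:VBN}=trigger >obl {}=effect"     // hypothetical rule
  ).map(r => SemgrexPattern.compile(r))
}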
I am afraid I couldn't test your version. The main reason: if you publish a minor version on Maven, I can commit a minor internal release; it will be scheduled in next month's run.
I can't tell if that evidence helps or hurts my theory that it's the attempted cache of the semgrex graphs causing the problems. It certainly should be thread-safe, and we'll work to make sure it is thread-safe again.
We should be able to get a minor version on Maven in another week or so, if that works with your timeline for "next month's run". There are a couple of small tokenizer problems we're going to fix first as well.
Version 4.5.1 is now on Maven. @mkarmona, would you let us know if the crashes go away?
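For reference, assuming an sbt build, the coordinates would be (pulling in the models artifact via the models classifier, if you don't already get it some other way):

libraryDependencies ++= Seq(
  "edu.stanford.nlp" % "stanford-corenlp" % "4.5.1",
  "edu.stanford.nlp" % "stanford-corenlp" % "4.5.1" classifier "models"
)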
Sure, as soon as I am back to business.
Any luck with the crashes? Hoping they went away with the new version
It works in a run that was failing before. Whether everything is fully addressed, I cannot say for certain.
Awesome, glad to hear it. I will consider the matter closed unless we hear otherwise