joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0

[Bug] SchemaViolationException frequently #4279

Closed: cyw3 closed this issue 7 months ago

cyw3 commented 8 months ago

Describe the bug

Hello, I still frequently encounter this intermittent exception. Every time it occurs, the tool's output is missing many results that should be there, and the change counts in the log line [INFO ] Enhancement io.joern.console.scan.ScanPass completed in 4450 ms. 130 + 650 changes committed from 25 parts. are different.

To Reproduce

The following project reproduces the problem frequently (on average once in every 4 runs) and can be used for testing: https://github.com/Haptic-Apps/Slide

Steps to reproduce the behavior:

  1. Clone the project:
    git clone https://github.com/Haptic-Apps/Slide
  2. Run the scan:
    /data/test/joern-cli/joern-scan Slide/ --names=too-many-params --language=java --overwrite

Expected behavior

No exception should be reported during the run, and the number of results should be the same on every run.

Screenshots

Caused by: overflowdb.SchemaViolationException: OUT edge with label AST to an adjacent METHOD_RETURN is mandatory, but not defined for this METHOD node with id=134
        at io.shiftleft.codepropertygraph.generated.nodes.MethodDb.methodReturn(Method.scala:660)
        at io.shiftleft.codepropertygraph.generated.nodes.Method.methodReturn(Method.scala:286)
        at io.joern.x2cpg.passes.controlflow.cfgcreation.CfgCreator.<init>(CfgCreator.scala:55)
        at io.joern.x2cpg.passes.controlflow.CfgCreationPass.runOnPart(CfgCreationPass.scala:23)
        at io.joern.x2cpg.passes.controlflow.CfgCreationPass.runOnPart(CfgCreationPass.scala:17)
        at io.shiftleft.passes.ConcurrentWriterCpgPass.$anonfun$createApplySerializeAndStore$2(ParallelCpgPass.scala:252)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:678)
        at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:467)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.util.NoSuchElementException: next on empty iterator
        at scala.collection.Iterator$$anon$19.next(Iterator.scala:966)
        at scala.collection.Iterator$$anon$19.next(Iterator.scala:964)
        at scala.collection.Iterator$$anon$7.next(Iterator.scala:528)
        at overflowdb.traversal.Traversal.next(Traversal.scala:22)
        at io.shiftleft.codepropertygraph.generated.nodes.MethodDb.methodReturn(Method.scala:655)
        ... 13 more

max-leuthaeuser commented 8 months ago

Any reason why you are using that specific (quite old) joern version?

Could you try the latest joern and jdk 19?

cyw3 commented 8 months ago

I tested with the latest versions, Joern v2.0.285 and JDK 19, but the problem still occurs. There are also many java.lang.ArrayIndexOutOfBoundsException and java.lang.NullPointerException exceptions.

max-leuthaeuser commented 8 months ago

Maybe @johannescoetzee can have a look.

johannescoetzee commented 8 months ago

@cyw3 Thanks for the report! I've managed to reproduce the issue and am looking into it now.

johannescoetzee commented 8 months ago

It looks like this might be a joern-scan bug instead of a javasrc2cpg one. As a test, I ran

./joern-scan --names=too-many-params --language=java ~/path/to/Slide --store --overwrite

which threw all of the errors mentioned above. I then opened the CPG in joern with

joern> open("Slide")
val res4: Option[io.joern.console.workspacehandling.Project] = Some(
  value = Project(
    projectFile = ProjectFile(inputPath = "/path/to/Slide", name = "Slide"),
    path = /path/to/joern/workspace/Slide,
    cpg = Some(value = Cpg (Graph [442599 nodes]))
  )
)

followed by

joern> import io.joern.scanners.c.Metrics

joern> import io.joern.console.scan.QueryWrapper

joern> val query = Metrics.tooManyParameters(4)
val query: io.joern.console.Query = Query(
  name = "too-many-params",
  author = "@fabsx00",
  title = "Number of parameters larger than 4",
  description = "This query identifies functions with more than 4 formal parameters",
  score = 1.0,
  traversal = io.joern.scanners.c.Metrics$$$Lambda/0x00007fa7688293f0@76e463d3,
  traversalAsString = """{ cpg =>
        cpg.method.internal.filter(_.parameter.size > n).nameNot("<global>")
      }""",
  tags = List("metrics"),
  language = "",
  codeExamples = CodeExamples(
    positive = List(
      """

int too_many_params(int a, int b, int c, int d, int e) {

}

"""
    ),
    negative = List(
      """

void good(int a, int b, int c, int d) {

}

"""
    )
  ),
  multiFileCodeExamples = MultiFileCodeExamples(positive = List(), negative = List())
)

joern> new QueryWrapper(query).apply(cpg).size
val res5: Int = 329

without any errors. Suspecting that this might just be a lookup of results already stored by the scan overlay, I also ran the query on a freshly created Slide CPG, with the same result.

johannescoetzee commented 8 months ago

I'm also getting, for example,

overflowdb.SchemaViolationException: OUT edge with label AST to an adjacent METHOD_RETURN is mandatory, but not defined for this METHOD node with id=125134

but

joern> cpg.method.id(125134L).methodReturn.l
val res1: List[io.shiftleft.codepropertygraph.generated.nodes.MethodReturn] = List(
  MethodReturn(
    id = 125342L,
    code = "RET",
    columnNumber = Some(value = 24),
    dynamicTypeHintFullName = ArraySeq(),
    evaluationStrategy = "BY_VALUE",
    lineNumber = Some(value = 352),
    order = 7,
    possibleTypes = ArraySeq(),
    typeFullName = "void"
  )
)

and generally

joern> cpg.method.filter(method => Try(method.methodReturn).isFailure).l
val res2: List[io.shiftleft.codepropertygraph.generated.nodes.Method] = List()
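
The stack trace in the original report shows the shape of the failure: the generated `methodReturn` accessor calls `next` on the method's outgoing AST traversal, and when that traversal is empty the resulting `NoSuchElementException` ("next on empty iterator") is rethrown as a `SchemaViolationException`. The self-contained Scala sketch below mimics that accessor pattern on a toy adjacency map and repeats the `Try(...).isFailure` check from the REPL session. `ToyGraph`, `mandatoryOut` and `SchemaViolation` are hypothetical stand-ins, not overflowdb classes; the sketch only illustrates why a missing edge surfaces as this exception and does not reproduce the intermittent failure itself.

```scala
import scala.util.Try

// Hypothetical stand-in for overflowdb.SchemaViolationException (not the real class)
final class SchemaViolation(msg: String, cause: Throwable) extends RuntimeException(msg, cause)

// Toy graph: node id -> outgoing edges as (edgeLabel, neighbourLabel, neighbourId)
final case class ToyGraph(out: Map[Long, List[(String, String, Long)]]) {

  // Mimics an accessor for a mandatory edge: take the first matching neighbour
  // and turn "next on empty iterator" into a schema violation.
  def mandatoryOut(id: Long, edge: String, target: String): Long =
    try {
      out.getOrElse(id, Nil).iterator
        .collect { case (`edge`, `target`, n) => n }
        .next()
    } catch {
      case e: NoSuchElementException =>
        throw new SchemaViolation(
          s"OUT edge with label $edge to an adjacent $target is mandatory, but not defined for node with id=$id",
          e
        )
    }
}

object MandatoryEdgeDemo {
  def main(args: Array[String]): Unit = {
    val g = ToyGraph(Map(
      134L -> List(("AST", "METHOD_RETURN", 135L)), // well-formed method
      200L -> Nil                                   // method whose METHOD_RETURN is missing
    ))
    // Same idea as cpg.method.filter(method => Try(method.methodReturn).isFailure)
    val broken = List(134L, 200L).filter(id => Try(g.mandatoryOut(id, "AST", "METHOD_RETURN")).isFailure)
    println(broken) // List(200)
  }
}
```
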

johannescoetzee commented 8 months ago

@fabsx00 do you have any ideas for what could be going on here?

johannescoetzee commented 8 months ago

This issue isn't limited to this specific project or query, either. I got similar errors for the shiftleft-java-demo project and the too-high-complexity query.

max-leuthaeuser commented 8 months ago

I guess the issue is in the ScanPass that executes the query, then. The log also says: [INFO ] Enhancement io.joern.console.scan.ScanPass completed in 4450 ms. 130 + 650 changes committed from 25 parts.

ScanPass calls addNode here: https://github.com/joernio/joern/blob/4314950797f3e8ca774b7b4fc92c1e3ab6748399/console/src/main/scala/io/joern/console/scan/ScanPass.scala#L12. That will certainly result in a broken CPG.
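
As a generic illustration of why graph writes during parallel work are risky: when one thread changes a shared graph while others traverse it, a reader can observe a node before all of its mandatory edges exist, which is what a SchemaViolationException on a mandatory edge reports. The plain-Scala sketch below shows the usual two-phase alternative, in which workers only read immutable input and return change records, and a single thread applies all changes once the computation has finished. `Change`, `computeChanges` and `applyAll` are hypothetical names; this is not Joern's or overflowdb's API and not necessarily how this issue was fixed.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical change record a worker hands back instead of writing to the graph itself
final case class Change(methodId: Long, finding: String)

object TwoPhaseDemo {
  // Phase 1: read-only work over immutable input; safe to run on many threads at once.
  def computeChanges(part: Seq[(Long, Int)], maxParams: Int): Seq[Change] =
    part.collect { case (id, nParams) if nParams > maxParams => Change(id, "too-many-params") }

  // Phase 2: a single thread applies every change to the shared mutable store.
  def applyAll(store: mutable.Map[Long, List[String]], changes: Seq[Change]): Unit =
    changes.foreach(c => store.update(c.methodId, c.finding :: store.getOrElse(c.methodId, Nil)))

  def main(args: Array[String]): Unit = {
    // (methodId, parameterCount) pairs, split into parts the way a parallel pass would
    val methods = (1L to 100L).map(id => id -> (id % 7).toInt)
    val parts = methods.grouped(10).toSeq

    // Compute in parallel, but do not touch the store yet
    val futures = parts.map(p => Future(computeChanges(p, maxParams = 4)))
    val changes = Await.result(Future.sequence(futures), Duration.Inf).flatten

    // Commit once, from one thread, after all computation has finished
    val store = mutable.Map.empty[Long, List[String]]
    applyAll(store, changes)
    println(store.size) // 28
  }
}
```

The design choice is that no worker ever sees the store mid-update, so results do not depend on thread scheduling.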

LPirch commented 7 months ago

Hello, I also encounter SchemaViolationExceptions, but in a different context: they occur when I compute data flows for C/C++ code. The program doesn't crash, but the flows I receive are inconsistent. My query roughly looks like this:

implicit val resolver = NoResolve
val sources = cpg.call.where(_.callee.isExternal)
val sinks = cpg.call.where(_.callee.isExternal)
val flows = sinks.reachableByFlows(sources).l

This is my environment:

I receive the following stack traces:

java.util.concurrent.ExecutionException: overflowdb.SchemaViolationException: IN edge with label AST to an adjacent METHOD is mandatory, but not defined for this METHOD_PARAMETER_IN node with id=11362
        at java.base/java.util.concurrent.ForkJoinTask.reportExecutionException(ForkJoinTask.java:581)
        at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:993)
        at io.joern.dataflowengineoss.queryengine.Engine.runUntilAllTasksAreSolved$1$$anonfun$1(Engine.scala:99)
        at scala.util.Try$.apply(Try.scala:210)
        at io.joern.dataflowengineoss.queryengine.Engine.runUntilAllTasksAreSolved$1(Engine.scala:100)
        at io.joern.dataflowengineoss.queryengine.Engine.solveTasks(Engine.scala:114)
        at io.joern.dataflowengineoss.queryengine.Engine.backwards(Engine.scala:53)
        at io.joern.dataflowengineoss.language.ExtendedCfgNode$.io$joern$dataflowengineoss$language$ExtendedCfgNode$$$reachableByInternal$extension(ExtendedCfgNode.scala:81)
        at io.joern.dataflowengineoss.language.ExtendedCfgNode$.reachableByFlows$extension(ExtendedCfgNode.scala:45)
        at de.mlsec.ivan.strucky.DataFlow$.flowThrough(DataFlow.scala:57)
        at de.mlsec.ivan.strucky.Main$.main(Main.scala:62)
        at de.mlsec.ivan.strucky.Main.main(Main.scala)
Caused by: overflowdb.SchemaViolationException: IN edge with label AST to an adjacent METHOD is mandatory, but not defined for this METHOD_PARAMETER_IN node with id=11362
        at io.shiftleft.codepropertygraph.generated.nodes.MethodParameterInDb.method(MethodParameterIn.scala:437)
        at io.shiftleft.codepropertygraph.generated.nodes.MethodParameterIn.method(MethodParameterIn.scala:233)
        at io.joern.dataflowengineoss.queryengine.TaskCreator.paramToArgsOfCallers(TaskCreator.scala:92)
        at io.joern.dataflowengineoss.queryengine.TaskCreator.paramToArgs(TaskCreator.scala:80)
        at io.joern.dataflowengineoss.queryengine.TaskCreator.tasksForParams$$anonfun$1(TaskCreator.scala:64)
        at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:118)
        at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:105)
        at scala.collection.immutable.Vector.flatMap(Vector.scala:113)
        at io.joern.dataflowengineoss.queryengine.TaskCreator.tasksForParams(TaskCreator.scala:67)
        at io.joern.dataflowengineoss.queryengine.TaskCreator.createFromResults(TaskCreator.scala:27)
        at io.joern.dataflowengineoss.queryengine.TaskSolver.call(TaskSolver.scala:43)
        at io.joern.dataflowengineoss.queryengine.TaskSolver.call(TaskSolver.scala:30)
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1456)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
        at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:667)
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.run(ForkJoinTask.java:1464)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1423)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1311)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1841)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1806)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

My current workaround is to disable parallel execution with JAVA_OPTS=-XX:ActiveProcessorCount=1 and to parallelize my code at a different level, but I think this issue should be fixed anyway. As Fabian said in a similar issue, inconsistent results are a no-go.
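
Because this workaround only helps if the JVM actually picks up the flag, a small check can confirm it. The sketch below is a hypothetical helper (`CpuCheck` is not part of Joern) that prints what the runtime reports: `-XX:ActiveProcessorCount` overrides the value returned by `Runtime.availableProcessors()`, which many thread pools use to size themselves. How Joern's data-flow engine sizes its own pool is not shown here and is worth checking separately.

```scala
import java.util.concurrent.ForkJoinPool

object CpuCheck {
  def main(args: Array[String]): Unit = {
    // Processor count the JVM reports; -XX:ActiveProcessorCount overrides it
    val cpus = Runtime.getRuntime().availableProcessors()
    // Target parallelism of the JVM-wide common ForkJoinPool
    val common = ForkJoinPool.getCommonPoolParallelism()
    println(s"availableProcessors=$cpus commonPoolParallelism=$common")
  }
}
```

Run it once normally and once with `-XX:ActiveProcessorCount=1` passed to the JVM; in the second run `availableProcessors` should report 1. Whether a particular launcher script forwards `JAVA_OPTS` to the JVM is an assumption to verify for your setup.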