joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0

[javasrc2cpg] Running speed #3858

Open Elfe-w opened 9 months ago

Elfe-w commented 9 months ago

Hi, I have a large number of Java files to process. It has been running for 17 hours and has processed about 1300 files so far. Is this a normal processing speed? Am I using the wrong commands? Is there any way to increase the parsing speed? Looking forward to your answer. I am using the following commands:

joern-parse {java_file_path}
joern --script {script_path} --param cpgFile={cdgFile_path} --param outFile={cdgs_save_path}

The content of the script file is as follows:

@main def exec(cpgFile: String, outFile: String): Unit = {
  //loadCpg(cpgFile)
  importCpg(cpgFile)
  var myList = cpg.method.filter(_.isExternal == false).name.toList 
  for (element <- myList) {
    var file= outFile+element+".dot"
    cpg.method(element).dotCpg14.l#> file
  }
}

DavidBakerEffendi commented 8 months ago

> processed about 1300 files

Does this mean 1300 dot files were generated, or that only 1300 Java files were parsed?

Elfe-w commented 8 months ago

Got it.

DavidBakerEffendi commented 8 months ago

I would imagine the parsing itself shouldn't take that long, but your dot file generation runs serially. You also look every method up again by name, which is a bottleneck: cpg.method(name) is shorthand for cpg.method.name(name), which is regex-based, whereas cpg.method.nameExact(name) uses an index lookup and is much faster. In any case, the first traversal already yields all the methods, so there is no need to look them up a second time.
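To illustrate why the two lookup styles scale differently, here is a minimal plain-Scala sketch. It is not Joern code: `MethodInfo`, `byRegex`, and `byExactName` are hypothetical stand-ins for a method node, a regex-based name filter, and an index-backed exact lookup.

```scala
object LookupSketch {
  // Hypothetical stand-in for a method node; not a Joern type.
  final case class MethodInfo(name: String, fullName: String)

  val methods: Vector[MethodInfo] = Vector(
    MethodInfo("parse", "com.example.Parser.parse"),
    MethodInfo("parseAll", "com.example.Parser.parseAll"),
    MethodInfo("run", "com.example.Main.run")
  )

  // Regex-style lookup: every query compiles a pattern and tests all methods.
  def byRegex(pattern: String): Vector[MethodInfo] = {
    val compiled = pattern.r.pattern
    methods.filter(m => compiled.matcher(m.name).matches())
  }

  // Exact lookup: the index is built once, then each query is a hash lookup.
  private val index: Map[String, Vector[MethodInfo]] = methods.groupBy(_.name)

  def byExactName(name: String): Vector[MethodInfo] =
    index.getOrElse(name, Vector.empty)
}
```

Calling `byRegex` once per method name repeats a full scan each time, so the total work grows quadratically with the number of methods, while `byExactName` stays close to constant per query. Iterating over the method collection directly avoids both.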

You could fix this, be null-safe (with .headOption), and add some simple concurrency with something like the following (note: this is untested):

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

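// Render one method's graph and write it to outPath on a pool thread.
// headOption skips methods for which no dot output is produced.
def generateDotFile(outPath: String, method: Method): Future[Unit] = Future {
  method.dotCpg14.headOption.foreach(_ #> outPath)
}

@main def exec(cpgFile: String, outFile: String): Unit = {
  importCpg(cpgFile)
  // Collect the internal methods once instead of looking each one up again by name.
  val methods = cpg.method.isExternal(false).l
  // Note: overloaded methods share m.name, so their files overwrite each other.
  val futures = for { m <- methods } yield generateDotFile(s"$outFile${java.io.File.separator}${m.name}.dot", m)
  // Block until every file has been written.
  Await.result(Future.sequence(futures), Duration.Inf)
}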
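
Two details of the script above may be worth adjusting: file names derived only from the method name collide for overloaded methods, and the global execution context is sized to the number of CPU cores, which is not necessarily a good fit for blocking file writes. The following plain-Scala sketch shows one way to handle both. It is not Joern code: `Rendered`, `uniqueFileNames`, and `exportAll` are hypothetical names, and it assumes the dot text has already been produced as a string, so it says nothing about whether traversing the graph from several threads is safe.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}

object ExportSketch {
  // Hypothetical stand-in for a method name plus its rendered dot text.
  final case class Rendered(name: String, dot: String)

  // Replace characters that are awkward in file names (e.g. in "<init>") and
  // append a running index so methods sharing a name get distinct files.
  def uniqueFileNames(names: Seq[String]): Seq[String] =
    names.zipWithIndex.map { case (n, i) =>
      s"${n.replaceAll("[^A-Za-z0-9_.-]", "_")}_${i}.dot"
    }

  // Write all items on a fixed-size pool and block until every write finishes.
  def exportAll(items: Seq[Rendered], outDir: Path, threads: Int): Seq[Path] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContextExecutorService =
      ExecutionContext.fromExecutorService(pool)
    try {
      val futures = items.zip(uniqueFileNames(items.map(_.name))).map {
        case (item, fileName) =>
          Future {
            Files.write(outDir.resolve(fileName), item.dot.getBytes(StandardCharsets.UTF_8))
          }
      }
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally pool.shutdown()
  }
}
```

A dedicated pool makes the degree of parallelism an explicit parameter and keeps blocking writes off the shared global pool. The index suffix is only one possible naming scheme; a suffix derived from the method's full name or signature would work as well.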