plume-oss / plume

Plume is a code representation benchmarking library with options to extract the AST from Java bytecode and store the result in various graph databases.
https://plume-oss.github.io/plume-docs/
Apache License 2.0

REACHING_DEF exception #59

Closed itsacoderepo closed 3 years ago

itsacoderepo commented 3 years ago

Importing commons-io-2.5.zip (replace .zip with .jar) works fine. However, as soon as I run "run.ossdataflow", I see the following exception:

joern> run.ossdataflow 
Exception in thread "Writer" java.lang.RuntimeException: Edge of type REACHING_DEF with direction OUT not supported by class TypeRefDb
    at overflowdb.NodeDb.storeAdjacentNode(NodeDb.java:621)
    at overflowdb.NodeDb.storeAdjacentNode(NodeDb.java:602)
    at overflowdb.NodeDb.addEdge(NodeDb.java:298)
    at overflowdb.NodeRef.addEdge(NodeRef.java:151)
    at overflowdb.SemiEdge.$minus$minus$greater(SyntacticSugar.scala:59)
    at io.shiftleft.passes.DiffGraph$Applier.odbAddEdge(DiffGraph.scala:388)
    at io.shiftleft.passes.DiffGraph$Applier.addEdge(DiffGraph.scala:380)
    at io.shiftleft.passes.DiffGraph$Applier.$anonfun$run$1(DiffGraph.scala:332)
    at io.shiftleft.passes.DiffGraph$Applier.$anonfun$run$1$adapted(DiffGraph.scala:328)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1279)
    at io.shiftleft.passes.DiffGraph$Applier.run(DiffGraph.scala:328)
    at io.shiftleft.passes.DiffGraph$Applier$.applyDiff(DiffGraph.scala:417)
    at io.shiftleft.passes.ParallelCpgPass$Writer.run(ParallelCpgPass.scala:105)
    at java.base/java.lang.Thread.run(Thread.java:834)
The graph has been modified. You may want to use the `save` command to persist changes to disk.  All changes will also be saved collectively on exit
res10: Cpg = io.shiftleft.codepropertygraph.Cpg@2eb76c8c

fabsx00 commented 3 years ago

See https://github.com/ShiftLeftSecurity/joern/pull/458 - looking into reaching-def is an item we've delayed for now but it's on the list.

popthink commented 3 years ago

ReachingDefinitionAnalysis(Pass) itself might work fine.

But Plume doesn't create some of the nodes that are in the CPG schema.

I think that is the reason ossdataflow throws the exception.

According to the logs, Plume may need to create nodes for 'TYPE' and '(default)NAMESPACE_BLOCK', and remove the deprecated 'SOURCE_FILE' edge. For compatibility, 'SOURCE_FILE' will be created by Joern's Enhancer.
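To make the exception message easier to read: judging by the stack trace, the graph store checks each new edge against the set of edge labels that the source node's type allows, and rejects anything outside that set. The Kotlin sketch below is a toy model of that kind of check. `SchemaNode` and the allow-lists are hypothetical and are not the OverflowDB API or the real CPG schema; they only reproduce the shape of the error.

```kotlin
// Toy model only: SchemaNode is hypothetical and is NOT the OverflowDB API.
// Each node declares which outgoing edge labels its (assumed) schema allows.
class SchemaNode(val label: String, private val allowedOut: Set<String>) {
    val outEdges = mutableListOf<Pair<String, SchemaNode>>()

    fun addEdge(edgeLabel: String, target: SchemaNode) {
        if (edgeLabel !in allowedOut) {
            throw RuntimeException(
                "Edge of type $edgeLabel with direction OUT not supported by class $label"
            )
        }
        outEdges += edgeLabel to target
    }
}

fun main() {
    // Assumed allow-lists, chosen only to reproduce the shape of the error.
    val typeRef = SchemaNode("TypeRefDb", setOf("AST", "CFG"))
    val identifier = SchemaNode("IdentifierDb", setOf("AST", "CFG", "REACHING_DEF"))

    // Accepted: the source node type allows REACHING_DEF.
    identifier.addEdge("REACHING_DEF", typeRef)

    // Rejected: the source node type does not allow REACHING_DEF.
    val result = runCatching { typeRef.addEdge("REACHING_DEF", identifier) }
    println(result.exceptionOrNull()?.message)
    // prints "Edge of type REACHING_DEF with direction OUT not supported by class TypeRefDb"
}
```

Under this reading, the pass is trying to start a REACHING_DEF edge at a node type whose schema does not expect one, which would be consistent with the graph produced by Plume differing from what the pass assumes.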

popthink commented 3 years ago

I patched the code so that, at least, ossdataflow runs. But "reachableBy" still doesn't work properly. I think it needs more nodes and edges.

Extractor.kt

/*
 * Copyright 2020 David Baker Effendi
 * <p>
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * <p>
 * http://www.apache.org/licenses/LICENSE-2.0
 * <p>
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.github.plume.oss

import io.github.plume.oss.domain.exceptions.PlumeCompileException
import io.github.plume.oss.domain.files.*
import io.github.plume.oss.domain.mappers.VertexMapper.mapToVertex
import io.github.plume.oss.drivers.*
import io.github.plume.oss.graph.ASTBuilder
import io.github.plume.oss.graph.CFGBuilder
import io.github.plume.oss.graph.CallGraphBuilder
import io.github.plume.oss.graph.PDGBuilder
import io.github.plume.oss.options.ExtractorOptions
import io.github.plume.oss.util.DiffGraphUtil
import io.github.plume.oss.util.ExtractorConst.LANGUAGE_FRONTEND
import io.github.plume.oss.util.ExtractorConst.plumeVersion
import io.github.plume.oss.util.ResourceCompilationUtil.COMP_DIR
import io.github.plume.oss.util.ResourceCompilationUtil.compileJavaFiles
import io.github.plume.oss.util.ResourceCompilationUtil.moveClassFiles
import io.github.plume.oss.util.SootParserUtil.determineModifiers
import io.github.plume.oss.util.SootToPlumeUtil
import io.github.plume.oss.util.SootToPlumeUtil.buildClassStructure
import io.github.plume.oss.util.SootToPlumeUtil.buildType
import io.github.plume.oss.util.SootToPlumeUtil.buildTypeDeclaration
import io.github.plume.oss.util.SootToPlumeUtil.obtainModifiersFromTypeDeclVert
import io.shiftleft.codepropertygraph.Cpg
import io.shiftleft.codepropertygraph.generated.EdgeTypes.AST
import io.shiftleft.codepropertygraph.generated.EdgeTypes.SOURCE_FILE
import io.shiftleft.codepropertygraph.generated.NodeKeyNames.*
import io.shiftleft.codepropertygraph.generated.NodeTypes.*
import io.shiftleft.codepropertygraph.generated.nodes.*
import io.shiftleft.dataflowengineoss.passes.reachingdef.ReachingDefPass
import io.shiftleft.semanticcpg.passes.FileCreationPass
import io.shiftleft.semanticcpg.passes.languagespecific.fuzzyc.TypeDeclStubCreator
import io.shiftleft.semanticcpg.passes.linking.linker.Linker
import io.shiftleft.semanticcpg.passes.trim.TrimPass
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.Logger
import overflowdb.Graph
import overflowdb.Node
import scala.Option
import scala.jdk.CollectionConverters
import soot.*
import soot.jimple.*
import soot.jimple.spark.SparkTransformer
import soot.jimple.toolkits.callgraph.CHATransformer
import soot.jimple.toolkits.callgraph.Edge
import soot.options.Options
import soot.toolkits.graph.BriefUnitGraph
import java.io.File
import java.io.IOException
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths
import java.util.stream.Collectors
import java.util.zip.ZipFile
import kotlin.streams.asSequence
import kotlin.streams.toList
import io.shiftleft.codepropertygraph.generated.nodes.File as ODBFile

/**
 * The main entrypoint of the extractor from which the CPG will be created.
 *
 * @param driver the [IDriver] with which the graph will be constructed with.
 */
class Extractor(val driver: IDriver) {
    private val logger: Logger = LogManager.getLogger(Extractor::class.java)

    private val loadedFiles: HashSet<PlumeFile> = HashSet()
    private val astBuilder: ASTBuilder
    private val cfgBuilder: CFGBuilder
    private val pdgBuilder: PDGBuilder
    private val callGraphBuilder: CallGraphBuilder
    private lateinit var programStructure: Graph

    init {
        File(COMP_DIR).let { f -> if (f.exists()) f.deleteRecursively(); f.deleteOnExit() }
        checkDriverConnection(driver)
        astBuilder = ASTBuilder(driver)
        cfgBuilder = CFGBuilder(driver)
        pdgBuilder = PDGBuilder(driver)
        callGraphBuilder = CallGraphBuilder(driver)
    }

    /**
     * The companion object of this class holds the state of the current extraction
     */
    companion object {
        private val sootToPlume = mutableMapOf<Any, MutableList<NewNodeBuilder>>()
        private val classToFileHash = mutableMapOf<SootClass, String>()
        private val savedCallGraphEdges = mutableMapOf<String, MutableList<NewCallBuilder>>()

        /**
         * Associates the given Soot object to the given [NewNode].
         *
         * @param sootObject The object from a Soot [BriefUnitGraph] to associate from.
         * @param node The [NewNode] to associate to.
         * @param index The index to place the associated [NewNode] at.
         */
        fun addSootToPlumeAssociation(sootObject: Any, node: NewNodeBuilder, index: Int = -1) {
            if (!sootToPlume.containsKey(sootObject)) sootToPlume[sootObject] = mutableListOf(node)
            else if (index <= -1) sootToPlume[sootObject]?.add(node)
            else sootToPlume[sootObject]?.add(index, node)
        }

        /**
         * Associates the given Soot object to the given list of [NewNode]s.
         *
         * @param sootObject The object from a Soot [BriefUnitGraph] to associate from.
         * @param nodes The list of [NewNode]s to associate to.
         * @param index The index to place the associated [NewNode]s at.
         */
        fun addSootToPlumeAssociation(sootObject: Any, nodes: MutableList<NewNodeBuilder>, index: Int = -1) {
            if (!sootToPlume.containsKey(sootObject)) sootToPlume[sootObject] = nodes
            else if (index <= -1) sootToPlume[sootObject]?.addAll(nodes)
            else sootToPlume[sootObject]?.addAll(index, nodes)
        }

        /**
         * Retrieves the list of [NewNode] associations to the given Soot object.
         *
         * @param sootObject The object from a Soot [BriefUnitGraph] to get associations from.
         */
        fun getSootAssociation(sootObject: Any): List<NewNodeBuilder>? = sootToPlume[sootObject]

        /**
         * Associates the given [SootClass] with its source file's hash.
         *
         * @param cls The [SootClass] to associate.
         * @param hash The hash for the file's contents.
         */
        fun putNewFileHashPair(cls: SootClass, hash: String) {
            classToFileHash[cls] = hash
        }

        /**
         * Retrieves the original file's hash from the given [SootClass].
         *
         * @param cls The representative [SootClass].
         */
        fun getFileHashPair(cls: SootClass) = classToFileHash[cls]

        /**
         * Saves call graph edges to the [NewMethod] from the [NewCall].
         *
         * @param fullName The method full name.
         * @param call The source [NewCall].
         */
        fun saveCallGraphEdge(fullName: String, call: NewCallBuilder) {
            if (!savedCallGraphEdges.containsKey(fullName)) savedCallGraphEdges[fullName] = mutableListOf(call)
            else savedCallGraphEdges[fullName]?.add(call)
        }

        /**
         * Retrieves all the incoming [NewCall]s from the given [NewMethod].
         *
         * @param fullName The method full name.
         */
        fun getIncomingCallGraphEdges(fullName: String) = savedCallGraphEdges[fullName]
    }

    /**
     * Make sure that all drivers that require a connection are connected.
     *
     * @param driver The driver to check the connection of.
     */
    private fun checkDriverConnection(driver: IDriver) {
        when (driver) {
            is GremlinDriver -> if (!driver.connected) driver.connect()
            is OverflowDbDriver -> if (!driver.connected) driver.connect()
            is Neo4jDriver -> if (!driver.connected) driver.connect()
        }
    }

    /**
     * Loads a single Java class file or directory of class files into the cannon.
     *
     * @param f The Java source/class file, or a directory of source/class files.
     * @throws PlumeCompileException If no suitable Java compiler is found given .java files.
     * @throws NullPointerException If the file does not exist.
     * @throws IOException This would throw if given .java files which fail to compile.
     */
    @Throws(PlumeCompileException::class, NullPointerException::class, IOException::class)
    fun load(f: File): Extractor {
        File(COMP_DIR).let { c -> if (!c.exists()) c.mkdirs() }
        if (!f.exists()) {
            throw NullPointerException("File '${f.name}' does not exist!")
        } else if (f.isDirectory) {
            Files.walk(Paths.get(f.absolutePath)).use { walk ->
                walk.map { obj: Path -> obj.toString() }
                    .map { FileFactory.invoke(it) }
                    .filter { it !is UnsupportedFile }
                    .collect(Collectors.toList())
                    .let { loadedFiles.addAll(it) }
            }
        } else if (f.isFile) {
            if (f.name.endsWith(".jar")) {
                unzipArchive(ZipFile(f)).forEach { loadedFiles.add(FileFactory(it)) }
            } else {
                loadedFiles.add(FileFactory(f))
            }
        }
        return this
    }

    private fun unzipArchive(zf: ZipFile) = sequence {
        zf.use { zip ->
            // Copy zipped files across
            zip.entries().asSequence().filter { !it.isDirectory }.forEach { entry ->
                val destFile = File(COMP_DIR + File.separator + entry.name)
                // Create the parent directory path (independent of the platform's path separator)
                destFile.absoluteFile.parentFile?.mkdirs()
                runCatching {
                    destFile.createNewFile()
                }.onSuccess {
                    zip.getInputStream(entry)
                        .use { input -> destFile.outputStream().use { output -> input.copyTo(output) } }
                }.onFailure {
                    logger.warn("Encountered an error while extracting entry ${entry.name} from archive ${zf.name}.")
                }
                yield(destFile)
            }
        }
    }

    /**
     * Will compile all supported source files loaded in the given set.
     *
     * @param files [PlumeFile] pointers to source files.
     * @return A set of [PlumeFile] pointers to the compiled class files.
     */
    private fun compileLoadedFiles(files: HashSet<PlumeFile>): HashSet<JVMClassFile> {
        val splitFiles = mapOf<SupportedFile, MutableList<PlumeFile>>(
            SupportedFile.JAVA to mutableListOf(),
            SupportedFile.JVM_CLASS to mutableListOf()
        )
        // Organize files in the map. Perform this sequentially if there are fewer than 100,000 files.
        files.stream().let { if (files.size >= 100000) it.parallel() else it.sequential() }
            .toList().stream().forEach {
                when (it) {
                    is JavaFile -> splitFiles[SupportedFile.JAVA]?.add(it)
                    is JVMClassFile -> splitFiles[SupportedFile.JVM_CLASS]?.add(it)
                }
            }
        if (splitFiles.keys.contains(SupportedFile.JAVA) || splitFiles.keys.contains(SupportedFile.JVM_CLASS)) {
            addMetaDataInfo()
        }
        return splitFiles.keys.map {
            val filesToCompile = (splitFiles[it] ?: emptyList<JVMClassFile>()).toList()
            return@map when (it) {
                SupportedFile.JAVA -> compileJavaFiles(filesToCompile)
                SupportedFile.JVM_CLASS -> moveClassFiles(filesToCompile.map { f -> f as JVMClassFile }.toList())
            }
        }.asSequence().flatten().toHashSet()
    }

    private fun addMetaDataInfo() {
        val maybeMetaData = driver.getMetaData()
        if (maybeMetaData != null) {
            val metaData = maybeMetaData.build()
            if (metaData.language() != LANGUAGE_FRONTEND || metaData.version() != plumeVersion) {
                driver.deleteVertex(maybeMetaData.id(), META_DATA)
                driver.addVertex(NewMetaDataBuilder().language(LANGUAGE_FRONTEND).version(plumeVersion))
            }
        } else {
            driver.addVertex(NewMetaDataBuilder().language(LANGUAGE_FRONTEND).version(plumeVersion))
        }
    }

    /**
     * Projects all loaded classes to a base CPG.
     */
    fun project(): Extractor {
        configureSoot()
        val compiledFiles = compileLoadedFiles(loadedFiles)
        val classStream = loadClassesIntoSoot(compiledFiles)
        when (ExtractorOptions.callGraphAlg) {
            ExtractorOptions.CallGraphAlg.CHA -> CHATransformer.v().transform()
            ExtractorOptions.CallGraphAlg.SPARK -> SparkTransformer.v().transform("", ExtractorOptions.sparkOpts)
            else -> Unit
        }
        // Initialize program structure graph and scan for an existing CPG
        programStructure = driver.getProgramStructure()
        classStream.forEach(this::analyseExistingCPGs)
        // Update program structure after sub-graphs which will change are discarded
        programStructure.close()
        programStructure = driver.getProgramStructure()
        // Setup defaults
        setUpDefaultStructure()
        // Load all methods to construct the CPG from and convert them to UnitGraph objects
        val graphs = constructUnitGraphs(classStream)
        // Build external types from fields and locals
        createExternalTypes(
            classStream = classStream,
            typeStream = classStream.asSequence().map { it.fields }.flatten().map { it.type }
        )
        createExternalTypes(
            classStream = classStream,
            typeStream = graphs.asSequence().map { it.body.locals + it.body.parameterLocals }.flatten().map { it.type }
        )
        // Construct the CPGs for classes

        // Construct the CPGs for methods
        graphs.map(this::constructCPG)
            .toList().asSequence()
            .map(this::constructCallGraphEdges)
            .map { it.declaringClass }.distinct()
            .filter { classStream.contains(it) }
            .toList()
            .forEach(this::constructStructure)
        // Connect methods to their type declarations and source files (if present)
        graphs.forEach { SootToPlumeUtil.connectMethodToTypeDecls(it.body.method, driver) }
        clear()
        return this
    }

    /**
     * Adds additional data calculated from the graph using passes from [io.shiftleft.semanticcpg.passes] and
     * [io.shiftleft.dataflowengineoss.passes]. This is constructed from the base CPG and requires [Extractor.project]
     * to be called beforehand.
     */
    fun postProject(): Extractor {
        driver.getWholeGraph().use { g ->
            val cpg = Cpg.apply(g)
            // Run io.shiftleft.passes.CpgPass
            listOf(
                TypeDeclStubCreator(cpg),
                FileCreationPass(cpg),
                Linker(cpg),
//                NamespaceCreator(cpg), TODO: This conflicts with what Plume is doing in SootToPlumeUtil.kt
            ).map { it.run() }
                .map(CollectionConverters::IteratorHasAsJava)
                .flatMap { it.asJava().asSequence() }
                .forEach { DiffGraphUtil.processDiffGraph(driver, it) }
            // Run io.shiftleft.passes.ParallelCpgPass
            val reachingDefPass = ReachingDefPass(cpg)
            g.nodes(METHOD).asSequence().filterIsInstance<Method>()
                .map(reachingDefPass::runOnPart)
                .map(CollectionConverters::IteratorHasAsJava)
                .flatMap { it.asJava().asSequence() }
                .forEach { DiffGraphUtil.processDiffGraph(driver, it) }
            TrimPass(cpg).run().foreach { DiffGraphUtil.processDiffGraph(driver, it) }
        }
        return this
    }

    /**
     * Creates [TypeDecl] from external [soot.Type]s. This also links the [TypeDecl]s to their modifiers and the
     * unknown file vertex.
     *
     * @param classStream The stream of application [SootClass] to separate external classes from.
     * @param typeStream The stream of all [soot.Type]s.
     */
    private fun createExternalTypes(classStream: List<SootClass>, typeStream: Sequence<soot.Type>) {
        typeStream.distinct()
            .filter { t -> !classStream.any { it.name == t.toString() } }
            .filter {
                !driver.getWholeGraph().nodes(TYPE_DECL).asSequence().any { n -> n.property(FULL_NAME) == it.toString() }
            }
            .map { t -> buildType(t).apply { driver.addVertex(this); addSootToPlumeAssociation(t, this) }; t }
            .map { t -> buildTypeDeclaration(t).apply { driver.addVertex(this); addSootToPlumeAssociation(t, this) } }
            .forEach { t ->
                // Connect external type decls to the unknown file vert
                getSootAssociation(io.shiftleft.semanticcpg.language.types.structure.File.UNKNOWN())?.let {
//                    it.firstOrNull()?.let { f -> driver.addEdge(t, f, SOURCE_FILE) }
                }
                // Connect type decls to their modifiers
                obtainModifiersFromTypeDeclVert(t).forEachIndexed { i, m ->
                    driver.addEdge(t, NewModifierBuilder().modifierType(m).order(i + 1), AST)
                }
            }

    }

    /**
     * Load all methods to construct the CPG from and convert them to [BriefUnitGraph] objects.
     *
     * @param classStream A stream of [SootClass] to construct [BriefUnitGraph] from.
     * @return a list of [BriefUnitGraph] objects.
     */
    private fun constructUnitGraphs(classStream: List<SootClass>) = classStream.asSequence()
        .map { it.methods.filter { mtd -> mtd.isConcrete }.toList() }.flatten()
        .let {
            if (ExtractorOptions.callGraphAlg == ExtractorOptions.CallGraphAlg.NONE)
                it else it.map(this::addExternallyReferencedMethods).flatten()
        }
        .distinct().toList().let { if (it.size >= 100000) it.parallelStream() else it.stream() }
        .filter { !it.isPhantom }.map { m ->
            runCatching { BriefUnitGraph(m.retrieveActiveBody()) }
                .onFailure { logger.warn("Unable to get method body for method ${m.name}.") }
                .getOrNull()
        }.asSequence().filterNotNull().toList()

    /**
     * Sets up default vertices for placeholders like unknown files.
     */
    private fun setUpDefaultStructure() {
        val unknown = io.shiftleft.semanticcpg.language.types.structure.File.UNKNOWN()
        if (driver.getWholeGraph().nodes(FILE).asSequence<Node>().none { f: Node -> f.property(NAME) == unknown }) {
            val unknownFile = NewFileBuilder().name(unknown).order(0).hash(Option.apply(unknown))
            driver.addVertex(unknownFile)
            val fileNode = unknownFile.build()
            driver.getWholeGraph().node(unknownFile.id()).let { n ->
                fileNode.properties().foreach { e -> n.setProperty(e._1, e._2) }
            }
            addSootToPlumeAssociation(unknown, unknownFile)
        }
    }

    /**
     * Searches for methods called outside of the application perspective. If they belong to classes loaded in Soot then
     * they are added to a list which is then returned including the given method.
     *
     * @param mtd The [SootMethod] from which the calls to methods will be collected.
     * @return The list of methods called including the given method.
     */
    private fun addExternallyReferencedMethods(mtd: SootMethod): List<SootMethod> {
        val cg = Scene.v().callGraph
        val edges = cg.edgesOutOf(mtd) as Iterator<Edge>
        return edges.asSequence()
            .map { e ->
                runCatching { e.tgt.method() }
                    .onFailure { logger.warn("Unable to get method for externally referenced method ${e.tgt}.") }
                    .getOrNull()
            }
            .filterNotNull().toMutableList().apply { this.add(mtd) }
    }

    /**
     * Constructs type, package, and source file information from the given class.
     *
     * @param cls The [SootClass] containing the information to build program structure information from.
     */
    private fun constructStructure(cls: SootClass) {
        if (driver.getWholeGraph().nodes(FILE).asSequence()
                .none { it.property(NAME) == SootToPlumeUtil.sootClassToFileName(cls) }
        ) {
            logger.debug("Building file, namespace, and type declaration for ${cls.name}")
            val file = buildClassStructure(cls, driver)
            var typeDecl = buildTypeDeclaration(cls.type, false)
            var type = buildType(cls.type, false)

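            // Reuse a type declaration or type already associated with this class, if any.
            // firstOrNull is used because first throws NoSuchElementException when nothing matches.
            val existTypeDecl = getSootAssociation(cls.type)?.firstOrNull {
                it is NewTypeDeclBuilder
            }
            if (existTypeDecl != null) {
                typeDecl = existTypeDecl as NewTypeDeclBuilder
            }

            val existType = getSootAssociation(cls.type)?.firstOrNull {
                it is NewTypeBuilder
            }
            if (existType != null) {
                type = existType as NewTypeBuilder
            }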
            driver.addVertex(type)
            driver.addVertex(typeDecl)

            determineModifiers(cls.modifiers)
                .mapIndexed { i, m -> NewModifierBuilder().modifierType(m).order(i + 1) }
                .forEach { driver.addEdge(typeDecl, it, AST) }
            cls.fields.forEachIndexed { i, field ->
                SootToPlumeUtil.projectMember(field, i + 1).let { memberVertex ->
                    driver.addEdge(typeDecl, memberVertex, AST)
                    addSootToPlumeAssociation(field, memberVertex)
                }
            }
//            driver.addEdge(typeDecl, file, SOURCE_FILE)
            addSootToPlumeAssociation(cls, typeDecl)
            addSootToPlumeAssociation(cls, type)
        }
    }

    /**
     * Constructs the code-property graph from a method's [BriefUnitGraph].
     *
     * @param graph The [BriefUnitGraph] to construct the method head and body CPG from.
     * @return The given graph.
     */
    private fun constructCPG(graph: BriefUnitGraph): BriefUnitGraph {
        // If file does not exist then rebuild, else update
        val cls = graph.body.method.declaringClass
        val files = driver.getWholeGraph().nodes { it == ODBFile.Label() }.asSequence()
        //Build Class Head

        if (files.none { it.property(NAME) == SootToPlumeUtil.sootClassToFileName(cls) }) {
            logger.debug("Projecting ${graph.body.method}")
            // Build head
            SootToPlumeUtil.buildMethodHead(graph.body.method, driver)

            // Build body
            astBuilder.buildMethodBody(graph)
            cfgBuilder.buildMethodBody(graph)
            pdgBuilder.buildMethodBody(graph)
        } else {
            logger.debug("${graph.body.method} source file found in CPG, no need to build")
        }
        return graph
    }

    private fun analyseExistingCPGs(cls: SootClass) {
        val currentFileHash = getFileHashPair(cls)
        val files = driver.getWholeGraph().nodes { it == FILE }.asSequence()
        logger.debug("Looking for existing file vertex for ${cls.name} from given file hash $currentFileHash")
        files.firstOrNull { it.property(NAME) == SootToPlumeUtil.sootClassToFileName(cls) }?.let { fileV ->
            if (fileV.property(HASH) != currentFileHash) {
                logger.debug("Existing class was found and file hashes do not match, marking for rebuild.")
                // Rebuild
                driver.getNeighbours(mapToVertex(fileV)).use { neighbours ->
                    neighbours.nodes { it == Method.Label() }.forEach { mtdV: Node ->
                        val mtd1 = (mapToVertex(mtdV) as NewMethodBuilder).build()
                        logger.debug(
                            "Deleting method and saving incoming call graph edges for " +
                                    "${mtd1.fullName()} ${mtd1.signature()}"
                        )
                        driver.getMethod(mtd1.fullName(), false).use { g ->
                            g.nodes { it == Method.Label() }.asSequence().firstOrNull()?.let { mtdV: Node ->
                                val mtd2 = mapToVertex(mtdV) as NewMethodBuilder
                                val builtMtd2 = mtd2.build()
                                driver.getNeighbours(mtd2).use { ns ->
                                    if (ns.V(mtdV.id()).hasNext()) {
                                        ns.V(mtdV.id()).next().`in`(CALL).asSequence()
                                            .filterIsInstance<Call>()
                                            .forEach {
                                                saveCallGraphEdge(
                                                    builtMtd2.fullName(),
                                                    mapToVertex(it) as NewCallBuilder
                                                )
                                            }
                                    }
                                }
                            }
                        }
                        driver.deleteMethod(mtd1.fullName())
                    }
                }
                logger.debug("Deleting $fileV")
                driver.deleteVertex(fileV.id(), fileV.label())
                // Delete TypeDecls
                driver.getWholeGraph().nodes { it == TYPE_DECL }.asSequence()
                    .filter { it.property(FULL_NAME) == cls.type.toQuotedString() }
                    .forEach { typeDecl ->
                        logger.debug("Deleting $typeDecl")
                        driver.getNeighbours(NewTypeDeclBuilder().id(typeDecl.id())).use { g ->
                            g.nodes(typeDecl.id()).next().out(AST)
                                .forEach { logger.debug("Deleting $it"); driver.deleteVertex(it.id(), it.label()) }
                        }
                        driver.deleteVertex(typeDecl.id(), TYPE_DECL)
                    }
            } else {
                logger.debug("Existing class was found and file hashes match, no need to rebuild.")
            }
        }
    }

    /**
     * Once the method bodies are constructed, this function then connects calls to the called methods if present.
     *
     * @param graph The [BriefUnitGraph] from which calls are checked and connected to their referred methods.
     * @return The method from the given graph.
     */
    private fun constructCallGraphEdges(graph: BriefUnitGraph): SootMethod {
        if (ExtractorOptions.callGraphAlg != ExtractorOptions.CallGraphAlg.NONE) callGraphBuilder.buildMethodBody(graph)
        return graph.body.method
    }

    /**
     * Configure Soot options for CPG transformation.
     */
    private fun configureSoot() {
        // set application mode
        Options.v().set_app(true)
        // make sure classpath is configured correctly
        Options.v().set_soot_classpath(COMP_DIR)
        Options.v().set_prepend_classpath(true)
        // keep debugging info
        Options.v().set_keep_line_number(true)
        Options.v().set_keep_offset(true)
        // ignore library code
        Options.v().set_no_bodies_for_excluded(true)
        Options.v().set_allow_phantom_refs(true)
        // keep variable names
        PhaseOptions.v().setPhaseOption("jb", "use-original-names:true")
        // call graph options
        if (ExtractorOptions.callGraphAlg != ExtractorOptions.CallGraphAlg.NONE)
            Options.v().set_whole_program(true)
        if (ExtractorOptions.callGraphAlg == ExtractorOptions.CallGraphAlg.SPARK) {
            Options.v().setPhaseOption("cg", "enabled:true")
            Options.v().setPhaseOption("cg.spark", "enabled:true")
        }
    }

    /**
     * Obtains the class path the way Soot expects the input.
     *
     * @param classFile The class file pointer.
     * @return The qualified class path with periods separating packages instead of slashes and no ".class" extension.
     */
    private fun getQualifiedClassPath(classFile: File): String = classFile.absolutePath
        .removePrefix(COMP_DIR + File.separator)
        .replace(File.separator, ".")
        .removeSuffix(".class")

    /**
     * Given a list of class names, load them into the Scene.
     *
     * @param classNames A set of class files.
     * @return the given class files as a list of [SootClass].
     */
    private fun loadClassesIntoSoot(classNames: HashSet<JVMClassFile>): List<SootClass> {
        classNames.map(this::getQualifiedClassPath).forEach(Scene.v()::addBasicClass)
        Scene.v().loadBasicClasses()
        Scene.v().loadDynamicClasses()
        return classNames.map { Pair(it, getQualifiedClassPath(it)) }
            .map { Pair(it.first, Scene.v().loadClassAndSupport(it.second)) }
            .map { clsPair: Pair<File, SootClass> ->
                val f = clsPair.first
                val cls = clsPair.second
                cls.setApplicationClass(); putNewFileHashPair(cls, f.hashCode().toString())
                cls
            }
    }

    /**
     * Clears resources of file and graph pointers.
     */
    private fun clear() {
        loadedFiles.clear()
        classToFileHash.clear()
        sootToPlume.clear()
        savedCallGraphEdges.clear()
        programStructure.close()
        File(COMP_DIR).deleteRecursively()
        G.reset()
        G.v().resetSpark()
    }

}

SootToPlumeUtil.kt

    fun buildType(type: soot.Type, isExternal: Boolean = true): NewTypeBuilder {
        val filename = if (isExternal) {
            io.shiftleft.semanticcpg.language.types.structure.File.UNKNOWN()
        } else {
            if (type.toQuotedString().contains('.')) "/${
                type.toQuotedString().replace(".", "/").removeSuffix("[]")
            }.class"
            else type.toQuotedString()
        }
        val parentType = if (type.toQuotedString().contains('.')) type.toQuotedString().substringBeforeLast(".")
        else type.toQuotedString()
        val shortName = if (type.toQuotedString().contains('.')) type.toQuotedString().substringAfterLast('.')
        else type.toQuotedString()

        return NewTypeBuilder()
            .name(shortName)
            .fullName(type.toQuotedString())
            .typeDeclFullName(type.toQuotedString())
            .apply { addSootToPlumeAssociation(type, this) }
    }

ossdataflow only created ReachingDef edges from METHOD nodes to METHOD_PARAMETER_IN nodes.

Logs:

joern> cpg.graph.edges.filter(_.isInstanceOf[edges.ReachingDef]).foreach{it => println(it.inNode.propertyMap)} 
{ORDER=1, CODE=java.lang.String fieldParam, COLUMN_NUMBER=-1, LINE_NUMBER=37, TYPE_FULL_NAME=java.lang.String, EVALUATION_STRATEGY=BY_REFERENCE, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=fieldParam}
{ORDER=1, CODE=java.lang.String fieldParam, COLUMN_NUMBER=-1, LINE_NUMBER=3, TYPE_FULL_NAME=java.lang.String, EVALUATION_STRATEGY=BY_REFERENCE, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=fieldParam}
{ORDER=1, CODE=java.lang.String fieldParam, COLUMN_NUMBER=-1, LINE_NUMBER=6, TYPE_FULL_NAME=java.lang.String, EVALUATION_STRATEGY=BY_REFERENCE, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=fieldParam}
{ORDER=1, CODE=java.lang.Object canHandle, COLUMN_NUMBER=-1, LINE_NUMBER=10, TYPE_FULL_NAME=java.lang.Object, EVALUATION_STRATEGY=BY_REFERENCE, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=canHandle}
{ORDER=1, CODE=java.lang.String[] argv, COLUMN_NUMBER=-1, LINE_NUMBER=31, TYPE_FULL_NAME=java.lang.String[], EVALUATION_STRATEGY=BY_REFERENCE, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=argv}

joern> cpg.graph.edges.filter(_.isInstanceOf[edges.ReachingDef]).foreach{it => println(it.outNode.propertyMap)} 
{COLUMN_NUMBER=-1, LINE_NUMBER=5, IS_EXTERNAL=false, SIGNATURE=void(java.lang.String), NAME=<init>, AST_PARENT_TYPE=TYPE_DECL, AST_PARENT_FULL_NAME=org.my.CalleeClass, ORDER=1, CODE=void <init>(java.lang.String param1), FULL_NAME=org.my.CalleeClass.<init>:void(java.lang.String), FILENAME=/org/my/CalleeClass.class}
{COLUMN_NUMBER=-1, LINE_NUMBER=5, IS_EXTERNAL=false, SIGNATURE=org.my.CalleeClass(java.lang.String), NAME=tt, AST_PARENT_TYPE=TYPE_DECL, AST_PARENT_FULL_NAME=org.my.Test, ORDER=1, CODE=org.my.CalleeClass tt(java.lang.String param1), FULL_NAME=org.my.Test.tt:org.my.CalleeClass(java.lang.String), FILENAME=/org/my/Test.class}
{COLUMN_NUMBER=-1, LINE_NUMBER=9, IS_EXTERNAL=false, SIGNATURE=java.lang.Object(java.lang.String), NAME=tt2, AST_PARENT_TYPE=TYPE_DECL, AST_PARENT_FULL_NAME=org.my.Test, ORDER=1, CODE=java.lang.Object tt2(java.lang.String param1), FULL_NAME=org.my.Test.tt2:java.lang.Object(java.lang.String), FILENAME=/org/my/Test.class}
{COLUMN_NUMBER=-1, LINE_NUMBER=13, IS_EXTERNAL=false, SIGNATURE=java.lang.Object(java.lang.Object), NAME=tt3, AST_PARENT_TYPE=TYPE_DECL, AST_PARENT_FULL_NAME=org.my.Test, ORDER=1, CODE=java.lang.Object tt3(java.lang.Object param1), FULL_NAME=org.my.Test.tt3:java.lang.Object(java.lang.Object), FILENAME=/org/my/Test.class}
{COLUMN_NUMBER=-1, LINE_NUMBER=33, IS_EXTERNAL=false, SIGNATURE=void(java.lang.String[]), NAME=main, AST_PARENT_TYPE=TYPE_DECL, AST_PARENT_FULL_NAME=org.my.Test, ORDER=1, CODE=void main(java.lang.String[] param1), FULL_NAME=org.my.Test.main:void(java.lang.String[]), FILENAME=/org/my/Test.class}

Code

----Test.java
package org.my;

public class Test {

    CalleeClass tt(String fieldParam){
        return new CalleeClass(fieldParam);
    }

    Object tt2(String fieldParam){
        return new CalleeClass(fieldParam);
    }

    Object tt3(Object canHandle){
        return canHandle;
    }

    public void caller(){
        Test t = new Test();
        CalleeClass gg = t.tt("tt1");
        Object gt = t.tt2("tt2");
        Object gt2 = tt3(gt);
        ///
        Object gt3 = tt3(gg);
    }

    public static void main(String[] argv){
        Test tt = new Test();
        tt.caller();
    }
}

----CalleeClass.java
package org.my;

public class CalleeClass {
    private final String field1;

    public CalleeClass(String fieldParam) {
        this.field1 = fieldParam;
    }
}

DavidBakerEffendi commented 3 years ago

@popthink Would you like to make the changes in a PR or should I use what is in this thread to address this issue? Some of the fixes will overlap with #78 but I think this issue should focus on getting everything needed for REACHING_DEF while #78 should make sure that TYPE_DECLs and graph updates work correctly.

popthink commented 3 years ago

OK, I will try. The code only aimed to remove the REACHING_DEF exceptions, so I can't be sure the fix is good enough for a PR. 😅 I just wanted to share the snippet for testing. My rough fix includes creating Type nodes and the changes for #78, and it removes the SOURCE_FILE relationship, which Joern will create itself.

popthink commented 3 years ago

I think I found the reason why Joern's REACHING_DEF only works for METHOD==>METHOD_PARAMETER_IN.

ReachingDefPass needs an edge from a Method to every CFG node within that Method.

The cfgNode traversal used by Joern's ReachingDefPass currently returns none of the CFG nodes.

ReachingDefPass therefore requires the "Contains" edges, which have to be created by another pass.

      MethodMethods.scala
        def cfgNode: Traversal[nodes.CfgNode] =
             method._containsOut.asScala.collect { case cfgNode: nodes.CfgNode => cfgNode }
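
To illustrate the point, here is a minimal sketch in plain Java. It uses a toy graph model, not the OverflowDB or Joern API, and the names `Node`, `cfgNodes` and `addContainsEdges` are hypothetical. It only shows that a traversal over CONTAINS edges yields nothing until some pass has created those edges.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy graph model for illustration only; this is not the OverflowDB or Joern API.
class ContainsEdgeSketch {

    static final class Node {
        final String label;
        final boolean isCfgNode;
        // Outgoing neighbours grouped by edge label.
        final Map<String, List<Node>> out = new HashMap<>();

        Node(String label, boolean isCfgNode) {
            this.label = label;
            this.isCfgNode = isCfgNode;
        }

        void addEdge(String edgeLabel, Node dst) {
            out.computeIfAbsent(edgeLabel, k -> new ArrayList<>()).add(dst);
        }
    }

    // Same shape as the cfgNode traversal: follow CONTAINS edges, keep CFG nodes.
    static List<Node> cfgNodes(Node method) {
        List<Node> result = new ArrayList<>();
        for (Node n : method.out.getOrDefault("CONTAINS", List.of())) {
            if (n.isCfgNode) {
                result.add(n);
            }
        }
        return result;
    }

    // Hypothetical stand-in for a contains-edge pass:
    // link the method to every node below it in the AST.
    static void addContainsEdges(Node method) {
        List<Node> stack = new ArrayList<>(method.out.getOrDefault("AST", List.of()));
        while (!stack.isEmpty()) {
            Node n = stack.remove(stack.size() - 1);
            method.addEdge("CONTAINS", n);
            stack.addAll(n.out.getOrDefault("AST", List.of()));
        }
    }

    public static void main(String[] args) {
        Node method = new Node("METHOD", true);
        Node ret = new Node("RETURN", true);
        Node id = new Node("IDENTIFIER", true);
        method.addEdge("AST", ret);
        ret.addEdge("AST", id);

        System.out.println(cfgNodes(method).size()); // no CONTAINS edges exist yet
        addContainsEdges(method);
        System.out.println(cfgNodes(method).size()); // both AST descendants are now found
    }
}
```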
DavidBakerEffendi commented 3 years ago

Okay thanks, I'll have a look at adding io.shiftleft.semanticcpg.passes.containsedges.ContainsEdgePass.scala today.

popthink commented 3 years ago

The pass that creates a method's "contains" edges from its AST nodes is "ContainsEdgePass(cpg)".

For testing with Joern:

 ## joern --language java
importCpg("cpg.bin")
import io.shiftleft.semanticcpg.passes.containsedges.ContainsEdgePass 
import io.shiftleft.passes.DiffGraph
import io.shiftleft.dataflowengineoss.passes.reachingdef._

val containsEdge = new ContainsEdgePass(cpg)
containsEdge.createAndApply()
val reach = new ReachingDefPass(cpg)

val dstGraph = DiffGraph.newBuilder
cpg.method.l.foreach { method =>
  val diff = reach.runOnPart(method).next
  // Drop edges that start or end at a TYPE_REF node; the schema does not allow them.
  val filtered = diff.edgesInOriginal.filterNot { e => e.src.label == "TYPE_REF" || e.dst.label == "TYPE_REF" }
  filtered.foreach { e => dstGraph.addEdgeInOriginal(e.src, e.dst, "REACHING_DEF", e.properties) }
}
// Apply the collected edges once, after every method has been processed,
// so that edges from earlier methods are not added again on each iteration.
DiffGraph.Applier.applyDiff(dstGraph.build(), cpg, false, None)
cpg.graph.edges.filter(_.isInstanceOf[edges.ReachingDef]).foreach { e => println(e.bothNodes.l); println(e.propertyMap) }

This creates some REACHING_DEF edges, but an exception still occurs unless I filter out the REACHING_DEF edges that start or end at a "TYPE_REF" node.

This is because, according to the schema, a TYPE_REF node cannot have REACHING_DEF edges, yet ReachingDefPass derives such edges for TYPE_REF nodes. (A TYPE_REF node is a 'CfgNode' because it is an 'Expression', so the contains edges point to TYPE_REF nodes and they are included when ReachingDefPass runs.)
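
The filtering idea can be sketched in plain Java as follows. This is only a toy model under my own assumptions: `EdgeChange` and `dropTypeRefEdges` are hypothetical names and do not correspond to Joern's DiffGraph classes. It shows proposed edges being checked against the node labels before anything is written to the graph.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; names are hypothetical, not Joern's DiffGraph API.
class TypeRefFilterSketch {

    // A proposed edge, described by the labels of its endpoints and its own label.
    static final class EdgeChange {
        final String srcLabel;
        final String dstLabel;
        final String edgeLabel;

        EdgeChange(String srcLabel, String dstLabel, String edgeLabel) {
            this.srcLabel = srcLabel;
            this.dstLabel = dstLabel;
            this.edgeLabel = edgeLabel;
        }
    }

    // Keep only the edges that neither start nor end at a TYPE_REF node.
    static List<EdgeChange> dropTypeRefEdges(List<EdgeChange> changes) {
        List<EdgeChange> kept = new ArrayList<>();
        for (EdgeChange c : changes) {
            boolean touchesTypeRef =
                c.srcLabel.equals("TYPE_REF") || c.dstLabel.equals("TYPE_REF");
            if (!touchesTypeRef) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<EdgeChange> proposed = List.of(
            new EdgeChange("METHOD", "METHOD_PARAMETER_IN", "REACHING_DEF"),
            new EdgeChange("TYPE_REF", "CALL", "REACHING_DEF"),
            new EdgeChange("IDENTIFIER", "TYPE_REF", "REACHING_DEF"));

        for (EdgeChange c : dropTypeRefEdges(proposed)) {
            System.out.println(c.srcLabel + " -" + c.edgeLabel + "-> " + c.dstLabel);
        }
    }
}
```

Filtering like this only hides the symptom; the TYPE_REF nodes are still in the graph, so the cleaner fix is on the side that creates them.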

I also found that the results of ReachingDefPass differ between a CPG from the Plume driver and one from Joern. (I added ContainsEdgePass in the postProject method.)

reachableBy still doesn't work as expected, but the intraprocedural reachability chain MethodReturn=>Return=>MethodParameterIn works with the following code.

   Object tt3(Object canHandle){
        return canHandle;
    }

  // def sink = cpg.method.name("tt3").methodReturn
 // def source = cpg.parameter
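
For reference, this is roughly what an intraprocedural reaching-definitions computation does on a method like tt3. It is a minimal textbook-style sketch in plain Java, not Joern's ReachingDefPass; `CfgNode` and `reachingDefs` are hypothetical names, and definitions are identified simply by the index of the node that makes them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal intraprocedural reaching-definitions sketch; not Joern's ReachingDefPass.
class ReachingDefSketch {

    static final class CfgNode {
        final String code;
        final String definedVar; // variable defined at this node, or null if none
        final List<Integer> successors = new ArrayList<>();

        CfgNode(String code, String definedVar) {
            this.code = code;
            this.definedVar = definedVar;
        }
    }

    // Returns, for each node index, the indices of the definitions that reach its entry.
    static List<Set<Integer>> reachingDefs(List<CfgNode> cfg) {
        int n = cfg.size();
        List<Set<Integer>> in = new ArrayList<>();
        List<Set<Integer>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            in.add(new HashSet<>());
            out.add(new HashSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until a fixed point is reached
            changed = false;
            for (int i = 0; i < n; i++) {
                CfgNode node = cfg.get(i);
                // OUT = GEN + (IN - KILL): a definition of the same variable kills older ones.
                Set<Integer> newOut = new HashSet<>();
                for (int d : in.get(i)) {
                    boolean killed = node.definedVar != null
                        && node.definedVar.equals(cfg.get(d).definedVar);
                    if (!killed) {
                        newOut.add(d);
                    }
                }
                if (node.definedVar != null) {
                    newOut.add(i);
                }
                if (!newOut.equals(out.get(i))) {
                    out.set(i, newOut);
                    changed = true;
                }
                // IN of a successor is the union of the OUT sets of its predecessors.
                for (int s : node.successors) {
                    if (in.get(s).addAll(newOut)) {
                        changed = true;
                    }
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // Models tt3: the parameter is node 0, the return statement is node 1.
        CfgNode param = new CfgNode("java.lang.Object canHandle", "canHandle");
        CfgNode ret = new CfgNode("return canHandle", null);
        param.successors.add(1);

        // Prints the definitions reaching the return statement.
        System.out.println(reachingDefs(List.of(param, ret)).get(1));
    }
}
```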
DavidBakerEffendi commented 3 years ago

Okay, I see. I've gone ahead and asked Fabian for Ocular-generated CPGs from the code in my unit tests, and I will work on correcting the CPGs generated by Plume.

popthink commented 3 years ago

After @DavidBakerEffendi changed NewTypeRefBuild to NewUnknown- for newexpr, ReachingDefPass in ossdataflow no longer throws exceptions in my case.

DavidBakerEffendi commented 3 years ago

I also now catch the exceptions and log them as warnings, so check the logs first. Once #93 is addressed it should work too.
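
As a rough illustration of the catch-and-warn approach, here is a minimal sketch in plain Java. It is not Plume's actual code; `runAll` and the task list are hypothetical. Each step runs in isolation, and a failing step is logged as a warning instead of aborting the whole run.

```java
import java.util.List;
import java.util.logging.Logger;

// Illustrative sketch only; this is not how Plume is actually structured.
class WarnOnFailureSketch {

    private static final Logger LOG = Logger.getLogger(WarnOnFailureSketch.class.getName());

    // Runs every task; a failing task is logged as a warning and skipped.
    // Returns the number of tasks that failed.
    static int runAll(List<Runnable> tasks) {
        int failures = 0;
        for (Runnable task : tasks) {
            try {
                task.run();
            } catch (RuntimeException e) {
                failures++;
                LOG.warning("Step failed and was skipped: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Runnable> tasks = List.of(
            () -> System.out.println("step 1 ok"),
            () -> {
                throw new RuntimeException("Edge of type REACHING_DEF not supported");
            },
            () -> System.out.println("step 3 ok"));

        System.out.println("failed steps: " + runAll(tasks));
    }
}
```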