Open UESuperGate opened 2 weeks ago
Here I found the meaning of the `<unknown>` file node, but I am still confused about the problems in R1 and R2.

I think I found where it goes weird. In `joern-cli/frontends/c2cpg/src/main/scala/io/joern/c2cpg/astcreation/AstCreator.scala` (line 51), the method `createAst` creates the file node with the name `fileName(cdtAst)`:

    def createAst(): DiffGraphBuilder = {
      val fileContent = if (!config.disableFileContent) Option(cdtAst.getRawSignature) else None
      val fileNode = NewFile().name(fileName(cdtAst)).order(0) // create file node with `fileName(cdtAst)`
      fileContent.foreach(fileNode.content(_))
      val ast = Ast(fileNode).withChild(astForTranslationUnit(cdtAst))
      Ast.storeInDiffGraph(ast, diffGraph)
      diffGraph
    }

The `fileName` method is defined in `joern-cli/frontends/c2cpg/src/main/scala/io/joern/c2cpg/astcreation/AstCreatorHelper.scala` (lines 89-92). I guess this method returns the name of the file that the given `IASTNode` belongs to, using the `nullSafeFileLocation` method:

    protected def fileName(node: IASTNode): String = {
      val path = nullSafeFileLocation(node).map(_.getFileName).getOrElse(filename)
      SourceFiles.toRelativePath(path, config.inputPath)
    }

For the example I showed when explaining R2, `path` ends up being the path of `fetch.h` instead of the path of `fetch.c`. I believe the problem comes from `nullSafeFileLocation`. However, I am not very familiar with Eclipse's AST parser, so I cannot tell what is going wrong.
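
The role of the `getOrElse(filename)` fallback can be illustrated with a small, self-contained model. `Location`, `Node`, and `nameViaLocation` are hypothetical stand-ins, not c2cpg or Eclipse CDT types; only the `map(...).getOrElse(...)` shape is taken from the snippet above. The sketch assumes, as observed in R2, that the location of the root node resolves to the header. It does not explain why that happens. It only shows that the fallback cannot correct it, because the fallback is used only when no location is available at all.

```scala
object FileNameModel {
  // Simplified model; none of these types exist in Joern or Eclipse CDT.
  final case class Location(fileName: String)
  final case class Node(location: Option[Location])

  // Same shape as `fileName`: the fallback applies only if the location is missing.
  def nameViaLocation(node: Node, fallback: String): String =
    node.location.map(_.fileName).getOrElse(fallback)

  // Assumed situation from R2: the root node's location resolves to the header.
  val rootResolvedToHeader: Node = Node(Some(Location("fetch.h")))
  // A node without any location, for comparison.
  val rootWithoutLocation: Node = Node(None)

  val observed: String = nameViaLocation(rootResolvedToHeader, "fetch.c") // "fetch.h"
  val fallback: String = nameViaLocation(rootWithoutLocation, "fetch.c")  // "fetch.c"
}
```

In this model the file node would be named after `fetch.h` whenever a location is present, regardless of what the fallback value is.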
As an alternative solution, I think calling `getContainingFilename` on the `IASTNode` directly should work. According to the [docs](https://help.eclipse.org/latest/topic/org.eclipse.cdt.doc.isv/reference/api/org/eclipse/cdt/core/dom/ast/IASTNode.html#getContainingFilename()), this method can locate where the AST node is:

    protected def fileName(node: IASTNode): String = {
      // original: val path = nullSafeFileLocation(node).map(_.getFileName).getOrElse(filename)
      val path = node.getContainingFilename()
      SourceFiles.toRelativePath(path, config.inputPath)
    }

This at least works for the scenario discussed in R2. But I'm not sure whether it would cause other errors in Joern.
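
One way to limit that risk might be to keep a fallback for nodes that report no containing file. The sketch below is a model under stated assumptions, not a tested patch: `HasContainingFilename` is a hypothetical stand-in for the relevant part of `IASTNode`, `containingFileOrElse` is a made-up helper name, and whether `getContainingFilename` can actually return `null` or an empty string for some nodes has not been verified here.

```scala
object SafeFileName {
  // Hypothetical stand-in for the part of IASTNode used here.
  trait HasContainingFilename {
    def getContainingFilename: String
  }

  // Prefer the containing file name; fall back if it is null or blank.
  def containingFileOrElse(node: HasContainingFilename, fallback: String): String =
    Option(node.getContainingFilename).map(_.trim).filter(_.nonEmpty).getOrElse(fallback)

  // Test helper: a node that reports the given containing file name.
  private def nodeIn(name: String): HasContainingFilename =
    new HasContainingFilename { def getContainingFilename: String = name }

  val fromSource: String = containingFileOrElse(nodeIn("fetch.c"), "fallback.c") // "fetch.c"
  val fromEmpty: String  = containingFileOrElse(nodeIn(""), "fallback.c")        // "fallback.c"
  val fromNull: String   = containingFileOrElse(nodeIn(null), "fallback.c")      // "fallback.c"
}
```

In `fileName`, the fallback would be the existing `filename` value, and the result would still be passed through `SourceFiles.toRelativePath(path, config.inputPath)`.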

Describe the bug

Given a directory with two files, `fetch.h` and `fetch.c`, as below:

`fetch.h`

`fetch.c`

Using the Joern command line with `importCode` causes two weird results:

R1: If we parse the whole directory (i.e., both `fetch.h` and `fetch.c` at the same time), the following file nodes will be generated:

While I fully understand the `fetch.c` file node, I'm pretty confused about the other 4 file nodes. Why are there 2 file nodes for `fetch.h` instead of 1? Is the `fetch.h` node a subset of the `<includes>` node? What does the `<unknown>` file node mean?

R2: If we only parse `fetch.c`, the file nodes are shown as below:

It seems like Joern can automatically include `fetch.h` in the analysis even if I did not specify it manually. However, I noticed that the file imports at lines 1-2 in `fetch.c` are missing when using the command `cpg.file.nameExact("fetch.c").ast.isImport.l` (the result is empty), but they can be found using `cpg.file.nameExact("./fetch.h").ast.isImport.l`, which are shown below:

To Reproduce

Steps to reproduce the behavior of R1:

1. Run `joern` in a shell.
2. Run `importCode("<path_to_directory>", "test")` in Joern's command line, where `<path_to_directory>` is the path to a directory that contains only `fetch.h` and `fetch.c`.
3. Run `cpg.file.l` in Joern's command line.

Steps to reproduce the behavior of R2:

1. Run `joern` in a shell.
2. Run `importCode("<path_to_fetch_c>", "test")` in Joern's command line, where `<path_to_fetch_c>` is the path to `fetch.c`.
3. Run `cpg.file.nameExact("fetch.c").ast.isImport` in Joern's command line to see the file import nodes of `fetch.c`.
4. Run `cpg.file.l` in Joern's command line to get the `name` of the file node of `fetch.h`.
5. Run `cpg.file.nameExact("<path_to_fetch_h>").ast.isImport` in Joern's command line, where `<path_to_fetch_h>` is the file node name obtained in step 4, to see the import nodes of `fetch.h`.

Expected behavior

For R1, I expected only three file nodes to be generated (`fetch.h`, `fetch.c`, and `<includes>`).

For R2, I expected the output of step 3 to contain `fetch.h` and `cache.h`, and the output of step 5 to be empty because `fetch.h` does not import any files.

Desktop (please complete the following information):