Open nekon02 opened 1 year ago
There are a lot of macros/defines in that code. Are they available, i.e. included in your system's C/C++ compiler include path? Otherwise, parsing might fail.
Thank you for replying quickly, but sadly no. The dataset that I used (Bigvul) only includes the function part of the source code and does not include any header files or defines. Do you have any recommendations on what I should do in this situation?
I also have another question: when using joern-parse and joern-export, is there a way to export only the main method of the source code, so that I can save it to the correct data row in the dataset?
It looks like these macros/defines are from openssl (e.g., see: https://docs.huihoo.com/doxygen/openssl/1.0.1c/crypto_2ossl__typ_8h_source.html).
The code static time_t asn1_time_to_time_t(ASN1_UTCTIME * timestr TSRMLS_DC) { ... }
is not even valid C/C++ without the define for TSRMLS_DC, which is something like:
#define TSRMLS_D void ***tsrm_ls
#define TSRMLS_DC , TSRMLS_D
Maybe using c2cpg with --with-include-auto-discovery or --include <path-to-openssl>, with openssl installed on your system, helps.
Thank you for the suggestion. After putting #define TSRMLS_DC , TSRMLS_D in the source code, it works and returns the graph representation now. But when I try to use
joern-parse <source.c> --frontend-args --with-include-auto-discovery
joern-parse <source.c> --frontend-args --include <path-to-openssl>
with OpenSSL installed in my WSL Ubuntu, nothing changes from the original result. In addition, is there a way to use --with-include-auto-discovery in the Joern interactive shell, or is it only possible with joern-parse?
What's the output of gcc -xc -E -v /dev/null -o /dev/null on your system?
Is the path to the openssl header files in that? --with-include-auto-discovery will only look at these folders.
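To make that check concrete, here is a minimal Python sketch that pulls the include search directories out of the verbose gcc output. The function names parse_include_dirs and system_include_dirs are made up for illustration, and the sketch assumes the list is delimited by the "#include <...> search starts here:" and "End of search list." lines, as gcc prints them on Linux.

```python
import subprocess


def parse_include_dirs(gcc_verbose_output: str) -> list[str]:
    """Return the directories listed in the '#include <...>' search list."""
    dirs = []
    in_list = False
    for line in gcc_verbose_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include <...> search starts here:"):
            in_list = True
        elif stripped.startswith("End of search list."):
            break
        elif in_list and stripped:
            dirs.append(stripped)
    return dirs


def system_include_dirs() -> list[str]:
    """Run gcc and parse its search list (gcc prints it on stderr)."""
    result = subprocess.run(
        ["gcc", "-xc", "-E", "-v", "/dev/null", "-o", "/dev/null"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_include_dirs(result.stderr)
```

If the directory that holds the headers you need (or one of its parents) does not show up in that list, auto discovery is unlikely to find it and an explicit --include is the safer option.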
Where are the openssl header files installed, and did you provide the correct path to --include <path-to-openssl> if you used that argument?
Frontend args may be supplied like this:
joern> importCode.c("/path/to/your/code", args=List("--something"))
I believe I installed it correctly. This is the output from gcc -xc -E -v /dev/null -o /dev/null:
Using built-in specs.
COLLECT_GCC=gcc
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04.1' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1)
COLLECT_GCC_OPTIONS='-E' '-v' '-o' '/dev/null' '-mtune=generic' '-march=x86-64'
/usr/lib/gcc/x86_64-linux-gnu/11/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu /dev/null -o /dev/null -mtune=generic -march=x86-64 -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -fcf-protection -dumpbase null
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/11/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/11/include
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/11/:/usr/lib/gcc/x86_64-linux-gnu/11/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/11/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/11/:/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/11/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-E' '-v' '-o' '/dev/null' '-mtune=generic' '-march=x86-64'
The OpenSSL header files are in /usr/include/openssl, so I tried using this path with --include /usr/include/openssl, but I still get the same result with
joern-parse 177740.c --frontend-args --include /usr/include/openssl/
I think this might be because the source code in the dataset is from an older version (2018~).
To check that, you could have a look into /usr/include/openssl/.
Grep for the defines that are missing and see if they are there.
If so, it should be sufficient to run c2cpg with --with-include-auto-discovery, as /usr/include is available.
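The grep step can also be scripted when many snippets need checking. The following is a minimal Python sketch; find_define is a hypothetical helper name, and it only looks for plain #define lines in .h files, so macros introduced through other mechanisms would be missed.

```python
import os
import re


def find_define(include_root: str, macro: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every '#define <macro>' under include_root."""
    # \b keeps TSRMLS_D from also matching TSRMLS_DC.
    pattern = re.compile(r"^\s*#\s*define\s+" + re.escape(macro) + r"\b")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(include_root):
        for name in filenames:
            if not name.endswith(".h"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if pattern.match(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

For example, find_define("/usr/include/openssl", "TSRMLS_DC") returns an empty list when the macro is not defined in any header below that directory.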
It seems the define is not part of the OpenSSL headers on my system: I used grep to search for TSRMLS_DC and could not find anything. I think the best solution for now is to add the #define TSRMLS_DC , TSRMLS_D to the source code directly. Thank you very much.
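For a whole dataset, the workaround of adding the defines by hand can be automated. Below is a minimal Python sketch that prepends the two defines quoted earlier in this thread to a function snippet before it is written to disk for joern-parse. The names STUB_DEFINES, with_stub_defines and write_patched are made up for illustration. As far as I know, TSRMLS_DC comes from PHP's thread-safety (TSRM) layer rather than from OpenSSL, which would explain the empty grep result.

```python
from pathlib import Path

# Stub definitions taken from the discussion above; extend as needed
# for other macros that the snippets rely on.
STUB_DEFINES = (
    "#define TSRMLS_D void ***tsrm_ls\n"
    "#define TSRMLS_DC , TSRMLS_D\n"
)


def with_stub_defines(snippet: str, stubs: str = STUB_DEFINES) -> str:
    """Prepend the stub defines unless the snippet already starts with them."""
    if snippet.startswith(stubs):
        return snippet
    return stubs + snippet


def write_patched(snippet: str, target: str) -> Path:
    """Write the patched snippet to target and return the path."""
    path = Path(target)
    path.write_text(with_stub_defines(snippet), encoding="utf-8")
    return path
```

Note that the patched file has two extra lines at the top, so any line numbers taken from the resulting graph are shifted by two relative to the original dataset row.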
I also have a question regarding joern-export. As I want to extract the graph representation and use it for machine learning in Python, is there a way to export only the main method (e.g. only the AST of asn1_time_to_time_t)?
Maybe https://docs.joern.io/export/ helps?
Thank you for the link. I tried following the method described there, but I still can't find a way to get what I expect. For example, with this example code test.c
int myfunc(int b)
{
    int a = 42;
    if (b > 10) {
        foo(a);
    }
    bar(a);
}
and running
joern-parse test.c
joern-export --repr pdg --out testpdg
I get 6 pdg files.
Is there a way to get only the pdg for "myfunc", or to show in the file name which method each graph belongs to?
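One possible workaround on the Python side is to look inside the exported files instead of relying on their names. The sketch below assumes that each exported .dot file begins with a header of the form digraph "myfunc" { that carries the method name; that is an assumption about the export format, so check one of the files in testpdg first. The function name index_exported_graphs is made up for illustration.

```python
import re
from pathlib import Path

# Matches: digraph "name" {   or   digraph name {
GRAPH_HEADER = re.compile(r'^\s*digraph\s+(?:"([^"]*)"|(\S+))\s*\{')


def index_exported_graphs(out_dir: str) -> dict[str, list[str]]:
    """Map each graph name found in a .dot header to the files that contain it."""
    index: dict[str, list[str]] = {}
    for dot_file in sorted(Path(out_dir).glob("*.dot")):
        with dot_file.open(encoding="utf-8", errors="replace") as handle:
            first_line = handle.readline()
        match = GRAPH_HEADER.match(first_line)
        if match is None:
            continue  # not a dot header we recognise; skip the file
        name = match.group(1) if match.group(1) is not None else match.group(2)
        index.setdefault(name, []).append(str(dot_file))
    return index
```

With such an index, index_exported_graphs("testpdg").get("myfunc", []) lists the files that belong to myfunc, and the remaining graphs can simply be ignored when building the dataset row.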
Hi, I am trying to use Joern for my project on vulnerability detection from source code graph representations with machine learning. For some of the source code in the dataset, Joern cannot find the primary method: the name list only contains global, and the graph has only 18 nodes. Is there any way to fix this? This is the C code I am trying to extract the graph from:
This is the output from Joern:
I am using Joern version 2.0.19 on WSL Ubuntu. Thank you in advance.