joernio / joern

Open-source code analysis platform for C/C++/Java/Binary/Javascript/Python/Kotlin based on code property graphs. Discord https://discord.gg/vv4MH284Hc
https://joern.io/
Apache License 2.0

[Bug] Fail to get CPG for Ruby #4732

Closed spingARbor closed 2 months ago

spingARbor commented 3 months ago

Describe the bug I want to construct a CPG for part of the GitLab code. When I try to parse the code, I get the following exception (screenshot: joern-bug). When I try to dump the graph, I get the following exception and no output is produced (screenshot: joern-bug-1).

To Reproduce Steps to reproduce the behavior:

  1. Prepare an Ubuntu server with 256 GB of memory
  2. Install the latest release of Joern
  3. Download the GitLab source code from https://gitlab.com/gitlab-org/gitlab/-/archive/v16.10.4-ee/gitlab-v16.10.4-ee.tar.gz
  4. Extract the archive to /home/ubuntu/gitlab-v16.10.4-ee
  5. Parse the code in the app folder
    joern-parse /home/ubuntu/joern/gitlab-v16.10.4-ee/app

    which raises the exception

    overflowdb.SchemaViolationException: IN edge with label AST to an adjacent TYPE_DECL is mandatory, but not defined for this MEMBER node with id=43243
  6. Dump the graph
    joern-export --repr cpg --format=dot

    which raises the exception

    ERROR CpgPassBase: Pass io.joern.dataflowengineoss.passes.reachingdef.ReachingDefPass failed
    overflowdb.SchemaViolationException: OUT edge with label REF to an adjacent METHOD is mandatory, but not defined for this METHOD_REF node with id=14929
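
For convenience, steps 3 to 6 could be scripted roughly as follows. This is only a sketch: `WORK_DIR`, `SRC_ROOT`, the local archive file name and the `reproduce` function are hypothetical names, and it assumes the archive unpacks into a top-level `gitlab-v16.10.4-ee` directory under `WORK_DIR` (the path from step 4, which differs from the `/home/ubuntu/joern/...` prefix used in step 5). The `joern-parse` and `joern-export` invocations are copied from the steps above.

```shell
# Assumed locations; adjust WORK_DIR to wherever the archive should live.
WORK_DIR="${WORK_DIR:-/home/ubuntu}"
ARCHIVE_URL="https://gitlab.com/gitlab-org/gitlab/-/archive/v16.10.4-ee/gitlab-v16.10.4-ee.tar.gz"
# Assumption: the tarball's top-level directory is gitlab-v16.10.4-ee.
SRC_ROOT="$WORK_DIR/gitlab-v16.10.4-ee"

reproduce() {
  # Steps 3-4: download and extract the archive.
  curl -L -o "$WORK_DIR/gitlab.tar.gz" "$ARCHIVE_URL" || return 1
  tar -xzf "$WORK_DIR/gitlab.tar.gz" -C "$WORK_DIR" || return 1
  # Step 5: parse the app folder (the first exception is reported here).
  joern-parse "$SRC_ROOT/app"
  # Step 6: export the graph with the flags quoted in the issue
  # (the second exception is reported here).
  joern-export --repr cpg --format=dot
}
```

Nothing runs until `reproduce` is called explicitly, so the variables can be inspected or overridden first.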

Expected behavior Get a CPG for the GitLab code.


DavidBakerEffendi commented 3 months ago

@spingARbor Have you set Xmx properties somewhere so that Joern is making use of the available memory?

DavidBakerEffendi commented 3 months ago

One way to do it is to add -J-Xmx20G or similar to the Joern arguments. This kind of schema violation can happen when memory is low.
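
As a rough illustration of that suggestion, the flag can be put on the `joern-parse` command line. The `HEAP`, `SRC_DIR` and `HEAP_FLAG` variables are hypothetical names used only for this sketch; the default path is the one from step 5 of the report, and 20G is just the value mentioned above.

```shell
# Heap size and source directory; both defaults are illustrative.
HEAP="${HEAP:-20G}"
SRC_DIR="${SRC_DIR:-/home/ubuntu/joern/gitlab-v16.10.4-ee/app}"

# -J-Xmx<size> asks the launcher to pass -Xmx<size> on to the JVM.
HEAP_FLAG="-J-Xmx${HEAP}"

if command -v joern-parse >/dev/null 2>&1; then
  joern-parse "$HEAP_FLAG" "$SRC_DIR"
else
  # Joern is not installed here, so only show the command.
  echo "joern-parse not found on PATH; would run: joern-parse $HEAP_FLAG $SRC_DIR"
fi
```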

spingARbor commented 3 months ago

Thank you for your response, @DavidBakerEffendi! I have set -Xmx240G on my server. When I try to create a CPG for a folder with only a few Ruby files, I get the same exception and no CPG dot file is produced. Besides, if this were the result of low memory, the run output would tell me that there is not enough memory. Sincerely, Arbor

DavidBakerEffendi commented 3 months ago

@spingARbor I would hope that's enough memory! If you see these exceptions when heap memory usage gets high, then we know the culprit. Otherwise I should look into where this could be caused, as the stack trace suggests the AST may be malformed.

spingARbor commented 3 months ago

I will try again with more memory.

spingARbor commented 3 months ago

@DavidBakerEffendi I tried to parse the code for gitlab-v16.10.4-ee/app/workers while monitoring memory usage. Joern's memory usage was always lower than my setting, but I still get the same error.
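
The comment does not say how memory was monitored. One possible approach, sketched here under assumptions, is to sample the process's resident set size with `ps` while the parse runs. `sample_rss` is a hypothetical helper, not part of Joern, and RSS is process-level memory rather than JVM heap usage, so it is only an approximation of what `-Xmx` limits.

```shell
# Print the resident set size (RSS, in KiB) of a process at intervals.
# Arguments: PID, interval in seconds (default 1), number of samples (default 5).
sample_rss() {
  pid="$1"
  interval="${2:-1}"
  count="${3:-5}"
  i=0
  # Stop early if the process has already exited.
  while [ "$i" -lt "$count" ] && kill -0 "$pid" 2>/dev/null; do
    rss_kib=$(ps -o rss= -p "$pid" | tr -d ' ')
    echo "pid=$pid rss_kib=$rss_kib"
    i=$((i + 1))
    sleep "$interval"
  done
}
```

A possible use is to start `joern-parse` in the background and then call `sample_rss "$!" 5 100`. Whether `$!` refers to the JVM itself depends on how the launcher script starts Java, so the PID may need to be looked up separately. For heap figures specifically, the JDK's `jstat -gc <pid>` is an alternative.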

DavidBakerEffendi commented 3 months ago

@spingARbor OK thanks for the investigation, I'll investigate on my end

spingARbor commented 3 months ago

Thank you @DavidBakerEffendi Sincerely, Arbor

DavidBakerEffendi commented 3 months ago

Found a number of bugs with this that'll help us harden Ruby, see:

spingARbor commented 3 months ago

Thank you. @DavidBakerEffendi

DavidBakerEffendi commented 2 months ago

Some more hardening: https://github.com/joernio/joern/pull/4746 Next up is dealing with the member issue.