testable-eu / sast-tp-framework

TP-Framework: Testability Pattern Framework for SAST
https://owasp.org/www-project-testability-patterns-for-web-applications/
Apache License 2.0

[Discovery] PHP CPG generation crashes due to OOM errors #56

Closed mlessio closed 1 year ago

mlessio commented 1 year ago

While running the discovery phase on a real-world PHP application, CPG generation fails with an error that appears to be related to memory consumption in the JVM.

Here follows the discovery execution output:

14:52 - ERROR - Error while generating CPG for source.
Traceback (most recent call last):
  File "/tp-framework/tp_framework/core/discovery.py", line 63, in generate_cpg
    cpg_gen_output = run_generate_cpg_cmd(gen_cpg_with_params_cmd, working_dir)
  File "/tp-framework/tp_framework/core/discovery.py", line 102, in run_generate_cpg_cmd
    raise e
  File "/tp-framework/tp_framework/core/discovery.py", line 98, in run_generate_cpg_cmd
    output = subprocess.check_output(gen_cpg_with_params_cmd, shell=True).decode('utf-8-sig')
  File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command './php2cpg /tp-framework/in/PHP_TesCase -o /tp-framework/out/discovery_2023-05-03-14-49-03_PHP_PHP_TesCase/cpg_2023-05-03-14-49-03_PHP_PHP_TesCase.bin -c main.conf bytecode 7' returned non-zero exit status 137.
Traceback (most recent call last):
  File "/tp-framework/tp_framework/core/discovery.py", line 63, in generate_cpg
    cpg_gen_output = run_generate_cpg_cmd(gen_cpg_with_params_cmd, working_dir)
  File "/tp-framework/tp_framework/core/discovery.py", line 102, in run_generate_cpg_cmd
    raise e
  File "/tp-framework/tp_framework/core/discovery.py", line 98, in run_generate_cpg_cmd
    output = subprocess.check_output(gen_cpg_with_params_cmd, shell=True).decode('utf-8-sig')
  File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command './php2cpg /tp-framework/in/PHP_TesCase -o /tp-framework/out/discovery_2023-05-03-14-49-03_PHP_PHP_TesCase/cpg_2023-05-03-14-49-03_PHP_PHP_TesCase.bin -c main.conf bytecode 7' returned non-zero exit status 137.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tpframework", line 33, in <module>
    sys.exit(load_entry_point('tp-framework', 'console_scripts', 'tpframework')())
  File "/tp-framework/tp_framework/cli/main.py", line 46, in main
    discovery_pattern_cmd.execute_command(args)
  File "/tp-framework/tp_framework/cli/tpf_commands.py", line 263, in execute_command
    interface.run_discovery_for_pattern_list(target_dir, l_pattern_id, language, tool_parsed, tp_lib_path,
  File "/tp-framework/tp_framework/cli/interface.py", line 74, in run_discovery_for_pattern_list
    d_res = discovery.discovery(Path(src_dir), pattern_id_list, tp_lib_path, itools, language, build_name,
  File "/tp-framework/tp_framework/core/discovery.py", line 260, in discovery
    cpg: Path = generate_cpg(src_dir, language, build_name, disc_output_dir, timeout_sec=timeout_sec)
  File "/tp-framework/tp_framework/core/discovery.py", line 68, in generate_cpg
    raise rs
core.exceptions.CPGGenerationError: Error while generating CPG for source.
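
For reference, the exit status 137 in the log above follows the usual shell convention of 128 + signal number, so it most likely means php2cpg was terminated by signal 9 (SIGKILL). A common cause on Linux is the kernel OOM killer, although the log alone does not confirm that. The helper below is a minimal sketch for decoding such a status; the function name is made up for illustration and is not part of tp-framework.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status such as the 137 reported above."""
    if status > 128:
        # Shells report "killed by signal N" as 128 + N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    if status == 0:
        return "exited normally"
    return f"exited with error code {status}"


print(describe_exit_status(137))
```

On a Linux host this reports SIGKILL for 137, which points at the process being killed from outside rather than the JVM exiting on its own.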

compaluca commented 1 year ago

@simkoc, @mal-tee: can you help Martino with this? Maybe the memory for the Java process can simply be increased?

mal-tee commented 1 year ago

You can adjust the JVM memory settings in the "php2cpg" bash script at /php-cpg/php2cpg:

root@ce400482bd53:/php-cpg# cat php2cpg
#!/bin/bash

SCRIPT_ABS_PATH=$(readlink -f "$0")
SCRIPT_ABS_DIR=$(dirname $SCRIPT_ABS_PATH)
JAVA_OPTS='-Xmx20g -Xss30m -XX:+UnlockDiagnosticVMOptions -XX:+ExitOnOutOfMemoryError -XX:AbortVMOnException=java.lang.StackOverflowError' $SCRIPT_ABS_DIR/target/universal/stage/bin/multilayer-php-cpg-generator -- $@

mlessio commented 1 year ago

Is there a recommended setting? The script already assigns 20 GB of RAM and a 30 MB stack size, which is quite a large stack. Is there anything else I can do to optimize this?

compaluca commented 1 year ago

I was wondering whether it was necessary to raise the memory for Joern as well, but from what I can see in the error, it originates only from php2cpg.

@mal-tee: any other ideas? Have you experienced similar situations on real applications? @mlessio: can you share some meta info on the PHP app? Total LoC, number of PHP files, ...

mlessio commented 1 year ago

@compaluca: sure, here is the output of the cloc tool, reporting the lines of code and some other useful data.

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
PHP                            309           7345          13294          82814
JavaScript                     400           9578           7768          48182
CSS                             41            579            403          12401
LESS                            70           1160           1521           5472
JSON                             9              0              0           4581
HTML                            22            350            121           2513
Markdown                        15            673              0           2473
SVG                              1              0              0            288
Maven                            1              1              0            199
Python                           1             40              4            140
YAML                             2             17             20             77
XML                              5              0              0             68
Bourne Shell                     1              0              1             14
-------------------------------------------------------------------------------
SUM:                           877          19743          23132         159222
-------------------------------------------------------------------------------

As you can see, it is a mid-sized PHP application. The CPG generation was executed on a laptop equipped with 16GB of RAM.

@mal-tee: We have also tried a well-known real application, WordPress (https://github.com/WordPress/WordPress), and obtained the same OOM error, so you can use it as a candidate application to reproduce the issue.

mal-tee commented 1 year ago

> I was wondering whether it was necessary to raise the memory for Joern as well, but from what I can see in the error, it originates only from php2cpg.
>
> @mal-tee: any other ideas? Have you experienced similar situations on real applications? @mlessio: can you share some meta info on the PHP app? Total LoC, number of PHP files, ...

Yes, we encounter that on a regular basis. Static analysis is resource hungry. We encounter OOM for large apps even with more RAM (highest I tried for a single app was 128 GB).

One possible workaround is to disable the "Dominator" and "PostDominator" passes in /php-cpg/main.conf by removing the respective lines, provided the dominator edges aren't used in the test cases. These passes are especially resource intensive.
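
As a rough sketch of that workaround, the snippet below drops every line that mentions one of the two pass names from a config text. The layout of main.conf is not shown in this thread, so the sample content (including the other pass names) is purely hypothetical, and the function name is made up for illustration; check the real file before applying anything like this.

```python
import re


def drop_passes(conf_text: str, pass_names=("Dominator", "PostDominator")) -> str:
    """Return conf_text without the lines that mention any of pass_names."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, pass_names)) + r")\b")
    kept = [line for line in conf_text.splitlines() if not pattern.search(line)]
    # Preserve a trailing newline if the input had one.
    return "\n".join(kept) + ("\n" if conf_text.endswith("\n") else "")


# Hypothetical config content; the real main.conf layout may differ.
sample = "passes = [\n  CFG\n  Dominator\n  PostDominator\n  CallGraph\n]\n"
print(drop_passes(sample))
```

Keeping a backup copy of the original main.conf makes it easy to re-enable the passes later.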

mlessio commented 1 year ago

Hi @mal-tee, we have tried disabling both the Dominator and PostDominator passes, but unfortunately we get the same OOM error. Any other idea?

mal-tee commented 1 year ago

Unfortunately not, no. It seems it's not possible to convert these apps with the current version.

I don't quite remember whether we have discovery queries that span multiple files. If we don't, we could generate a CPG per file instead of one for the whole project, but this would of course be prone to issues.
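
A minimal sketch of what file-level generation could look like is shown below. It only builds one command per PHP file, mirroring the php2cpg invocation from the traceback earlier in this issue. Whether php2cpg accepts a single file as input is an untested assumption, and the function name and output naming scheme are made up for illustration.

```python
from pathlib import Path


def per_file_commands(src_root, out_dir):
    """Build one php2cpg command (as an argument list) per PHP file."""
    src_root, out_dir = Path(src_root), Path(out_dir)
    commands = []
    for php_file in sorted(src_root.rglob("*.php")):
        rel = php_file.relative_to(src_root)
        # Flatten the relative path into a file name, e.g. sub/b.php -> sub_b.bin.
        out_name = "_".join(rel.with_suffix("").parts) + ".bin"
        commands.append([
            "./php2cpg", str(php_file),
            "-o", str(out_dir / out_name),
            "-c", "main.conf", "bytecode", "7",
        ])
    return commands
```

Each list could then be passed to subprocess.run from the php-cpg directory. Note that the flattened output names can collide (for example a_b.php and a/b.php), and any query that needs cross-file data flow would miss results with this approach.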

mlessio commented 1 year ago

@mal-tee @compaluca update:

It seems that lowering the RAM assigned to the php2cpg JVM (e.g. -Xmx8g) completely fixes this issue, even with the Dominator and PostDominator passes re-enabled.

I'll keep you updated!
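
One possible explanation, not confirmed in this thread, is that a 20 GB heap cap on a laptop with 16 GB of RAM let the JVM grow past physical memory until the operating system killed it. Under that assumption, the sketch below derives an -Xmx value from the host's physical RAM. The 50% fraction is an arbitrary starting point rather than a recommendation from the php-cpg authors, and the function names are made up for illustration.

```python
import os


def suggest_xmx(total_ram_bytes: int, fraction: float = 0.5) -> str:
    """Return a JVM heap flag capped at a fraction of physical RAM (whole GiB)."""
    gib = int(total_ram_bytes * fraction) // (1024 ** 3)
    return f"-Xmx{max(gib, 1)}g"


def physical_ram_bytes() -> int:
    """Total physical memory on Linux, via sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


print(suggest_xmx(physical_ram_bytes()))
```

For a machine with exactly 16 GiB this yields -Xmx8g, which matches the value that worked here. The resulting flag would replace -Xmx20g in the JAVA_OPTS line of /php-cpg/php2cpg.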