mandiant / capa

The FLARE team's open-source tool to identify capabilities in executable files.
Apache License 2.0

FEAT(capa2sarif) Add SARIF conversion script from json output #2093

Closed ReversingWithMe closed 4 weeks ago

ReversingWithMe commented 1 month ago

SARIF lets you navigate capa's findings in a binary from any tool that supports SARIF (e.g., Ghidra, radare2, IDA). I expect this to be a core format for binary analysis in the future.


The Static Analysis Results Interchange Format (SARIF) is a standardized format for the output of static analysis tools, which are used to evaluate source code or binaries for things like vulnerabilities or dataflow. SARIF enables different analysis tools to produce results in a common format that can be easily understood, integrated, and acted upon by software development tools and systems. For example, VS Code, Ghidra, radare2, and GitHub all support it as a common standard for representing this kind of information.

SARIF describes the analysis that was run and the results of that analysis on an artifact. Results include a description of the artifacts related to a run of the tool, where an artifact can be source code, a binary file, or an auxiliary data file. Results also include the invocation, i.e. how the tool was run, including version, command line, and any knobs/parameters. The idea is that you can reconstruct where output data came from, for things that depend on the parameters used for a specific input. The results themselves are captured via "rules", where each rule is some type of analysis. One could imagine a single rule identifier for all of capa, but that wouldn't be very useful. For each rule/type of information, there is a single message for the finding as well as a property bag that you can shove anything into.
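To make the pieces above concrete, here is a minimal sketch of a SARIF 2.1.0 log built as a plain Python dict. It is only an illustration of the general layout (tool, invocation, artifacts, results with a message and a property bag), not the output of the script in this PR. The function name, the rule identifier, and the example values are all made up for the sketch.

```python
import json


def minimal_sarif_log(tool_name: str, tool_version: str, command_line: str, artifact_uri: str) -> dict:
    """Build a one-run, one-result SARIF 2.1.0 log as a plain dict (illustrative only)."""
    rule_id = "example/rule"  # hypothetical rule identifier
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                # the tool and the rules (types of analysis) it knows about
                "tool": {
                    "driver": {
                        "name": tool_name,
                        "version": tool_version,
                        "rules": [{"id": rule_id, "shortDescription": {"text": "example rule"}}],
                    }
                },
                # how the tool was run
                "invocations": [{"commandLine": command_line, "executionSuccessful": True}],
                # what was analyzed
                "artifacts": [{"location": {"uri": artifact_uri}}],
                # the findings: one message each, plus a free-form property bag
                "results": [
                    {
                        "ruleId": rule_id,
                        "message": {"text": "example finding"},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": artifact_uri},
                                    "address": {"absoluteAddress": 0x401000},
                                }
                            }
                        ],
                        "properties": {"note": "free-form data goes here"},
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    # example values only; none of these come from a real capa run
    log = minimal_sarif_log("capa", "7.0.0", "capa --json sample.exe_", "sample.exe_")
    print(json.dumps(log, indent=2))
```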

This PR adds a new script that takes a capa output file (~7.0) and converts the JSON to SARIF (JSON with an additional schema). This is a clean start from a previous PR, to clean up the branch history left over from embedding this as argument flags in capa directly. If this feature gets enough usage and is stable enough, adding a dedicated renderer may be desirable, but that would probably be better done natively rather than with 3rd-party deps.
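As a rough sketch of the kind of mapping such a conversion involves (not the code in `scripts/capa2sarif.py`), the function below turns a simplified capa-like result document into SARIF rule and result entries. The input shape is an assumption for illustration: a `rules` mapping from rule name to `meta` and a list of `[address, match]` pairs, with absolute addresses given as `{"type": "absolute", "value": <int>}`. Using the rule name as the SARIF rule id is also just a choice made here.

```python
from __future__ import annotations


def rules_and_results(doc: dict) -> tuple[list[dict], list[dict]]:
    """Map a simplified capa-like document to SARIF `rules` and `results` lists.

    Assumed (hypothetical) input shape:
      {"meta": {"sample": {"path": ...}},
       "rules": {name: {"meta": {...}, "matches": [[address, match], ...]}}}
    """
    sample = doc.get("meta", {}).get("sample", {}).get("path", "unknown")
    rules: list[dict] = []
    results: list[dict] = []

    for name, rule in doc.get("rules", {}).items():
        meta = rule.get("meta", {})
        # one SARIF rule per capa rule, rather than a single rule for all of capa
        rules.append(
            {
                "id": name,
                "shortDescription": {"text": name},
                "properties": {"namespace": meta.get("namespace")},
            }
        )
        for address, _match in rule.get("matches", []):
            result = {
                "ruleId": name,
                "message": {"text": f"capa matched rule: {name}"},
                # keep the original address in the property bag
                "properties": {"capa_address": address},
            }
            # only absolute integer addresses are given a location in this sketch
            if address.get("type") == "absolute" and isinstance(address.get("value"), int):
                result["locations"] = [
                    {
                        "physicalLocation": {
                            "artifactLocation": {"uri": sample},
                            "address": {"absoluteAddress": address["value"]},
                        }
                    }
                ]
            results.append(result)

    return rules, results
```

The two returned lists would slot into a SARIF run as `tool.driver.rules` and `results` respectively.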

This includes additional handling for the current radare2-specific and Ghidra-specific requirements. I expect both of these to get fixed over time.

Steps to test functionality

  1. python3 -m venv venv
  2. source venv/bin/activate
  3. python3 -m pip install -e .[dev]
  4. git submodule init
  5. git submodule update
  6. capa --json tests/data/5d7c34b6854d48d3da4f96b71550a221.exe_ > capa_result.json
  7. python3 -m json.tool capa_result.json // test json compliance
  8. python3 scripts/capa2sarif.py capa_result.json -r > capa_radare.sarif
  9. python3 -m json.tool capa_radare.sarif // test json compliance
  10. r2 tests/data/5d7c34b6854d48d3da4f96b71550a221.exe_
  11. > sarif -i capa_radare.sarif
  12. > sarif -l

In Ghidra the steps are similar, but use -g instead of -r. Enable the SARIF extension via Install Extensions, then Sarif > Read File > capa_ghidra.sarif

An interactive table spawns.
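Steps 7 and 9 above only confirm that the files are well-formed JSON. A slightly stronger sanity check, sketched below, also looks for the top-level fields that SARIF 2.1.0 requires (`version`, `runs`, and a `tool.driver.name` per run). This is a hypothetical helper, not part of the PR, and it assumes the script emits SARIF version 2.1.0; it is not a substitute for validating against the full schema.

```python
import json
from pathlib import Path


def sarif_problems(path: str) -> list:
    """Return a list of human-readable problems; an empty list means the basic shape looks fine."""
    try:
        log = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(log, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    # assumption: the file is expected to be SARIF 2.1.0
    if log.get("version") != "2.1.0":
        problems.append(f"unexpected version: {log.get('version')!r}")
    runs = log.get("runs")
    if not isinstance(runs, list):
        return problems + ["missing 'runs' array"]
    for i, run in enumerate(runs):
        driver = run.get("tool", {}).get("driver", {}) if isinstance(run, dict) else {}
        if not driver.get("name"):
            problems.append(f"runs[{i}]: missing tool.driver.name")
    return problems


if __name__ == "__main__":
    import sys

    # usage: python3 check_sarif.py capa_radare.sarif  (script name is hypothetical)
    for name in sys.argv[1:]:
        print(name, sarif_problems(name) or "ok")
```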


ReversingWithMe commented 1 month ago

This is a cleanup of this PR: https://github.com/mandiant/capa/pull/2036

williballenthin commented 1 month ago

recommend also adding a trivial test to test_scripts.py to show that this script can generate output without hitting exceptions, which we can then verify in CI.

ReversingWithMe commented 1 month ago

These are reasonable, I will work on adding them today, thanks!

ReversingWithMe commented 1 month ago

This should address the above suggestions, thanks again! I am not sure about the test, but using an existing result document seems to be a good test case (granted, this would NOT catch breaking changes if the JSON format changes over time).

I am not sure of a good way to take an input file, run capa, and then run the script afterwards. I think the current approach is wrong, though.

williballenthin commented 1 month ago

i think you could use any of the json files in capa-testfiles/rd/ as the input. We'll update those if the format ever changes. No need to invoke capa in the test to generate the json.
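A smoke test along these lines could look roughly like the sketch below: run the script on an existing result document and check that it exits cleanly and prints parseable JSON. This is not the test that was added in the PR. The `tests/data/rd` location is an assumption (the testfiles submodule checked out under `tests/data`), the helper names are made up, and the `-r` flag is simply copied from the manual steps in the description.

```python
import json
import subprocess
import sys
from pathlib import Path

# assumptions: run from the repository root, with the testfiles submodule at tests/data
SCRIPT = Path("scripts") / "capa2sarif.py"
RD_DIR = Path("tests") / "data" / "rd"


def run_and_parse(script, result_document, extra_args=("-r",)):
    """Run the script on a result document and return its stdout parsed as JSON."""
    proc = subprocess.run(
        [sys.executable, str(script), str(result_document), *extra_args],
        capture_output=True,
        text=True,
    )
    # a non-zero exit code here usually means the script raised an exception
    assert proc.returncode == 0, proc.stderr
    return json.loads(proc.stdout)


def test_capa2sarif_smoke():
    candidates = sorted(RD_DIR.glob("*.json"))
    if not candidates:
        return  # no result documents available in this checkout
    # any result document will do; no need to invoke capa to generate one
    run_and_parse(SCRIPT, candidates[0])
```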

williballenthin commented 1 month ago

please resolve merge conflicts and then i'll merge!

williballenthin commented 4 weeks ago

thank you @ReversingWithMe!