Closed ReversingWithMe closed 4 weeks ago
Clean up from this PR https://github.com/mandiant/capa/pull/2036
recommend also adding a trivial test to test_scripts.py to show that this script can generate output without hitting exceptions, which we can then verify in CI.
These are reasonable, will work on adding them today, thanks!
This should address the above suggestions, thanks again! I am not sure about the test, but using an existing result document seems to be a good test case (granted, this would NOT catch breaking changes if the JSON format changes over time).
I am not sure of a good way to take an input file, run capa, and then run the script afterwards. I think the current approach is wrong, though.
i think you could use any of the json files in capa-testfiles/rd/ as the input. We'll update those if the format ever changes. No need to invoke capa in the test to generate the json.
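A smoke test along those lines could look roughly like the sketch below. It is not the helper that test_scripts.py actually uses: `run_converter` and `check_sarif_smoke` are hypothetical names, and the only assumptions are that the script takes a result document path as its argument and writes SARIF JSON to stdout.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_converter(script: Path, result_document: Path) -> dict:
    """Run a converter script on a result document and parse its stdout as JSON."""
    proc = subprocess.run(
        [sys.executable, str(script), str(result_document)],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code means the script raised or bailed out.
    assert proc.returncode == 0, proc.stderr
    return json.loads(proc.stdout)


def check_sarif_smoke(script: Path, result_document: Path) -> None:
    """Only checks that output exists and has the top-level SARIF keys."""
    sarif = run_converter(script, result_document)
    assert "version" in sarif
    assert isinstance(sarif.get("runs"), list)
```

In the repository this would be called with the path to scripts/capa2sarif.py and one of the JSON files under capa-testfiles/rd/, so CI fails as soon as the script raises on a known-good input.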
please resolve merge conflicts and then i'll merge!
thank you @ReversingWithMe!
SARIF gets you navigation for binary beacons from capa in any tool that supports SARIF (e.g., Ghidra/radare2/IDA). I expect this to be a core format for binary analysis in the future.
The Static Analysis Results Interchange Format (SARIF) is a standardized format for the output of static analysis tools, which are used to evaluate source code or binaries for things like vulnerabilities or dataflow. SARIF enables different analysis tools to produce results in a common format that can be easily understood, integrated, and acted upon by software development tools and systems. For example, VS Code, Ghidra, radare2, and GitHub all adopt a common standard for representing these types of information.
SARIF describes two things: the analysis being run and the results of that analysis on an artifact. Results include a description of the artifacts related to a run of the tool, where an artifact can be source code, a binary file, or an auxiliary data file. Results also include the invocation, i.e. how the tool was run, including version, command line, and any knobs/parameters. The idea is that you can reconstruct where output data came from for things that depend on the parameters used on a specific input. The results themselves are captured via "rules", where each rule is some type of analysis. One could imagine a single rule identifier for all of capa, but that wouldn't be very useful. For each rule/type of information, there is a single message for the finding as well as a property bag which you can shove anything into.
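To make the pieces above concrete, here is a minimal sketch of a SARIF 2.1.0 log built as a plain Python dict. The field names follow the SARIF 2.1.0 schema. The function name `build_minimal_sarif` and all values passed to it (tool version, command line, rule id, address) are made up for illustration and are not what capa2sarif.py emits.

```python
import json


def build_minimal_sarif(tool_version, command_line, sample_uri, rule_id, message, address):
    """Assemble a one-rule, one-result SARIF 2.1.0 log as a plain dict."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                # The analysis being run: tool identity plus the rules it can report.
                "tool": {
                    "driver": {
                        "name": "capa",
                        "version": tool_version,
                        "rules": [{"id": rule_id, "shortDescription": {"text": message}}],
                    }
                },
                # How the tool was run, so the output can be traced back to its inputs.
                "invocations": [{"commandLine": command_line, "executionSuccessful": True}],
                # The file(s) the run looked at.
                "artifacts": [{"location": {"uri": sample_uri}}],
                # One finding: a rule id, a single message, a location, a property bag.
                "results": [
                    {
                        "ruleId": rule_id,
                        "message": {"text": message},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": sample_uri},
                                    "address": {"absoluteAddress": address},
                                }
                            }
                        ],
                        "properties": {"note": "free-form data goes here"},
                    }
                ],
            }
        ],
    }


# Hypothetical values, chosen only to show the shape of the document.
log = build_minimal_sarif(
    "7.0.0", "capa --json sample.exe_", "sample.exe_", "create process", "create process", 0x401000
)
print(json.dumps(log, indent=2))
```

Using one rule per capa rule, rather than one rule for all of capa, is what lets a SARIF viewer group and filter findings by capability.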
This PR adds a new script that takes in a capa output file (~7.0) and converts the JSON to SARIF (JSON with an additional schema). This is a clean restart of a previous PR, done to clean up the branch history left over from embedding this as argument flags in capa directly. If this feature gets enough usage and is stable enough, adding a dedicated renderer may be desirable, but that would preferably be done natively rather than through 3rd party deps.
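The core of such a conversion can be sketched as a mapping from each matched capa rule to a SARIF rule, and from each match to a SARIF result. This is not the code in scripts/capa2sarif.py. It assumes a simplified result document with a top-level `rules` mapping whose entries carry `meta` and a `matches` list of `[address, details]` pairs; check that against a real result document before relying on it. The function name `rules_and_results` is hypothetical.

```python
from typing import Any


def rules_and_results(result_document: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Map each matched rule to a SARIF rule and each match to a SARIF result."""
    sarif_rules: list[dict] = []
    sarif_results: list[dict] = []
    for rule_name, rule in result_document.get("rules", {}).items():
        meta = rule.get("meta", {})
        sarif_rules.append(
            {
                "id": rule_name,
                "shortDescription": {"text": meta.get("name", rule_name)},
                # Anything without a dedicated SARIF field goes in the property bag.
                "properties": {"namespace": meta.get("namespace", "")},
            }
        )
        # Assumed shape: each match is an [address, details] pair.
        for address, _details in rule.get("matches", []):
            sarif_results.append(
                {
                    "ruleId": rule_name,
                    "message": {"text": meta.get("name", rule_name)},
                    "properties": {"address": address},
                }
            )
    return sarif_rules, sarif_results


# Hand-written input in the assumed shape, not real capa output.
sample = {
    "rules": {
        "create process": {
            "meta": {"name": "create process", "namespace": "host-interaction/process/create"},
            "matches": [[{"type": "absolute", "value": 4198400}, {}]],
        }
    }
}
rules, results = rules_and_results(sample)
```

The two lists would then be placed under `tool.driver.rules` and `results` of a SARIF run.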
This includes additional features to meet current Radare-specific and Ghidra-specific requirements. I expect both of these to get fixed over time.
Steps to test functionality
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -e .[dev]
git submodule init
git submodule update
capa --json tests/data/5d7c34b6854d48d3da4f96b71550a221.exe_ > capa_result.json
python3 -m json.tool capa_result.json  # test json compliance
python3 scripts/capa2sarif.py capa_result.json -r > capa_radare.sarif
python3 -m json.tool capa_radare.sarif  # test json compliance
r2 tests/data/5d7c34b6854d48d3da4f96b71550a221.exe_
> sarif -i capa_radare.sarif
> sarif -l
In Ghidra, the steps are similar, but pass -g instead of -r. Enable the SARIF extension from Install Extensions, then use Sarif > Read File > capa_ghidra.sarif
Interactive table spawns

![image](https://github.com/mandiant/capa/assets/22553347/3c217e2c-da2f-43a7-823e-b2d51c5c4da0)
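Beyond checking that the output parses as JSON, a slightly stronger sanity check is to confirm that every result points at a declared rule. The sketch below assumes the output is a SARIF 2.1.0 log; `summarize_sarif` is a hypothetical helper, not part of capa.

```python
import json


def summarize_sarif(text: str, expected_version: str = "2.1.0") -> dict:
    """Return a {ruleId: result count} map, raising if the log looks inconsistent."""
    log = json.loads(text)
    if log.get("version") != expected_version:
        raise ValueError(f"unexpected SARIF version: {log.get('version')!r}")
    counts: dict = {}
    for run in log["runs"]:
        declared = {rule["id"] for rule in run["tool"]["driver"].get("rules", [])}
        for result in run.get("results", []):
            rule_id = result.get("ruleId")
            # Only enforce the cross-reference when the run declares its rules.
            if declared and rule_id not in declared:
                raise ValueError(f"result references undeclared rule: {rule_id!r}")
            counts[rule_id] = counts.get(rule_id, 0) + 1
    return counts
```

For example, `summarize_sarif(open("capa_radare.sarif").read())` gives a quick per-rule count to compare against what the viewer shows.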
Checklist