thixotropist / ghidra_import_tests

Experimental framework for testing Ghidra binary import support

Recast project goals and scope #25

Open thixotropist opened 1 month ago

thixotropist commented 1 month ago

This project seeks a set of Ghidra import regression tests that validate sensible behavior when executable binaries are imported into new versions of Ghidra. It has since morphed somewhat into generating newer executable binaries that Ghidra cannot currently import sensibly, but might in some future Ghidra release.

Some of those newer binaries might never work sensibly with Ghidra's more advanced features, such as decompilation into compilable C and dynamic analysis/emulation.

Two examples from the RISC-V processor space:

Let's try recasting this project to explore feasible Ghidra integration tests. Assume we have two 64-bit RISC-V executables built from DPDK and Whisper.cpp sources and compiled for the Sophgo SG2380 processor. What features, and feature tests, would we need to add to Ghidra to support static analysis? If we need dynamic analysis, do we look to Ghidra for that, or do we rely on a 64-bit RISC-V VM or QEMU?

thixotropist commented 1 month ago

Start with a subjective Ghidra integration test: a training case study that analyzes a misbehaving RISC-V network appliance built with the latest GCC toolchain for the latest microarchitectures. A successful Ghidra feature will make that case study easier to follow, for a modest development effort.

For example, adding Sleigh definitions and user-defined pcode operations for vector instructions lets the disassembler and decompiler extend static analysis to functions that would otherwise stop at an unrecognized instruction. This can give the user a much clearer perspective on internal operations.
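One rough way to judge how much a function depends on vector support is to count the instruction words that fall under the RISC-V OP-V major opcode (0x57), which covers vector arithmetic and the vsetvl* configuration instructions. The Python sketch below does that over a raw little-endian code buffer. It is an illustration only, not part of this project or of Ghidra: the function names are hypothetical, vector loads and stores (which share the LOAD-FP and STORE-FP opcodes) are not counted, and any encoding longer than 32 bits is assumed to be absent.

```python
import struct

OPCODE_OP_V = 0x57  # major opcode shared by vector arithmetic and vsetvl*


def encode_vadd_vv(vd, vs2, vs1, vm=1):
    """Encode vadd.vv vd, vs2, vs1 (funct6=0, funct3=0 for OPIVV); vm=1 means unmasked."""
    return (vm << 25) | (vs2 << 20) | (vs1 << 15) | (vd << 7) | OPCODE_OP_V


def count_op_v(code: bytes):
    """Return (total instructions, OP-V instructions) for little-endian RISC-V code.

    Assumes only 16-bit compressed and 32-bit encodings are present.
    """
    total = 0
    vector = 0
    offset = 0
    while offset + 2 <= len(code):
        (half,) = struct.unpack_from("<H", code, offset)
        if (half & 0b11) != 0b11:
            offset += 2  # low bits other than 11 mark a 16-bit compressed instruction
        else:
            if offset + 4 > len(code):
                break  # truncated 32-bit instruction at the end of the buffer
            (word,) = struct.unpack_from("<I", code, offset)
            if (word & 0x7F) == OPCODE_OP_V:
                vector += 1
            offset += 4
        total += 1
    return total, vector


# c.nop (16-bit), addi x0, x0, 0 (32-bit nop), vadd.vv v1, v2, v3
sample = struct.pack("<HII", 0x0001, 0x00000013, encode_vadd_vv(1, 2, 3))
print(count_op_v(sample))
```

Running the same count over the byte range of each function that fails to disassemble would give a crude ranking of which functions stand to gain most from new Sleigh definitions.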

Those vector instructions are also likely to confuse the user's perspective, as they often replace multiple simple scalar operations with fewer but more complex vector instructions. Can Ghidra do something to help reduce that confusion, or do we rely on user training aids to recognize common vector instruction sequences?
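As one possible training aid, the sketch below models in Python how a byte-copy loop changes shape once it is vectorized. The scalar version touches one element per iteration, while the vector version follows the usual strip-mining pattern (vsetvli, then a vector load, then a vector store) and handles up to VLMAX elements per iteration. This is a simplified model with hypothetical function names: real hardware is allowed to choose a vector length other than min(remaining, VLMAX) in some cases, and the value 16 for VLMAX is only an assumption.

```python
def scalar_copy(src):
    """One load and one store per element, as in a plain byte-copy loop."""
    dst = []
    for value in src:
        dst.append(value)
    return dst


def vector_copy(src, vlmax=16):
    """Strip-mined copy: each pass stands in for vsetvli + vle8.v + vse8.v.

    Returns the copied data and the number of loop iterations taken.
    """
    dst = []
    index = 0
    iterations = 0
    while index < len(src):
        vl = min(vlmax, len(src) - index)  # simplified vsetvli: elements this pass
        chunk = src[index:index + vl]      # vle8.v: load vl elements
        dst.extend(chunk)                  # vse8.v: store vl elements
        index += vl
        iterations += 1
    return dst, iterations


data = list(range(40))
copied, passes = vector_copy(data, vlmax=16)
print(copied == scalar_copy(data), passes)
```

The point for a reader of the disassembly is that the loop trip count no longer matches the element count, so a vectorized copy of 40 bytes shows up as three passes instead of forty.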

We will start with a RISC-V network application built on a DPDK framework, believed to be similar to the dpdk-ip_pipeline example. The inputs to Ghidra will be reference builds of dpdk-ip_pipeline with and without symbols stripped, plus a realtime snapshot of dpdk-ip_pipeline after initialization within a RISC-V emulation environment. That probably needs a separate project repo.
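Before feeding the reference builds to Ghidra, it may be worth confirming that the two inputs really differ in the way we expect. The sketch below is a minimal check, assuming little-endian ELF64 RISC-V images, that reports whether a file still carries a `.symtab`-type section (SHT_SYMTAB). The function name is hypothetical, and a stripped build normally keeps its dynamic symbol table, which this check deliberately ignores.

```python
import struct

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")   # 64-byte ELF64 file header
ELF64_SECTION = struct.Struct("<IIQQQQIIQQ")        # 64-byte ELF64 section header
SHT_SYMTAB = 2    # section type of the full symbol table removed by strip
EM_RISCV = 243    # e_machine value for RISC-V


def has_symbol_table(image: bytes) -> bool:
    """Return True if the ELF64 RISC-V image contains an SHT_SYMTAB section."""
    if len(image) < ELF64_HEADER.size:
        raise ValueError("image is too small to hold an ELF64 header")
    fields = ELF64_HEADER.unpack_from(image, 0)
    ident, machine = fields[0], fields[2]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    if machine != EM_RISCV:
        raise ValueError("expected a RISC-V image")
    shoff, shentsize, shnum = fields[6], fields[11], fields[12]
    for index in range(shnum):
        # The second field of each section header is sh_type.
        sh_type = ELF64_SECTION.unpack_from(image, shoff + index * shentsize)[1]
        if sh_type == SHT_SYMTAB:
            return True
    return False
```

A typical use would be to read each reference build with `open(path, "rb").read()` and assert that the unstripped file returns True and the stripped one returns False, where `path` is whatever location the builds end up in.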