Open thixotropist opened 1 month ago
Start with a subjective Ghidra integration test built around a training case study: analyzing a misbehaving RISC-V network appliance built with the latest GCC toolchain for recent microarchitectures. A successful Ghidra feature will make that case study easier to follow for a modest development effort.
For example, adding Sleigh definitions and user pcode operations for vector instructions allows the disassembler and decompiler to extend static analysis to more functions. This can give the user a much clearer perspective on internal operations.
Those vector instructions are also likely to confuse the user's perspective, as they often replace multiple simple scalar operations with fewer but more complex vector instructions. Can Ghidra do something to help reduce that confusion, or do we rely on user training aids to recognize common vector instruction sequences?
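One rough way to gauge how much of a binary depends on vector support is to count vector encodings in a raw code buffer. The sketch below is only an illustration and is not part of Ghidra. It assumes a plain linear sweep over little-endian RISC-V code, treats any parcel whose low two bits are not 11 as a 16-bit compressed instruction, ignores encodings longer than 32 bits, and will miscount if data is embedded in the code. The function names are hypothetical.

```python
import struct

# Major opcodes used by the RISC-V vector extension.
OP_V = 0x57      # vector arithmetic and vsetvl* configuration
LOAD_FP = 0x07   # shared by scalar FP loads and vector loads
STORE_FP = 0x27  # shared by scalar FP stores and vector stores
# Width-field (bits 14:12) values that select vector memory operations;
# the remaining values belong to scalar floating-point loads and stores.
VECTOR_WIDTHS = {0b000, 0b101, 0b110, 0b111}


def is_vector_instruction(word):
    """Return True if a 32-bit instruction word looks like a vector instruction."""
    opcode = word & 0x7F
    if opcode == OP_V:
        return True
    if opcode in (LOAD_FP, STORE_FP):
        return ((word >> 12) & 0x7) in VECTOR_WIDTHS
    return False


def count_vector_instructions(code):
    """Linear sweep over code bytes; returns (total, vector) instruction counts."""
    total = 0
    vector = 0
    offset = 0
    while offset + 2 <= len(code):
        (half,) = struct.unpack_from("<H", code, offset)
        if half & 0x3 != 0x3:
            # 16-bit compressed instruction: never a vector instruction.
            total += 1
            offset += 2
            continue
        if offset + 4 > len(code):
            break  # truncated 32-bit instruction at the end of the buffer
        (word,) = struct.unpack_from("<I", code, offset)
        total += 1
        if is_vector_instruction(word):
            vector += 1
        offset += 4
    return total, vector
```

A high ratio of vector to total instructions in a function suggests that the function cannot be analyzed usefully until the corresponding Sleigh definitions exist.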
We will start with a RISC-V network application built on a DPDK framework, believed to be similar to the dpdk-ip_pipeline example. The inputs to Ghidra will be reference builds of dpdk-ip_pipeline with and without symbols stripped, plus a realtime snapshot of dpdk-ip_pipeline after initialization within a RISC-V emulation environment. That probably needs a separate project repo.
This project seeks a set of Ghidra import regression tests to validate sensible behavior after importing executable binaries into new versions of Ghidra. It has morphed somewhat into generating newer executable binaries that Ghidra cannot currently import sensibly, but might in some future Ghidra release.
Some of those newer binaries might never be handled sensibly by Ghidra's more advanced features, such as decompilation into compilable C and dynamic analysis/emulation.
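An import regression test of this kind could be driven from a small harness. The following is a minimal sketch, not this project's actual test code. It assumes Ghidra's analyzeHeadless launcher under the install's support directory, a hypothetical post-script named ExportImportMetrics.java that would write per-binary metrics, and hypothetical metric names such as "functions". It only builds the command line and compares metric dictionaries; it does not run Ghidra.

```python
from pathlib import Path


def headless_import_command(ghidra_home, project_dir, binary, post_script):
    """Build an analyzeHeadless command that imports one binary.

    post_script is a hypothetical Ghidra script expected to export metrics.
    """
    launcher = Path(ghidra_home) / "support" / "analyzeHeadless"
    return [
        str(launcher),
        str(project_dir),      # project location
        "import_regression",   # project name (arbitrary, assumed here)
        "-import", str(binary),
        "-postScript", post_script,
        "-deleteProject",      # discard the throwaway project afterwards
    ]


def find_regressions(baseline, current):
    """Return the sorted metric names whose value dropped below the baseline.

    A metric missing from the current run is treated as zero.
    """
    return sorted(
        name for name, old in baseline.items()
        if current.get(name, 0) < old
    )


if __name__ == "__main__":
    cmd = headless_import_command(
        "/opt/ghidra", "/tmp/ghidra_projects",
        "dpdk-ip_pipeline", "ExportImportMetrics.java",
    )
    print(" ".join(cmd))
    # Example comparison with made-up numbers, for illustration only.
    print(find_regressions({"functions": 100, "decompiled": 80},
                           {"functions": 100, "decompiled": 75}))
```

The command list can be passed to subprocess.run once a Ghidra installation and the metrics script exist. Comparing against a stored baseline makes a newly supported binary show up as an improvement and a broken import show up as a regression.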
Two examples from the RISC-V processor space:
Let's try recasting this project to explore feasible Ghidra integration tests. Assume we have two 64-bit RISC-V executables built from DPDK and Whisper.cpp sources and compiled for the Sophgo SG2380 processor. What features - and feature tests - would we need to add to Ghidra to support static analysis? If we need dynamic analysis, do we look to Ghidra for that or do we rely on a 64-bit RISC-V VM or QEMU?