Closed thixotropist closed 3 weeks ago
Picking the right goals to test pcode semantics against may be a bigger issue.
For example, the riscv half-precision floating point extension zfa
support in Ghidra would be a tough test case if full Ghidra emulation was required.
Let's start RISCV semantics testing with floating point conversions, since @jobermayr has done some nice cleanup there.
Start by generating some unit tests exercised by a RISCV-64 gtest
framework executing within a qemu-riscv-static
emulator.
For example, compile, RE, and execute under qemu the following:
float fcvt_s_w(int* i) {
return (float)*i;
}
float fcvt_s_wu(unsigned int* i) {
return (float)*i;
}
int fmv_x_w(float* x) {
int val;
float src = *x;
__asm__ __volatile__ (
"fmv.x.w %0, %1" \
: "=r" (val) \
: "f" (src));
return val;
}
...
int i = 1;
unsigned int iu = 1;
unsigned int iumax = 0xffffffff;
EXPECT_NEAR(1.0, fcvt_s_w(&i), 1.0e-5) << "fcvt.s.w failure";
EXPECT_NEAR(1.0, fcvt_s_wu(&iu), 1.0e-5) << "fcvt.s.wu failure";
EXPECT_NEAR(4294967295.0, fcvt_s_wu(&iumax), 1.0) << "fcvt.s.wu failure on large unsigned ints";
...
float x = 1.0;
int x_as_int = 0x3f800000;
EXPECT_EQ(x_as_int, fmv_x_w(&x)) << "fmv.x.w failure";
EXPECT_NEAR(1.0, fmv_w_x(&x_as_int), 1.0e-5) << "fmv.w.x failure";
These unit tests all pass under qemu-riscv-static
.
Ghidra's current (11.2-DEV) decompilation gives:
float fcvt_s_w(int *param_1)
{
return (float)*param_1; // looks good
}
float fcvt_s_wu(uint *param_1)
{
return (float)ZEXT416(*param_1); // The ZEXT416 pcode op is incorrect here
}
long fmv_x_w(int param_1) // param1 is the wrong type, and any signature fix will likely force
// an unwanted implicit cast
{
return (long)param_1;
}
Let's start RISCV semantics testing with floating point conversions, since @jobermayr has done some nice cleanup there.
Sorry. I don't want to get all the credit. It was @Sleigh-InSPECtor. I just got a build error while trying to add https://github.com/NationalSecurityAgency/ghidra/pull/6492 to https://github.com/jobermayr/ghidra-staging ...
@jobermayr: Thanks for the clarification. What do you think the Ghidra emulator should do with the following code:
double xdNan = std::numeric_limits<double>::quiet_NaN();
fcvt_s_d(&xdNan);
Qemu properly tests for NaN inputs and throws an exception, which you can test and catch with
EXPECT_DEATH(fcvt_s_d(&xdNan), "") << "fcvt.s.d failure on NaN conversion";
I'm not saying Ghidra's emulator should be as faithful to the hardware spec - just that knowing where to draw the line would be nice. Divide by zero? Unaligned reads and writes?
A provisional design for pcode testing specific to ISA extensions:
The code under riscv64/generated/emulator_tests shows a sample first cut at this, taking on the 16 bit floating point RISCV64 extension. The qemu-static and qemu-system results look mostly good. Getting the same level of fidelity with Ghidra would take a massive amount of work, as it has no built-in support for 16 bit floats.
Many ISA extension instructions require custom pcodes in their processor's
*.sinc
files. The latest round of commits to Ghidra'sisa_ext
branch includes many corrections to simple errors. What kind of import testing would help reduce those errors?This is complicated by the lack of reference models for ideal pcode generation. The current method is subjective - minimize the confusion a Ghidra user will experience when examining functions in the decompiler window.
The initial model for generating pcode is to generate pcode functions that look like the assembly instruction, with '_' replacing '.', and with function arguments in the same order as binutils presents them in a binary objdump. Many vector pcode instances currently have a variable number of arguments to allow for mask variants. This is not how riscv vector intrinsics handles masked operations.
This issue is intended to collect options for improved pcode examples to test against, along with estimates of the implementation complexity.
Options include:
Option 3 is likely much too ambitious for now. Option 2 is ambitious and likely premature, but we can take small steps in that direction. Can we transform pcode ops into notional function calls, so that we can propagate parameter type and vector context information as if they were first-class function calls?