Open tathanhdinh opened 6 years ago
I don't remember why :-( I think the basic issue is that we don't have a good way of managing precision-related issues across the semantics, and this comes up in a number of ways.
For example:
kPrecisionDouble
into the FPUControlWord
.What this means is that although you might load in a valid mxcsr (e.g. from an fxrstor
or something), there's no expectation that it will stay valid over time.
Given that we don't actually track things on a per-operation basis, what that TODO
is asking of us is to say: "we need a way to ask the current hardware (on which the bitcode is running) if it thinks there's been a loss of precision", and then set the pe
flag accordingly.
You might be able to use __remill_fpu_exception_test_and_clear
to get at the info, but not sure.
I added this semantic in https://github.com/trailofbits/remill/commit/03fdb2d8c9363f36abd30440f430e860c279cdc3#diff-ad103422aab8c2d7d4f4aa2999e5f4f2
I also didn't keep notes to remember my rationale for this, but I think I based it on the semantic for FNSTCW
https://github.com/trailofbits/remill/blob/8e0a28ebae9460bc71302ab9dc4b8fe4fbbf32a0/remill/Arch/X86/Semantics/X87.cpp#L1276
I agree with Peter that the precision flag is set to zero not because that's how the STMXCSR
actually works, but because I didn't know how to query the current hardware FPU for its precision control flag.
I haven't had time, but we ought to revisit the FMA issue, and implement their semantics using std::fma
to preserve the "infinite" precision.
Hi @pgoodman and @mike-myers-tob Many thanks for your responses to the details of remill. Unfortunately, I still don't get what you mean, sorry to bother you.
First, about precision flag and floating-point operators. Correct me if I'm wrong, the mxcsr
register is used to control the "packed" floating-point operations on XMM registers, not on X87 FPU registers. So what we set on the precision flag (of mxcsr
) will not have any influence on X87 floating point operators? (moreover, XMMi
and STi
are not aliased).
The precision of SSE (packed or scalar) floating-point operators depends on how the floating-point values are represented, not on the precision flag? For example, the instruction roundsd
doesn't even care on the flag.
Second, about calling fegetround
in semantics function. Why is the rounding mode of the running host used to set one of the emulated host? I do not understand what you did.
It would make sense for me if this function is called in the test to initialize the rounding mode (of the emulated code).
Third, which is not directly relevant with above, about tests for lsmxcr
https://github.com/trailofbits/remill/blob/93cf321cd019c1104d5239cd5464c028fbdcafc7/tests/X86/SSE/MXCSR.S#L17-L21
and ldmxcr
https://github.com/trailofbits/remill/blob/93cf321cd019c1104d5239cd5464c028fbdcafc7/tests/X86/SSE/MXCSR.S#L33-L37
They might not help much since they do not reflect any influence of this register on floating-point operators, e.g. I've tried apply just csr.flat = Read(src)
for ldmxcsr
and Write(dst, csr.flat)
for stmxcsr
and they can also pass the tests.
You are right about mxcsr. Though one thing to consider is that once we lift up to bitcode, and say, compile back down to machine code, we have no real idea what hardware units are used to perform a given floating point operation. Aso, I think things like glibc try to keep the mxcsr and fpu control words in sync.
We use host rounding mode because it is assumed that if you are running the bitcode (e.g. lli, compiled), then modifications to the rounding mode are reflected down to the host machine. Ultimately to perform the operations you depend on the host, so trying to synchronize the host seemed natural. For example, in my Remill-based DBI, I have the following:
// Called by assembly in `__vmill_execute_async`.
void __vmill_execute(Task *task, LiftedFunction *lifted_func) {
const auto memory = task->memory;
const auto pc = task->pc;
auto native_rounding = std::fegetround();
std::fesetround(task->fpu_rounding_mode);
lifted_func(task->state, pc, memory); // Calls into lifted code.
task->fpu_rounding_mode = std::fegetround();
std::fesetround(native_rounding);
task->last_pc = pc;
const auto &fault = task->mem_access_fault;
if (kMemoryAccessNoFault == fault.kind) {
switch (task->location) {
case kTaskStoppedAtError:
case kTaskStoppedBeforeUnhandledHyperCall:
task->status = kTaskStatusError;
break;
default:
task->status = kTaskStatusRunnable;
break;
}
} else {
LogFault(LOG(ERROR), task);
LogRegisterState(LOG(ERROR), task->state);
memory->LogMaps(LOG(ERROR));
task->status = kTaskStatusError;
}
}
As to your third point, yeah, those tests are indeed not very useful....
Overall you are not wrong about anything, and are really quite right. What is there now is the result of compromises/choices made. We are open to improvements, though :-D
Hello all,
I'am studying how remill lifts the
stmxcsr
instruction, but there are some details that I still do not understand.The precision flag is sets to zero, but why? I cannot see this flag is reset to zero in debugging
stmxcsr
(I modify this flag to 1 then step the instruction)Why is the host function
fegetround
called? It is kind of weird for me because the target machine may not have the same rounding method with the host? Should a host function be called in the test only, not in the instruction semantics?IMHO, the semantics of the lifting instruction would be simply
Write(dst, csr.flat)
?Many thanks for any comment.