Closed mappzor closed 5 months ago
These examples indicate that it's not just the encoder that's mishandling 16-bit address truncation, but the formatter as well.
While displacements can always be treated as signed in x86, address truncation is unsigned. Also - and this is the part that Zydis doesn't appear to handle correctly - the 67h
prefix overrides not just address calculation width but address truncation width as well.
As a result:
67 00 3e 00 ff
as add byte ptr ds:[0x0000FF00],bh
is correct - in this case, the 67h
prefix overrides truncation width from 32-bit down to 16-bit, and the resulting memory access goes to ds:[0x0000FF00]
, not ds:[0xFFFFFF00]
. As a result, re-encoding 00 3d 00 ff ff ff
into 67 00 3e 00 ff
is not correct, but it would have been correct to re-encode 00 3d 00 ff 00 00
into 67 00 3e 00 ff
.67 3a 05 00 1e 57 1a
as cmp al, byte ptr ds:[0x1E00]
is NOT correct! The 67h
prefix will, in this case, override address truncation width from 16-bit to 32-bit, so the memory access will go to address ds:[0x1a571e00]
, not ds:[0x1e00]
. As such, the correct decoding is cmp al, byte ptr ds:[0x1a571e00]
- which is what e.g. XED outputs. (Attempting to access address ds:[0x1a571e00]
in 16-bit mode will normally cause a #GP, due to the data segment size normally being 64K in 16-bit mode, however it is perfectly possible to set up a big 32-bit data segment for unreal-mode or 16-bit protected-mode, in which case such an instruction will execute just fine in 16-bit mode.)@tremalrik I've addressed this already in #472 :)
Separate PR will soon address rest of the issues.
Let's consider two instructions in 32-bit mode:
Attempting to re-encode both of them results in
67 00 3e 00 ff add byte ptr ds:[0x0000FF00], bh
. This presents a unique problem as it's not just a matter of buggy logic. In both cases disassembled instructions have matching operands (fuzzer couldn't catch this) with full 64-bit displacement0xFFFFFFFFFFFFFF00
. The only difference is effective address size (attribute of the whole instruction not just a single operand), so it's all about interpreting the operand correctly.I see two ways of fixing this:
ZydisEncoderDecodedInstructionToEncoderRequest
always sets those hints anyway based on effective address size, currently they are ignored unless instruction scales with address size (hidden scaling). This hint can be used in additional context to resolve disambiguity between 16 and 32-bit absolute addresses. I think it's intuitive that address size hint covers all issues regarding address size. It's also a very natural place to document this properly. Lack of hint would indicate that native size is preferred (16 in 16-bit modes, 32 bits otherwise)Currently I'm exploring 2nd option as it seems the most viable. I'm also considering to get rid of another clunky 16-bit behavior. Consider this in 16-bit mode:
Displacement is
0x1A571E00
after disassembling but it gets truncated to 16 bits insideZydisCalcAbsoluteAddress
during formatting. Currently encoder ignores everything above 16 bits to re-encode this properly. I think encoder shouldn't accept this and it would be better to makeZydisEncoderDecodedInstructionToEncoderRequest
responsible for sanitization.