zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License
3.3k stars 427 forks source link

Encoder mishandles 16-bit address truncation behavior #471

Closed mappzor closed 5 months ago

mappzor commented 6 months ago

Let's consider two instructions in 32-bit mode:

00 3d 00 ff ff ff       add byte ptr ds:[0xFFFFFF00], bh
67 00 3e 00 ff          add byte ptr ds:[0x0000FF00], bh

Attempting to re-encode both of them results in 67 00 3e 00 ff add byte ptr ds:[0x0000FF00], bh. This presents a unique problem as it's not just a matter of buggy logic. In both cases disassembled instructions have matching operands (fuzzer couldn't catch this) with full 64-bit displacement 0xFFFFFFFFFFFFFF00. The only difference is effective address size (attribute of the whole instruction not just a single operand), so it's all about interpreting the operand correctly.

I see two ways of fixing this:

  1. require 16-bit absolute addresses to be truncated to 16-bits by the user - I don't like this, it's very clunky and goes against the rule that displacements are uniformly signed, it's also just a terrible API
  2. use address size hints - ZydisEncoderDecodedInstructionToEncoderRequest always sets those hints anyway based on effective address size, currently they are ignored unless instruction scales with address size (hidden scaling). This hint can be used in additional context to resolve disambiguity between 16 and 32-bit absolute addresses. I think it's intuitive that address size hint covers all issues regarding address size. It's also a very natural place to document this properly. Lack of hint would indicate that native size is preferred (16 in 16-bit modes, 32 bits otherwise)

Currently I'm exploring 2nd option as it seems the most viable. I'm also considering to get rid of another clunky 16-bit behavior. Consider this in 16-bit mode:

67 3A 05 00 1E 57 1A    cmp al, byte ptr ds:[0x1E00]

Displacement is 0x1A571E00 after disassembling but it gets truncated to 16 bits inside ZydisCalcAbsoluteAddress during formatting. Currently encoder ignores everything above 16 bits to re-encode this properly. I think encoder shouldn't accept this and it would be better to make ZydisEncoderDecodedInstructionToEncoderRequest responsible for sanitization.

tremalrik commented 6 months ago

These examples indicate that it's not just the encoder that's mishandling 16-bit address truncation, but the formatter as well.

While displacements can always be treated as signed in x86, address truncation is unsigned. Also - and this is the part that Zydis doesn't appear to handle correctly - the 67h prefix overrides not just address calculation width but address truncation width as well.

As a result:

mappzor commented 6 months ago

@tremalrik I've addressed this already in #472 :)

Separate PR will soon address rest of the issues.