zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License
3.3k stars 427 forks source link

Reworked absolute address handling (Fixes #471) #473

Closed mappzor closed 5 months ago

mappzor commented 6 months ago

This PR fixes all known issues around treating signed displacements as absolute addresses. Address size hint is now used for disambiguation where needed (rare cases that essentially force address override prefix). This also lead to a simplification of the fuzzer as verification of memory operands was always quite messy due to multiple issues that it had to deal with (signedness, 16-bit truncation, MIB, etc.). Now it's finally replaced with simpler and more robust code.

Fixes #471

mappzor commented 6 months ago

In the latest set of changes I've finally dealt with the source of all address-related problems. It's this piece of code from ZydisEncoderDecodedInstructionToEncoderRequest:

if (dec_op->encoding == ZYDIS_OPERAND_ENCODING_DISP16_32_64)
{
    ZydisCalcAbsoluteAddress(instruction, dec_op, 0,
        (ZyanU64 *)&enc_op->mem.displacement);
}
else
{
    enc_op->mem.displacement = dec_op->mem.disp.has_displacement ?
        dec_op->mem.disp.value : 0;
}

It opened the gates of hell because it introduces special treatment for absolute addresses coming from moffs variants of mov instructions. So mov [address], eax could result in different encoder requests depending on which variant of mov it was ("normal" vs moffs). Dealing with this on the other end of the story (the encoder itself) resulted in convoluted processing of displacement-only and moffs operands. At some later point 16-bit workarounds were added to the mix.

Now things are done The Right Way :tm:. All displacements are the same and they are handled just like in other parts of Zydis (decoder and utils like ZydisCalcAbsoluteAddress) i.e. they are 64-bit signed and interpreted in the presence of external size information. For encoder this information is now the address size hint.

This took way more time and effort than I've originally anticipated for something that initially seemed like a minor oversight but I think the changes are worth it. New behaviors are truly uniform and shield the user from all the ugly quirks of mov. Handling of none hint should be more aligned with the expectations of a typical user as well (when encoding from scratch).