zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License

Incorrect operand size with mov instruction #468

Closed NaC-L closed 9 months ago

NaC-L commented 9 months ago
#include <inttypes.h>
#include <stdio.h>
#include <Zydis/Zydis.h>

int main(void)
{
ZyanU8 data[] =
{
   0x48, 0xC7, 0x00, 0x20, 0x00, 0x00, 0x00, // mov qword ptr [rax], 0x20 expected: op1 size: 64 op2 size: 64
   0xC7, 0x00, 0x20, 0x00, 0x00, 0x00, // mov dword ptr [rax], 0x20 expected: op1 size: 32 op2 size: 32
   0x66, 0xC7, 0x00, 0x20, 0x00, // mov word ptr [rax], 0x20 expected: op1 size: 16 op2 size: 16
   0xC6, 0x00, 0x20, // mov byte ptr [rax], 0x20 expected: op1 size: 8 op2 size: 8

   0x48, 0x89, 0x00, // mov [rax], rax expected: op1 size: 64 op2 size: 64
   0x89, 0x00, // mov [rax], eax expected: op1 size: 32 op2 size: 32
};

// The runtime address (instruction pointer) was chosen arbitrarily here in order to better 
// visualize relative addressing. In your actual program, set this to e.g. the memory address 
// that the code being disassembled was read from. 
ZyanU64 runtime_address = 0x007FFFFFFF400000;

// Loop over the instructions in our buffer. 
ZyanUSize offset = 0;
ZydisDisassembledInstruction instruction;
while (ZYAN_SUCCESS(ZydisDisassembleIntel(
    /* machine_mode:    */ ZYDIS_MACHINE_MODE_LONG_64,
    /* runtime_address: */ runtime_address,
    /* buffer:          */ data + offset,
    /* length:          */ sizeof(data) - offset,
    /* instruction:     */ &instruction
))) {
    printf("%016" PRIX64 "  %s op1 size: %d op2 size: %d\n",
           runtime_address, instruction.text,
           instruction.operands[0].size, instruction.operands[1].size);
    offset += instruction.info.length;
    runtime_address += instruction.info.length;
}
}

Output:

007FFFFFFF400000  mov qword ptr [rax], 0x20 op1 size: 64 op2 size: 32
007FFFFFFF400007  mov dword ptr [rax], 0x20 op1 size: 32 op2 size: 32
007FFFFFFF40000D  mov word ptr [rax], 0x20 op1 size: 16 op2 size: 16
007FFFFFFF400012  mov byte ptr [rax], 0x20 op1 size: 8 op2 size: 8
007FFFFFFF400015  mov [rax], rax op1 size: 64 op2 size: 64
007FFFFFFF400018  mov [rax], eax op1 size: 32 op2 size: 32

The first line should have been

007FFFFFFF400000  mov qword ptr [rax], 0x20 op1 size: 64 op2 size: 64

so that op2 size matches op1 size, as it does in the dword, word and byte cases.

mappzor commented 9 months ago

No, it should be as it is.

48 C7 00 20 00 00 00
:  :  :  :..IMM
:  :  :..MODRM
:  :..OPCODE
:..REX

For immediate operands, size refers to the physical (encoded) size of the operand, which is at most 32 bits for this form of mov. In this case the encoding is SIMM16_32_32, which means the immediate is 32 bits in 64-bit mode and is sign-extended to 64 bits.
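
The sign extension described here can be checked by hand. The following is a minimal sketch in plain C, not Zydis code: `read_simm32` is a hypothetical helper name, and it assumes the immediate is the last four bytes of the 7-byte encoding shown above, stored little-endian.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Zydis): reads a 4-byte little-endian
 * immediate and sign-extends it to 64 bits, mirroring what happens to the
 * imm32 of `REX.W C7 /0` before the qword store. */
static int64_t read_simm32(const uint8_t *imm)
{
    uint32_t raw = (uint32_t)imm[0]
                 | ((uint32_t)imm[1] << 8)
                 | ((uint32_t)imm[2] << 16)
                 | ((uint32_t)imm[3] << 24);

    /* If bit 31 is set, the value is negative: subtract 2^32 explicitly
     * instead of relying on an implementation-defined narrowing cast. */
    if (raw & 0x80000000u) {
        return (int64_t)raw - 0x100000000LL;
    }
    return (int64_t)raw;
}
```

For the bytes from the issue (`20 00 00 00`) this yields 0x20. An immediate of `FF FF FF FF` would yield -1, that is 0xFFFFFFFFFFFFFFFF as a 64-bit pattern, so 64 bits are written even though only 32 are encoded.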

NaC-L commented 9 months ago

Thanks for the explanation. Is there a way to detect whether the operand is in 64-bit mode?

mov dword ptr [rax], 0x20 and mov word ptr [rax], 0x20 also appear to have the ZYDIS_OPERAND_ENCODING_SIMM16_32_32 encoding,

whereas mov byte ptr [rax], 0x20 has ZYDIS_OPERAND_ENCODING_SIMM8.

mappzor commented 9 months ago

There are around 30 different variants of mov. Also, many other instructions from the base ISA have specialized variants for 8-bit immediates.

I don't know what problem you're trying to solve, but if you are interested in the size of the value that gets read or written through a memory operand, you should rely on the size field of that memory operand. The size of an immediate is useful only when you need low-level information about the physical size of the value before any zero or sign extension.
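
To make the distinction concrete, here is a small plain-C model of the two sizes for the `C7 /0` form of mov discussed in this thread. It is only a sketch under assumptions: both function names are hypothetical, nothing here is Zydis API, and the 16/32/32 mapping is read off the encoding name SIMM16_32_32 and the output posted above.

```c
/* Hypothetical model (not Zydis API): physical size in bits of a
 * SIMM16_32_32 immediate for an effective operand width of 16, 32 or 64.
 * Returns 0 for any other width. */
static unsigned simm16_32_32_bits(unsigned operand_width)
{
    switch (operand_width) {
    case 16: return 16;
    case 32: return 32;
    case 64: return 32; /* encoded as 32 bits, sign-extended when executed */
    default: return 0;
    }
}

/* Hypothetical model: number of bits written through the memory operand.
 * For this instruction form it equals the effective operand width, which
 * is what the memory operand's size field reported in the output above. */
static unsigned store_bits(unsigned operand_width)
{
    switch (operand_width) {
    case 16:
    case 32:
    case 64: return operand_width;
    default: return 0;
    }
}
```

With an operand width of 64, `store_bits` gives 64 while `simm16_32_32_bits` gives 32, matching the "op1 size: 64 op2 size: 32" line from the original report.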