Near branches prefixed with 0x66 whose instruction length differs between AMD64 and EM64T

hlide commented 7 years ago

For instance:

{
ICLASS    : JMP
CPL       : 3
CATEGORY  : UNCOND_BR
EXTENSION : BASE
ISA_SET   : I86
ATTRIBUTES: MPX_PREFIX_ABLE
PATTERN   : 0xE9 not64 BRDISPz()
OPERANDS  : RELBR:r:z REG0=XED_REG_EIP:rw:SUPP
PATTERN   : 0xE9 mode64 FORCE64() BRDISP32()
OPERANDS  : RELBR:r:d REG0=XED_REG_RIP:rw:SUPP
}

I haven't tested it, but I don't see any clear indication in the data files that this case is handled differently between AMD64 and EM64T.

I believe AMD64 and EM64T don't handle the near JMP the same way:

AMD document says 16-bit displacement (prefix 0x66 not ignored), so total length is 4 bytes.
Intel document says 32-bit displacement (prefix 0x66 ignored), so total length is 6 bytes.

Also, www.sandpile.org says so:

.Df64 | defaults to O64 in PM64; 66h results in O16 in AMD64 but is ignored in EM64T (near branches)

The issue applies to every branch instruction that takes a word displacement.
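To make the length arithmetic above concrete, here is a minimal Python sketch. It is not XED code: the `Vendor` enum and the `near_branch_length` function are hypothetical names made up for illustration, and the sketch only covers the one-byte near CALL (0xE8) and JMP (0xE9) opcodes in 64-bit mode, with an optional single 0x66 prefix.

```python
from enum import Enum


class Vendor(Enum):
    """Hypothetical selector for the two behaviours described above."""
    AMD64 = "amd64"   # 0x66 is honoured: 16-bit displacement
    EM64T = "em64t"   # 0x66 is ignored: 32-bit displacement


# Only the one-byte near-branch opcodes are modelled in this sketch.
NEAR_BRANCH_OPCODES = {0xE8: "call", 0xE9: "jmp"}


def near_branch_length(code: bytes, vendor: Vendor) -> int:
    """Return the encoded length of a near CALL/JMP rel in 64-bit mode."""
    pos = 0
    has_66 = False
    if code[pos] == 0x66:          # operand-size override prefix
        has_66 = True
        pos += 1
    if code[pos] not in NEAR_BRANCH_OPCODES:
        raise ValueError(f"not a near CALL/JMP opcode: {code[pos]:#04x}")
    pos += 1                       # consume the opcode byte
    # The displacement width is the only vendor-dependent part.
    disp_size = 2 if (has_66 and vendor is Vendor.AMD64) else 4
    return pos + disp_size


prefixed_jmp = bytes([0x66, 0xE9, 0x00, 0x00, 0x00, 0x00])
print(near_branch_length(prefixed_jmp, Vendor.AMD64))  # 4
print(near_branch_length(prefixed_jmp, Vendor.EM64T))  # 6
```

Without the prefix, both settings give 5 bytes, so the two decodings only diverge when 0x66 is present.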

markcharney commented 7 years ago

yes, this is one of the unfortunate differences between the products. A fix is certainly possible. Just need another mode bit to select the right instruction definition. I'll put it on the list... Are you actually seeing (non-malware) code that relies on this?

hlide commented 7 years ago

From https://github.com/xoreaxeaxeax/sandsifter:

The tool discovered innumerable bugs in disassemblers, the most interesting of which is a bug shared by nearly all disassemblers. Most disassemblers will parse certain jmp (e9) and call (e8) instructions incorrectly if they are prefixed with an operand size override prefix (66) in a 64 bit executable. In particular, IDA, QEMU, gdb, objdump, valgrind, Visual Studio, and Capstone were all observed to parse this instruction differently than it actually executes. On Intel processors executing in 64 bit mode, the 66 override prefix appears to be ignored, and the instruction consumes a 4 byte operand, as it does without the prefix. Most disassemblers misinterpret the instruction to consume only a 2 byte operand instead (those that assume a 4 byte operand still miscalculate the jump target, assuming it is truncated to 16 bits). This difference in instruction lengths between the disassembled version and the version actually executed opens opportunities for malicious software. By embedding an opcode for a long instruction in the last two bytes of the physical instruction, the physical instruction stream can hide malicious code in the following instruction. Disassemblers and emulators, thrown off by the misparsing of the initial instruction, miss this malicious code in the subsequent instructions (figure 6).
0500000000 add $0x0,%eax
0500000000 add $0x0,%eax
48b8b811223344ffe090 movabs $0x90e0ff44332211b8,%rax
48b8b811223344ffe090 movabs $0x90e0ff44332211b8,%rax
48b8b811223344ffe090 movabs $0x90e0ff44332211b8,%rax
48b8b811223344ffe090 movabs $0x90e0ff44332211b8,%rax
Figure 6. Masking malicious code from objdump and GDB. The opening jmp
is misparsed as a 4 byte instruction, throwing off the parsing of the
subsequent instructions. A malicious “jmp payload” instruction (for the
example, payload is 0x11223344) is embedded in the “movabs” instructions.
While the disassembler sees “movabs”, the processor will execute the
embedded “jmp payload” instead.
As a demonstration of the impact on emulators, we created a program that runs as a benign process in QEMU, but executes a malicious function when run on baremetal (figure 7). The same program, analyzed in IDA, objdump, Capstone, or Visual Studio, will also appear to not execute the malicious code.
// trampoline
__asm__ ("\
.globl trampoline_return \n\
mov $trampoline_return, %rax \n\
jmp *%rax \n\
");
// attack
__asm__ (".byte 0x66,0xe9,0x00,0x00,0x00,0x00");
if (1) {
printf("malicious\n");
}
else {
__asm__ __volatile__ ("trampoline_return:");
printf("benign\n");
}
Figure 7. A malicious program that prints “benign” when run under QEMU,
but “malicious” when run on baremetal. The assembly trampoline at the top
is copied into low memory, as a target for the mis-emulated jmp instruction,
while the jump on baremetal simply falls through to the next instruction.
These types of emulation failures (of which we found many) have important security consequences in terms of antivirus and sandboxing techniques. If an analysis engine cannot faithfully emulate the underlying architecture, it is easy for malicious software to mask its true behavior. The confusion in these instructions is likely caused by differences in AMD and Intel processors; AMD processors obey the override prefix, only fetching a two byte operand. However, due to AMD’s small market share, tools would be better to follow Intel’s implementation. QEMU misinterprets the instruction, even when emulating an Intel processor.
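The divergence on the Figure 7 byte sequence (66 e9 00 00 00 00) can be illustrated with a small Python sketch. The function name `decode_66_e9`, the `honor_prefix` flag, and the example address 0x401000 are all made up for illustration. Truncating the target to 16 bits in the prefix-honouring case is an assumption taken from the quoted description of how the affected tools behave; it is not a statement about any particular processor.

```python
import struct


def decode_66_e9(code: bytes, address: int, honor_prefix: bool):
    """Return (length, target) for a 0x66-prefixed JMP rel at `address`."""
    if code[:2] != b"\x66\xe9":
        raise ValueError("expected the bytes 66 e9")
    if honor_prefix:
        # 16-bit interpretation: 2-byte signed displacement, and
        # (assumption, per the quoted text) the target truncated to 16 bits.
        (disp,) = struct.unpack_from("<h", code, 2)
        length = 4
        target = (address + length + disp) & 0xFFFF
    else:
        # Prefix ignored: 4-byte signed displacement, full 64-bit target.
        (disp,) = struct.unpack_from("<i", code, 2)
        length = 6
        target = (address + length + disp) & 0xFFFFFFFFFFFFFFFF
    return length, target


attack = bytes([0x66, 0xE9, 0x00, 0x00, 0x00, 0x00])
for honor in (True, False):
    length, target = decode_66_e9(attack, 0x401000, honor)
    print(f"honor_prefix={honor}: length={length}, target={target:#x}")
```

With the prefix ignored, the zero displacement makes the jump land on the very next instruction (0x401006), which matches the fall-through described for baremetal. With the prefix honoured, the target becomes 0x1004, an address in low memory, which is why the figure places a trampoline there.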

markcharney commented 7 years ago

yeah, saw that. i can add a mode and it might be good for a reference, but i fear most people won't know to use that mode in the first place.

intelxed / xed

Near branches prefixed with 0x66 whose instruction length differs between AMD64 and EM64T #64