lifting-bits / remill

Library for lifting machine code to LLVM bitcode
Apache License 2.0
1.29k stars 145 forks source link

BSWAP 16 fails on QEMU #422

Closed RobertHenry6bev closed 4 years ago

RobertHenry6bev commented 4 years ago

In my tests bswap16 is failing on QEMU. There's a code smell in the remill test jig, but I'm not 100% sure.

Assuming I'm looking at the right place for the progenitor of this test, I see https://github.com/lifting-bits/remill/blob/86a6424d58cca6504a93bf7dc4363f14be9e216f/tests/X86/BITBYTE/BSWAP.S#L29 This uses a .byte sequence 0x66, 0x0f, 0xc8 because, presumably, gas doesn't know about the opcode/operand combination for bswap16.

Consulting a definition of bswap https://www.felixcloutier.com/x86/bswap it says that to byte-swap a 16 bit register, use XCHG https://www.felixcloutier.com/x86/xchg but there's also text that says bswap on a 16 bit register is undefined.

The sequence 0x66, 0x0f, 0x08 looks like prefix=0x66 + BSWAP32 which may not be what you want; I'm pretty rusty on the semantics of the prefixes.

Here's a cut and paste from the QEMU output that might shed light on this. mnt/msft/remill/remill/tests/X86/Run.cpp:778: Failure Value of: false Actual: false Expected: true States did not match for BSWAPr16_1 with ARG1=0xaabbccdd and CF=1 PF=1 AF=1 ZF=1 SF=1 DF=1 OF=1 /mnt/msft/remill/remill/tests/X86/Run.cpp:784: Failure Expected equality of these values: lifted_state->gpr.rax.dword Which is: 2864381952 native_state->gpr.rax.dword Which is: 3721182122

here's a decimal->hex of those 2 numbers (gdb) p/x 2864381952 $1 = 0xaabb0000 (gdb) p/x 3721182122 $2 = 0xddccbbaa (gdb)

RobertHenry6bev commented 4 years ago

Commentary in https://github.com/lifting-bits/remill/pull/284 may be relevant

RobertHenry6bev commented 4 years ago

Sandeep Dasgupta writes me via email: That's amazing, Robert!

In fact, this is strange behavior that I have a hard time to derive sanely from their implementation at https://github.com/lifting-bits/remill/blob/master/remill/Arch/X86/Semantics/BITBYTE.cpp#L256

And this is why:

As you may see (from the above link) that the 16-bit swap zeroes out the 16-bit register. In fact, "zero" is the default value that McSema uses for any undefined values.

Pardon me if I am wrong: https://defuse.ca/online-x86-assembler.htm#disassembly2 says that

Disassembly: 0: 66 0f c8 bswap ax (I agree this is strange observation against the https://www.felixcloutier.com/x86/bswap)

However, If the above disassembly is true, then the lifted value (gdb) p/x 2864381952 $1 = 0xaabb0000

seems a correct behavior as it zeroes out the lower 16-bit.

Moreover, it seems the native state (which I believe is emitted by QEMU) native_state->gpr.rax.dword Which is: 3721182122

is just an implementation of the undefined (or implementation-defined ) behavior chosen by QEMU.

Regards, Sandeep Dasgupta Research Assistant, CS, UIUC Webpage: http://sdasgup3.web.engr.illinois.edu/ Github: https://github.com/sdasgup3

pgoodman commented 4 years ago

So this seems like a nice detection mechanism for QEMU :-P We could modify Run.cpp to detect QEMU using a bswap and disable that test. In some places we represent UB as 0 or 1 values, e.g. based on what we've observed on some machines, or based on results from sandpile.org. We could represent them as return values from __remill_undefined_N(). iirc that was my original intention.

artemdinaburg commented 4 years ago

According to the official intel manual BSWAP against 16-bit register is undefined:

Reverses the byte order of a 32-bit or 64-bit (destination) register. This instruction is provided for converting little- endian values to big-endian format and vice versa. To swap bytes in a word value (16-bit register), use the XCHG instruction. When the BSWAP instruction references a 16-bit register, the result is undefined.

Looking at QEMU's code, they follow the manual and only implement 32- or 64- bit BSWAP: https://github.com/qemu/qemu/blob/master/target/i386/translate.c#L7115-L7130

Since this is technically undefined, both implementation are 'right', but what McSema does is closer to what occurs on (current!) physical hardware.

pgoodman commented 4 years ago

Alright, I'm going to close this as I don't think it's a bug in Remill.