MOV16ms uses operand size override prefix

kadircet commented 6 years ago


Bugzilla Link	34478
Resolution	FIXED
Resolved on	Nov 29, 2017 09:08
Version	4.0
OS	Linux
CC	@topperc,@jyknight,@RKSimon

Extended Description

When assembling an instruction like movw %fs, (%rsi)

LLVM encodes with:

66 8c 26 mov %fs,(%rsi)

which adds the operand-size override prefix, whereas gcc doesn't emit the 66 and encodes the instruction as:

8c 26 mov %fs,(%rsi)

Intel defines the behavior of the instruction on page 694 of Instruction Set Reference, July 2017 and it doesn't provide detailed information for that case, but I feel like using only the least 16 bits of an address might cause problems in 64 bit mode.

Do we know whether this is a bug, or does X86InstrSystem.td specify OpSize16 for MOV16ms on purpose?

llvmbot commented 6 years ago

Landed in r318797.

llvmbot commented 6 years ago

https://reviews.llvm.org/D39847

jyknight commented 6 years ago

Nirav fixed the inverse of this (mov from register to segment-reg) a while back. Probably the same fix should apply here.

llvmbot commented 6 years ago

@Kadir: see the bottom of this for a response to your comments about "using the least 16 bits of addresses". I think you're not understanding how segment regs work.

But there is an issue here. clang's assembler could / should save a byte. I haven't tested what it does do in all these cases, just what it should. :P

NASM and YASM both assemble mov [rsi], fs to 8C 26, without an operand-size prefix. This is what we should do for memory sources / destinations.
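For anyone wondering where 8C 26 comes from, here is a minimal Python sketch of the ModRM computation for the memory-destination form. The helper names (`modrm`, `encode_mov_mem_sreg`) and the register tables are made up for illustration, and only the plain [reg] addressing case is handled (no SIB, no displacement, no REX).

```python
# Segment register numbers as they appear in the ModRM.reg field of 8C /r.
SREG = {"es": 0, "cs": 1, "ss": 2, "ds": 3, "fs": 4, "gs": 5}

# ModRM.rm values for simple [reg] addressing. rsp/rbp are left out on
# purpose: with mod=00 they select SIB / RIP-relative forms instead.
BASE = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsi": 6, "rdi": 7}


def modrm(mod, reg, rm):
    """Pack the three ModRM fields (2 + 3 + 3 bits) into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_mem_sreg(base, sreg):
    """Encode `mov [base], sreg` with no prefixes (hypothetical helper)."""
    return bytes([0x8C, modrm(0b00, SREG[sreg], BASE[base])])


encoded = encode_mov_mem_sreg("rsi", "fs")
print(" ".join(f"{b:02x}" for b in encoded))  # 8c 26
```

mod=00, reg=100 (FS), rm=110 (RSI) packs to 0x26, which matches what NASM, YASM and GAS emit.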

It seems (from testing with a register destination) that the default operand-size is 32-bit, but the memory-destination form is always a 16-bit store regardless of prefixes. We could force users to write movl %fs, (%rsi), but nobody wants that. NASM/YASM won't even assemble mov dword [rsi], fs in 64-bit mode. (And unlike NASM, YASM doesn't normally optimize away extra prefixes. e.g. YASM assembles mov rax, 1 to the 7-byte REX mov r/m64, imm32 form, while NASM optimizes it to 5-byte mov r32, imm32.)

But like I said, mov %fs, (%rsi) should be assembled without an operand-size prefix. (It's an otherwise-harmless waste of space). Current binutils objdump -d disassembles with no operand-size suffix (mov, not movw or movl) in AT&T syntax, but as 8c 26 mov WORD PTR [rsi],fs in -Mintel syntax.

Note that the REX.W + 8C /r MOV r/m64,Sreg memory-destination form described by Intel is still a 16-bit store. Prefixes (66 or REX.W) have no effect; nothing I did in 32 or 64-bit mode ever got it to modify memory outside of the 2 bytes at the destination.
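A small Python model of that observation, as a sketch only: it encodes the behaviour I saw (always a 2-byte little-endian store), not anything taken from the manuals. The function name, the fill pattern and the selector value 0x0063 are assumptions picked for illustration.

```python
import struct


def store_sreg(mem, offset, selector):
    """Model `mov [mem+offset], sreg`: exactly 2 bytes are written.

    Assumption based on the testing described above: 66 and REX.W
    prefixes do not change the store width, so they are not parameters.
    """
    mem[offset:offset + 2] = struct.pack("<H", selector & 0xFFFF)


# 8 bytes of recognisable filler, little-endian.
mem = bytearray(struct.pack("<II", 0xDEADBEEF, 0x0BADF00D))
store_sreg(mem, 0, 0x0063)  # 0x0063 is a made-up selector value
print(mem.hex())
```

Only the first two bytes change; the upper half of the first dword and the whole second dword keep their filler.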

Same goes for memory source operands, AFAICT. Omit the operand-size prefix because other assemblers do.

Register destinations are different: it matters there. With no prefixes, the register-destination form defaults to 32-bit operand-size (zero-extending, or apparently on old CPUs, undefined upper 2 bytes). So the assembler can just choose operand-size prefixes normally based on the GP register size.

Intel's insn set ref manual (x86 volume 2: https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf) has an entry for MOV with some relevant text in the Description section. I haven't checked AMD's manual.

Intel's Description section is confusing, but I eventually made sense of what they're saying about GP-reg (not memory) source/dest:

When operating in 32-bit mode and moving data between a segment register and a general-purpose register, the 32-bit IA-32 processors do not require the use of the 16-bit operand-size prefix (a byte with the value 66H) with this instruction, but most assemblers will insert it if the standard form of the instruction is used (for example, MOV DS, AX). The processor will execute this instruction correctly, but it will usually require an extra clock. With most assemblers, using the instruction form MOV DS, EAX will avoid this unneeded 66H prefix

An extra clock? Doubt that applies anymore. IIRC some old CPUs took extra time decoding prefixes.

Anyway, what they're saying here is that users should always write mov %eax, %fs instead of mov %ax, %fs. And that assemblers are free to optimize by omitting the 66 prefix regardless of mode (.code16), e.g. for 32/64-bit mode effectively transforming it to mov %eax, %fs.

GAS doesn't normally optimize stuff like mov $1, %rax into mov $1, %eax, so this isn't an optimization that needs to be implemented if it's not already. Let users write mov %ax,%fs in .code16 or mov %eax,%fs otherwise.

When the processor executes the instruction with a 32-bit general-purpose register, it assumes that the 16 least-significant bits of the general-purpose register are the destination or source operand.

This is a confusing way to say that the segment reg value itself is read from or stored in the low 16.

It's confusing because it seems to imply (incorrectly) that the high 16 bits of a GP register are not part of the destination. Then they're still talking only about 32-bit (and 64-bit) destination registers, not GP destination registers in general:

If the register is a destination operand, the resulting value in the two high-order bytes of the register is implementation dependent. For the Pentium 4, Intel Xeon, and P6 family processors, the two high-order bytes are filled with zeros; for earlier 32-bit IA-32 processors, the two high order bytes are undefined.

This isn't anything the assembler itself has to worry about. At first I thought this was saying something that included the behaviour with a 66 prefix, but it isn't.

With 16-bit operand size, it always leaves the upper bytes of the full destination register unmodified, like normal for 16-bit destinations.
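The register-destination semantics can be sketched in Python like this. It is a model under stated assumptions: a P6-or-later CPU (upper bytes zeroed rather than undefined) running in 64-bit mode, with a hypothetical function name and made-up register contents.

```python
MASK64 = (1 << 64) - 1


def mov_gpr_sreg(old_reg, selector, opsize):
    """Return the new 64-bit register value after `mov reg, sreg`.

    opsize is the effective operand size in bits (16, 32 or 64).
    """
    selector &= 0xFFFF
    if opsize == 16:
        # 16-bit destination: merge into the low word, keep the rest.
        return (old_reg & ~0xFFFF & MASK64) | selector
    if opsize in (32, 64):
        # Selector zero-extended; a 32-bit write also clears bits 63:32.
        return selector
    raise ValueError("operand size must be 16, 32 or 64")


old = 0xDEADBEEFCAFEBABE
print(hex(mov_gpr_sreg(old, 0x0063, 16)))
print(hex(mov_gpr_sreg(old, 0x0063, 32)))
```

The 32-bit and 64-bit cases give the same result, which is why REX.W buys nothing here.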

Thus, the usual prefix-selection algorithm applies for gp-register destinations, considering .code16 vs. .code32/.code64 to select 66 or REX.W based on the destination size. (REX.W is always redundant like for movzwl; implicit vs. explicit zero extension from 32 to 64 makes no difference, other than wasting a byte if a REX prefix wasn't otherwise needed.)
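Roughly what I mean by the usual prefix-selection algorithm, as a Python sketch. The function name and argument layout are hypothetical and this is not how LLVM or GAS structure it. It always emits REX.W for a 64-bit destination (the YASM-style choice), even though that bit is redundant for this instruction.

```python
def encode_mov_gpr_sreg(mode, reg_bits, reg_num, sreg_num):
    """Encode `mov gpr, sreg` (8C /r, register destination).

    mode:     16, 32 or 64 (as for .code16 / .code32 / .code64)
    reg_bits: size of the GP destination register (16, 32 or 64)
    reg_num:  GP register number 0..15 (0 = ax/eax/rax, 10 = r10)
    sreg_num: segment register number (4 = fs)
    """
    if mode not in (16, 32, 64) or reg_bits not in (16, 32, 64):
        raise ValueError("bad mode or register size")
    if (reg_bits == 64 or reg_num >= 8) and mode != 64:
        raise ValueError("needs a REX prefix, only available in 64-bit mode")

    out = bytearray()
    default_bits = 16 if mode == 16 else 32
    if reg_bits in (16, 32) and reg_bits != default_bits:
        out.append(0x66)        # operand-size override
    rex = 0x40
    if reg_bits == 64:
        rex |= 0x08             # REX.W (redundant here, kept for clarity)
    if reg_num >= 8:
        rex |= 0x01             # REX.B extends ModRM.rm
    if rex != 0x40:
        out.append(rex)
    out.append(0x8C)
    out.append(0xC0 | (sreg_num << 3) | (reg_num & 7))  # mod=11
    return bytes(out)


for args in [(64, 16, 0, 4), (64, 32, 0, 4), (64, 32, 10, 4), (64, 64, 10, 4)]:
    print(args, " ".join(f"{b:02x}" for b in encode_mov_gpr_sreg(*args)))
```

An assembler that wants the shortest encoding could drop REX.W whenever REX.B isn't needed, which is what NASM appears to do for mov rax, fs.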

It's only with memory source / destinations that 66 is redundant when the source is a segment reg.

Contrary to Intel's manual, REX.W is also redundant for a memory destination. It's still a 16-bit store, even though Intel documents that form of the instruction as REX.W + 8C /r MOV r/m64,Sreg. It does not zero-extend to a 64-bit memory destination (or to 32-bit with no prefix).

Test program in NASM/YASM syntax (64-bit version, but search/replace rsi/esi to convert to 32-bit, since I used the 32-bit ABI for _exit.)

global _start
_start:
mov   rsi, rsp
mov   eax, 0xdeadbeef
mov   [rsi], eax
mov   dword [rsi+4], 0xbadf00d
; firstbreak:
mov   [rsi], fs            ; 8c 26   16-bit store
; mov dword [rsi], fs      ; not encodeable
; mov qword [rsi], fs      ; 48 8c 24 24   YASM chokes, NASM assembles REX.W 8c 26.  Still a 16-bit store!!!

mov    ax, fs       ; 66 8c e0  only modifies AX, leaving upper bits
mov   eax, fs       ;    8c e0  zeros whole rax
mov   rax, fs       ;    8c e0  zeros whole rax   (YASM: 48 8c e0)
mov   r10d, fs      ; 41 8c e2  (rex.w=0)
mov   r10, fs       ; 49 8c e2  (rex.w=1)

xor ebx,ebx
mov eax,1
int 0x80     ; sys_exit(0)   (32-bit ABI)

I feel like using only the least 16 bits of an address might cause problems in 64 bit mode.

Not sure what you're thinking here.

Segment selectors are not addresses. They are always 16-bit values. In 32-bit protected mode, and in 64-bit mode, they are indices into the GDT or LDT. (http://wiki.osdev.org/GDT_Tutorial#Reload_segment_registers). It's only in real mode where the segment register value is directly shifted and added to the rest of the addressing mode to form a linear address. (In protected / long modes, the segment base from the cached descriptor is added, before paging)
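To make the difference concrete, a short Python sketch: a protected/long-mode selector splits into a table index, a table-indicator bit and a privilege level, while a real-mode segment value is shifted left by 4 and added to the offset. The function names and the example values are only illustrative.

```python
def decode_selector(sel):
    """Split a 16-bit segment selector into its architectural fields."""
    sel &= 0xFFFF
    return {
        "index": sel >> 3,                        # descriptor table index
        "table": "LDT" if sel & 0b100 else "GDT", # TI bit
        "rpl": sel & 0b11,                        # requested privilege level
    }


def real_mode_linear(segment, offset):
    """Real mode only: linear address = segment * 16 + offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


print(decode_selector(0x2B))
print(hex(real_mode_linear(0xB800, 0x0000)))
```

Either way the value held in the segment register is 16 bits, so a 16-bit store loses nothing.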

The 66 prefix is the operand-size prefix, not the address-size prefix. It changes mov eax, ecx into mov ax, cx, for example.

In 64-bit code, the 0x67 address-size prefix means 32-bit address size. 64-bit mode has no way to encode 16-bit addressing modes like [BX + SI]. http://wiki.osdev.org/X86-64_Instruction_Encoding#Operand-size_and_address-size_override_prefix. There's no way for 64-bit code to do anything with 16-bit addresses.
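The two prefixes act on separate attributes, which a Python sketch can make explicit. This models the general defaults only; some instructions (push/pop, near branches) default to 64-bit operand size in 64-bit mode and are not covered. The function names are made up for illustration.

```python
def operand_size(mode, has_66=False, rex_w=False):
    """Effective operand size in bits for a typical instruction."""
    if mode == 64:
        if rex_w:
            return 64               # REX.W takes precedence over 66
        return 16 if has_66 else 32
    if rex_w:
        raise ValueError("REX prefixes exist only in 64-bit mode")
    if mode == 32:
        return 16 if has_66 else 32
    if mode == 16:
        return 32 if has_66 else 16
    raise ValueError("mode must be 16, 32 or 64")


def address_size(mode, has_67=False):
    """Effective address size in bits."""
    if mode == 64:
        return 32 if has_67 else 64  # no 16-bit addressing in 64-bit mode
    if mode == 32:
        return 16 if has_67 else 32
    if mode == 16:
        return 32 if has_67 else 16
    raise ValueError("mode must be 16, 32 or 64")


print(operand_size(64, has_66=True), address_size(64, has_67=True))
```

Note that `address_size` never returns 16 for mode 64, whatever the prefix.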

llvm / llvm-project

MOV16ms uses operand size override prefix #33826
