x86 assembler immediates should be limited to 32 bit

DimitryAndric commented 7 years ago


Bugzilla Link	33654
Version	trunk
OS	All
Blocks	llvm/llvm-project#11360
CC	@topperc,@RKSimon,@rotateright

Extended Description

In bug 33386, Jonas Vekeman encountered a situation where OpenSSL's inline assembly in combination with function inlining could sometimes lead to errors from the inline asm parser, e.g.:

<inline asm>:1:12: error: invalid operand for instruction
        divq      $-8446744073709551616
                  ^~~~~~~~~~~~~~~~~~~~~
LLVM ERROR: Error parsing inline asm

For this particular case, I created a test.s file with just:

divq 0x7fffffff
divq 0x80000000
divq 0xffffffff
divq 0x100000000
divq 0xffffffffffffffff

For this assembly, GNU as rightfully complains about lines 2 through 4:

divtest.s: Assembler messages:
divtest.s:2: Error: operand type mismatch for `div'
divtest.s:3: Error: operand type mismatch for `div'
divtest.s:4: Error: operand type mismatch for `div'

However, clang 5.0.0 doesn't complain at all, and produces an object file. Disassembling it shows that clang has silently wrapped the too-large values:

   0:       48 f7 34 25 ff ff ff 7f         divq    2147483647
   8:       48 f7 34 25 00 00 00 80         divq    -2147483648
  10:       48 f7 34 25 ff ff ff ff         divq    -1
  18:       48 f7 34 25 00 00 00 00         divq    0
  20:       48 f7 34 25 ff ff ff ff         divq    -1

At the very least, it should complain here like GNU as does, IMHO.

llvmbot commented 7 years ago

OK, what should we do with this bug? I suggest to close it. Or you know the similar issues with other instructions?

DimitryAndric commented 7 years ago

Dmitry, I have some comments/questions.

Hi Andrew, sorry for forgetting all about this bug. :)

First of all your example implies you're using ATT syntax, right? But ATT requires "$" sign for immediates. Without $ ATT uses numbers as memory references that's why 64 bit is OK.

Hmm, the reason I used this notation is that objdump also outputs it like that, e.g:

$ objdump -dw divtest-gas.o

divtest-gas.o: file format elf64-x86-64-freebsd

Disassembly of section .text:

0000000000000000 <.text>: 0: 48 f7 34 25 ff ff ff 7f divq 0x7fffffff 8: 48 f7 34 25 ff ff ff ff divq 0xffffffffffffffff

Even llvm-objdump does this, although it uses decimal instead of hex:

$ llvm-objdump -d divtest-gas.o

divtest-gas.o: file format ELF64-x86-64

Disassembly of section .text: .text: 0: 48 f7 34 25 ff ff ff 7f divq 2147483647 8: 48 f7 34 25 ff ff ff ff divq -1

I can see the immediate bytes right there in the opcodes.

On the other side if I add $ in your code then I see the error like here:

test.s:2:10: error: invalid operand for instruction divq $0x80000000 ^~~

The problem is in emmitor: parser eats such construction properly but the emmitor can't because instruction selector does not work here (that's exactly what we see in example from Jonas Vekeman).

I don't think we should allow usage imms without $ in ATT syntax. Am I right? If yes it means we should fix the issue with instruction selection, right?

I'm unsure now. For some reason, GNU as refuses all the $-prefixed constants:

$ cat divtest-dollar.s divq $1 divq $2147483647

$ as divtest-dollar.s divtest-dollar.s: Assembler messages: divtest-dollar.s:1: Error: operand type mismatch for div' divtest-dollar.s:2: Error: operand type mismatch fordiv'

I'm no longer certain what the correct syntax is here...

BTW, Intel syntax does not allow ussage of ims at all because DIV has only one operand which is DIV r/m. I think that's the reason of "invalid operand" meassage.

It's an interesting case, because indeed Intel's own SDM says "All arithmetic instructions (except the DIV and IDIV instructions) allow the source operand to be an immediate value".

But those opcodes emitted by GNU as seem to prove otherwise. :)

llvmbot commented 7 years ago

I suppose we should close this bug or at least it should be properly re-written if it's real issue here.

llvmbot commented 7 years ago

Dmitry, I have some comments/questions. First of all your example implies you're using ATT syntax, right? But ATT requires "$" sign for immediates. Without $ ATT uses numbers as memory references that's why 64 bit is OK. On the other side if I add $ in your code then I see the error like here:

test.s:2:10: error: invalid operand for instruction divq $0x80000000 ^~~

The problem is in emmitor: parser eats such construction properly but the emmitor can't because instruction selector does not work here (that's exactly what we see in example from Jonas Vekeman).

I don't think we should allow usage imms without $ in ATT syntax. Am I right? If yes it means we should fix the issue with instruction selection, right?

BTW, Intel syntax does not allow ussage of ims at all because DIV has only one operand which is DIV r/m. I think that's the reason of "invalid operand" meassage.

DimitryAndric commented 7 years ago

Likely related to bug 24447, bug 24448 and bug 24449.

llvm / llvm-project

x86 assembler immediates should be limited to 32 bit #33001

Extended Description